aarhusstadsarkiv / reference-files

Package of JSON files used across multiple repositories and services

Add custom_signatures that include rules about extensions #39

Open clausjuhl opened 2 months ago

clausjuhl commented 2 months ago

We should add support for custom rules that take the file extension into account when deciding on the correct file format.

One example is .emz files. PRONOM correctly identifies them as gzipped files, but they only use the gzip format as a compression container, so it makes no sense to extract .emz files with unarchiver.

Another example is .dat files that belong to MapInfo projects. They are correctly identified as dBase files, but as part of a GIS project they only make sense together with the other project files. Fortunately, those .dat files can be reidentified using their byte header, which is not possible for .emz files.
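
To make the idea concrete, here is a minimal sketch of how an extension-aware rule could be applied after the normal signature-based identification. Everything in it is an assumption for illustration: the `ExtensionRule` fields, the `apply_rules` function, and the format labels (`"gzip"`, `"emz"`) are hypothetical placeholders, not the existing custom_signatures schema and not real PRONOM PUIDs. The optional `header` field covers cases like the .dat files, where a byte header can confirm the match; the actual header bytes are not specified here.

```python
from dataclasses import dataclass
from pathlib import PurePath
from typing import Optional


@dataclass(frozen=True)
class ExtensionRule:
    """Hypothetical custom signature that also considers the file extension."""
    extension: str                  # lower-case, with leading dot
    identified_as: str              # format reported by signature-based identification
    reidentify_as: str              # format to use instead when the rule matches
    header: Optional[bytes] = None  # optional byte header that must also match


# Illustrative rule only; the labels are placeholders, not PRONOM PUIDs.
RULES = [
    ExtensionRule(".emz", identified_as="gzip", reidentify_as="emz"),
]


def apply_rules(filename: str, identified_as: str, head: bytes = b"", rules=RULES) -> str:
    """Return the overriding format if a rule matches, else the original one."""
    ext = PurePath(filename).suffix.lower()
    for rule in rules:
        # Both the extension and the original identification must match.
        if rule.extension != ext or rule.identified_as != identified_as:
            continue
        # If the rule carries a byte header, the file must start with it.
        if rule.header is not None and not head.startswith(rule.header):
            continue
        return rule.reidentify_as
    return identified_as
```

With a rule like this, `apply_rules("drawing.emz", "gzip")` would yield the overriding label, while an ordinary `archive.gz` identified as gzip would be left unchanged and could still be extracted.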