digipres / registries-of-practice-project

The "Registries of Good Practice" Project
MIT License

Generating shareable profiles of potential format signatures? #24

Open anjackson opened 2 months ago

anjackson commented 2 months ago

Is it possible to borrow some ideas from tools like TrID and build something that scans collections of files and generates potential format identification signatures? And would that be a useful way to accelerate signature development?
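
A minimal sketch of the scanning step, in Python. It assumes files are grouped by their lower-cased extension and that only the first 16 bytes of each file are compared. Both choices are hypothetical, and the sketch is not meant to reproduce TrID's own algorithm. For each extension, it records the offsets where every sample file holds the same byte, which is the raw material for a fixed-offset signature:

```python
from collections import defaultdict
from pathlib import Path

# Hypothetical setting: how many leading bytes of each file to compare.
HEADER_LEN = 16


def read_headers(root: Path, length: int = HEADER_LEN) -> dict[str, list[bytes]]:
    """Group the first `length` bytes of every file under `root` by lower-cased extension."""
    groups: dict[str, list[bytes]] = defaultdict(list)
    for path in root.rglob("*"):
        # Skip directories and files with no extension to correlate against.
        if path.is_file() and path.suffix:
            with path.open("rb") as f:
                groups[path.suffix.lower()].append(f.read(length))
    return dict(groups)


def common_fixed_bytes(headers: list[bytes]) -> dict[int, int]:
    """Return {offset: byte} for every offset where all headers hold the same byte."""
    if not headers:
        return {}
    # Only compare offsets that exist in every sample.
    shortest = min(len(h) for h in headers)
    return {
        i: headers[0][i]
        for i in range(shortest)
        if all(h[i] == headers[0][i] for h in headers)
    }
```

For example, `common_fixed_bytes([b"%PDF-1.4\n", b"%PDF-1.7\n"])` returns `{0: 37, 1: 80, 2: 68, 3: 70, 4: 45, 5: 49, 6: 46, 8: 10}`: every byte agrees except the version digit at offset 7.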

In particular, if the same binary signature is found in a significant number of files and correlates with a particular file extension, is it okay to share that information more widely? After all, isn't any string that is unique to one file extension, yet common to files with that extension, extremely unlikely to infringe any copyrights?
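
The extension-correlation test can be sketched the same way. The sketch assumes a mapping from extension to leading header bytes. For each extension, the hypothetical `candidate_signatures` function below takes the most common `prefix_len`-byte prefix and keeps it only if three conditions hold:

- it starts at least `min_share` of that extension's files;
- the extension has at least `min_files` samples;
- the prefix never starts a file with any other extension.

The thresholds are illustrative assumptions, not recommended values, and the check only looks at prefixes at offset 0:

```python
from collections import Counter


def candidate_signatures(
    headers_by_ext: dict[str, list[bytes]],
    prefix_len: int = 4,      # hypothetical signature length
    min_share: float = 0.9,   # hypothetical "significant number" threshold
    min_files: int = 20,      # hypothetical minimum sample size per extension
) -> dict[str, tuple[bytes, float]]:
    """Map extension -> (prefix, share) for prefixes common to, and unique to, that extension."""
    # Count how often each leading prefix starts files of each extension.
    counts = {
        ext: Counter(h[:prefix_len] for h in headers if len(h) >= prefix_len)
        for ext, headers in headers_by_ext.items()
    }
    results: dict[str, tuple[bytes, float]] = {}
    for ext, counter in counts.items():
        total = len(headers_by_ext[ext])
        if total < min_files or not counter:
            continue
        prefix, n = counter.most_common(1)[0]
        share = n / total
        # Uniqueness: the prefix must not start any file with a different extension.
        seen_elsewhere = any(prefix in other for e, other in counts.items() if e != ext)
        if share >= min_share and not seen_elsewhere:
            results[ext] = (prefix, share)
    return results
```

A prefix that also appears under another extension (for instance, a stray `%PDF` header in a `.txt` file) is dropped. This is a deliberately conservative reading of "unique to a file extension".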