cdgriffith / puremagic

Pure python implementation of identifying files based off their magic numbers
MIT License

Variant field in magic.json? #69

Open NebularNerd opened 3 months ago

NebularNerd commented 3 months ago

Re-opening, as #68 got a bit off-track from the original topic.

While looking at #67 regarding imghdr, I started looking at the SGI file format, as it needs some love, much like the work I did for PCX in #50. I was about to start on a PR to add all the variants, but had an idea regarding the naming convention.

At present the .json has a single name field. This works well enough, but depending on how people use that name, there could be a better way.

For example with PCX we now have: https://github.com/cdgriffith/puremagic/blob/88bc58f26339094d8763ac6c4605c1101e3f60a2/puremagic/magic_data.json#L831-L866

This is nice as we can determine every variant this format can offer, but maybe it's a bit too 'wordy'. A possible enhancement could be a 'variant' field in the .json like so:

    ["0a000101", 0, ".pcx", "image/x-pcx", "ZSOFT Paintbrush file", "(2.5, fixed EGA palette, 1bpp)"],
    ["0a020101", 0, ".pcx", "image/x-pcx", "ZSOFT Paintbrush file", "(2.5, modified EGA palette, 1bpp)"],
    ["0a030101", 0, ".pcx", "image/x-pcx", "ZSOFT Paintbrush file", "(2.8, 1bpp)"],
    ["0a040101", 0, ".pcx", "image/x-pcx", "ZSOFT Paintbrush file", "(Paintbrush for Windows, 1bpp)"],
    ["0a050101", 0, ".pcx", "image/x-pcx", "ZSOFT Paintbrush file", "(3.0, 1bpp)"],

etc...

This would give those who only need a basic name a straightforward 'this is what I am', while those who would like to know precisely could use the variant to get specifics.
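As a rough sketch of how a consumer might read such rows: the snippet below assumes the variant is simply an optional sixth element, so existing five-element rows keep working. `MagicEntry` and `full_name` are hypothetical names for illustration, not part of puremagic's current API.

```python
from typing import NamedTuple, Optional


class MagicEntry(NamedTuple):
    """One row of the proposed layout; `variant` is the hypothetical new field."""

    signature: str
    offset: int
    extension: str
    mime_type: str
    name: str
    variant: Optional[str] = None  # absent on existing five-element rows


def full_name(entry: MagicEntry) -> str:
    """Join the basic name and the variant, if a variant is present."""
    return f"{entry.name} {entry.variant}" if entry.variant else entry.name


# Two rows copied from the proposal above, plus one row without a variant.
rows = [
    ["0a000101", 0, ".pcx", "image/x-pcx", "ZSOFT Paintbrush file", "(2.5, fixed EGA palette, 1bpp)"],
    ["0a050101", 0, ".pcx", "image/x-pcx", "ZSOFT Paintbrush file", "(3.0, 1bpp)"],
    ["0a050101", 0, ".pcx", "image/x-pcx", "ZSOFT Paintbrush file"],
]
entries = [MagicEntry(*row) for row in rows]

for entry in entries:
    # Basic users read entry.name; detailed users read full_name(entry).
    print(entry.name, "|", full_name(entry))
```

Keeping the variant optional means the change could be rolled out one format at a time, without touching rows that have no variants to report.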

Going back to the SGI format, we can see in the file specifications that it's got a small header with a lot of variant flags. At present there is one SGI entry in the .json at:

https://github.com/cdgriffith/puremagic/blob/88bc58f26339094d8763ac6c4605c1101e3f60a2/puremagic/magic_data.json#L794

That entry equates to RLE compression, 2bpc, multiple 2D images. I'm happy to run a PR with a naming convention similar to the one I used for PCX, which would generate a list of roughly similar length for the SGI variants, but I had this idea and wanted to share it first.
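To give a feel for what the SGI list could look like with a variant field, here is a minimal sketch that generates the rows. It assumes my reading of the SGI header is right: a 2-byte magic of `01da`, a 1-byte storage flag, a 1-byte bytes-per-channel value, and a 2-byte dimension value. The extension, MIME type, name, and variant wording are placeholders, and `sgi_rows` is a hypothetical helper, not something in puremagic.

```python
from itertools import product

SGI_MAGIC = "01da"  # 474 decimal, stored big-endian (assumed from the SGI spec)

# Header flag values and the labels used for the variant text (labels are placeholders).
STORAGE = {0: "uncompressed", 1: "RLE compression"}
BPC = {1: "1bpc", 2: "2bpc"}
DIMENSION = {1: "single scanline", 2: "single 2D image", 3: "multiple 2D images"}


def sgi_rows(extension=".sgi", mime="image/x-sgi", name="Silicon Graphics image file"):
    """Build one six-element row per storage/bpc/dimension combination."""
    rows = []
    for (s, s_label), (b, b_label), (d, d_label) in product(
        STORAGE.items(), BPC.items(), DIMENSION.items()
    ):
        # storage and bpc are one byte each, dimension is a two-byte value
        signature = f"{SGI_MAGIC}{s:02x}{b:02x}{d:04x}"
        variant = f"({s_label}, {b_label}, {d_label})"
        rows.append([signature, 0, extension, mime, name, variant])
    return rows


for row in sgi_rows():
    print(row)
```

Under these assumptions the generator yields 2 x 2 x 3 = 12 rows, one of which carries the "RLE compression, 2bpc, multiple 2D images" combination mentioned above.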

Any thoughts on this?