cdgriffith / puremagic

Pure python implementation of identifying files based off their magic numbers
MIT License
158 stars, 34 forks

Version 2.0 Goals #70

Open cdgriffith opened 3 months ago

cdgriffith commented 3 months ago

Now that puremagic is picking up some outside traction and is used in places like MongoDB, I want to lay out clear future plans.

Please keep comments on this page limited to overall goals. Any specific conversation about a goal should be its own issue, and this page will be updated to reflect it.

NebularNerd commented 3 months ago

Could #69 be a new feature for 2.0? Compatibility-wise, the new field should not break anything (that I'm aware of).

CatKasha commented 3 months ago

Hi, I found your project via "Explore repositories" on the github.com homepage feed. I have a somewhat similar project (https://github.com/CatKasha/yet-another-filetype-checker). I don't know if it will be helpful (my project is very simple), but I hope it gives you some ideas for improvements.

chapmanjacobd commented 3 months ago

I just found this: https://mark0.net/soft-trid-e.html

Not sure how well known it is, but it contains "over 17k file types". The file signatures do not have an explicit data license attached to them, but at the very least they might be useful to compare against.

maybe related:

NebularNerd commented 3 months ago

TrID is one of the oldest filetype sites/software out there. That site has looked near enough the same for decades.

Their database is pretty solid and very extensive, but they cannot generate a confidence score or process more complicated searches. For example, .SBK (Creative SoundFont) is only handled by extension, whereas we can look at the file in two places to generate a match.
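
To make the "two places" idea concrete, here is a rough sketch of checking bytes at two offsets and turning the result into a confidence score. This is not puremagic's actual code or API: the `Signature` class, the `confidence` function, and the byte values and offsets are all hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signature:
    """Hypothetical two-part signature: bytes expected at two offsets."""
    name: str
    first: bytes
    first_offset: int
    second: bytes
    second_offset: int


def confidence(data: bytes, sig: Signature) -> float:
    """Return the fraction of signature bytes found at their expected offsets."""
    matched = 0
    parts = ((sig.first, sig.first_offset), (sig.second, sig.second_offset))
    for pattern, offset in parts:
        # A slice past the end of the data is just shorter, so it cannot match.
        if data[offset:offset + len(pattern)] == pattern:
            matched += len(pattern)
    return matched / (len(sig.first) + len(sig.second))


# Illustrative values only; not taken from puremagic's signature data.
EXAMPLE = Signature("example container", b"RIFF", 0, b"sfbk", 8)

# Both locations match.
full = b"RIFF\x00\x00\x00\x00sfbk" + b"\x00" * 16
# Only the first location matches.
partial = b"RIFF\x00\x00\x00\x00WAVE" + b"\x00" * 16

print(confidence(full, EXAMPLE))
print(confidence(partial, EXAMPLE))
```

A file that matches at only one of the two locations gets a lower score than one that matches at both, which is the kind of ranking an extension-only lookup cannot give.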