galaxyproject / galaxy

Data intensive science for everyone.
https://galaxyproject.org

make use of file "magic" detection standard for datatype sniffing #724

Open mr-c opened 9 years ago

mr-c commented 9 years ago

For example, khmer provides filetype detection definitions in the form of a "magic" config for the Unix file command: https://github.com/dib-lab/khmer/blob/master/lib/magic

The Galaxy datatype sniffers for the khmer/oxli format are a re-implementation of the logic in this magic file. It would be useful to just supply a mapping from the output of the file command to a Galaxy datatype instead of recoding the sniffing logic.

mcrusoe@localhost:~/khmer/tests/test-data$ file normC20k20.ct 
normC20k20.ct: Binary from the khmer project, file format version 4, k-mer countgraph

Or even just map from a media (MIME) type:

mcrusoe@localhost:~/khmer/tests/test-data$ file --mime-type normC20k20.ct 
normC20k20.ct: application/vnd.oxli-countgraph

FYI: I just registered the above media type; should hear back within a week.
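
A minimal sketch of what such a mapping could look like, to make the idea concrete. This is not how Galaxy works today. The `MIME_TO_DATATYPE` table, the `oxlicg` extension name, and the function names are all assumptions made up for illustration. `sniff` shells out to the file command, so it only recognises the oxli type if the khmer magic definitions are installed where file can find them.

```python
import subprocess

# Hypothetical mapping from media types to Galaxy datatype extensions.
# The "oxlicg" name is an assumption used for illustration only.
MIME_TO_DATATYPE = {
    "application/vnd.oxli-countgraph": "oxlicg",
}


def parse_file_output(line):
    """Split one line of `file` output into (filename, description)."""
    name, _, description = line.partition(": ")
    return name, description.strip()


def datatype_for_mime(mime_type, default=None):
    """Look up the datatype extension for a media type, if one is mapped."""
    return MIME_TO_DATATYPE.get(mime_type, default)


def sniff(path):
    """Ask `file` for the media type of `path` and map it to a datatype."""
    result = subprocess.run(
        ["file", "--brief", "--mime-type", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return datatype_for_mime(result.stdout.strip())
```

With a table like this, supporting a new format would mean adding one entry rather than writing a new sniffer class, provided the format already ships a magic definition.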

mr-c commented 9 years ago

@jmchilton

hexylena commented 9 years ago

I've been wanting to split out datatypes completely, and rebuild them from the ground up around magic numbers with a new architecture. I made some progress but not enough to say "let's seriously consider it". :+1: for the Python filemagic module to detect magic numbers. Would save MANY lines of code.
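
A rough sketch of why a magic-number table could replace a lot of hand-written sniffers. It uses plain Python rather than filemagic, and it is not the re-architecture mentioned above. The byte signatures for HDF5, gzip and bzip2 are the standard ones. The datatype names on the right and the function names are assumptions for illustration.

```python
# Table of (leading bytes, datatype extension). The signatures are the
# standard ones for these formats; the extension names are illustrative.
SIGNATURES = [
    (b"\x89HDF\r\n\x1a\n", "h5"),
    (b"\x1f\x8b", "gz"),
    (b"BZh", "bz2"),
]


def sniff_prefix(prefix):
    """Return the datatype whose magic number starts `prefix`, else None."""
    for magic, datatype in SIGNATURES:
        if prefix.startswith(magic):
            return datatype
    return None


def sniff_file(path):
    """Read just enough leading bytes from `path` to test every signature."""
    longest = max(len(magic) for magic, _ in SIGNATURES)
    with open(path, "rb") as handle:
        return sniff_prefix(handle.read(longest))
```

Each new binary format then costs one row in `SIGNATURES` instead of a new sniff method. Formats that need more than a fixed prefix would still need custom logic or a real libmagic definition.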