bioinfornatics closed 13 years ago
There already is a standard for specifying the format of a genomic text file inside the genomic text file itself. I strongly discourage inventing a new one. That's how things get ugly and out of hand. Let's all remember this xkcd.
The standard currently in practice is to include type=bed inside the track header line for a BED file. A WIG file, for instance, will contain type=wiggle_0, etc. This is just as specified on this page and is already implemented in bbcflib.track.
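For illustration, here is a minimal sketch of reading that type= attribute from a track header line. The function name track_type is hypothetical and this is not the bbcflib.track implementation; it only assumes the header is a single line starting with the word "track" followed by key=value pairs, some of which may be quoted.

```python
import shlex


def track_type(header_line):
    """Return the value of type=... from a track header line, or None.

    Hypothetical helper, not part of bbcflib. Assumes a header of the
    form: track type=bed name="my track" ...
    """
    stripped = header_line.strip()
    if not stripped.startswith("track"):
        return None
    # shlex keeps quoted values such as name="my track" in one token
    tokens = shlex.split(stripped)
    if tokens[0] != "track":
        return None
    for token in tokens[1:]:
        key, sep, value = token.partition("=")
        if sep and key == "type":
            return value
    return None


if __name__ == "__main__":
    print(track_type('track type=bed name="my track"'))
    print(track_type("track type=wiggle_0 name=signal"))
```

A line without a track header (a plain data line, say) simply yields None, so the caller can fall back to whatever other detection it has.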
As a side note, I don't think distribution-specific functionality is worth spending time developing. Either it works everywhere, or it needs a simple dependency which is included in PyPI, or we don't use it and include a workaround piece of code that does the minimum job.
The problem at hand, recognizing file formats, falls under this third category in my opinion, as it can neither depend on another standard Python package nor work everywhere. That's why this small function was written. It does what we need in an inelegant fashion, no more and no less, but with no dependencies and on all platforms, all for 10 minutes of development time.
I understand it is your project and nobody can make a request. By the way, read the UCSC spec: http://genome.ucsc.edu/FAQ/FAQformat.html#format5
Format recognition now works at least on Ubuntu and Fedora with python-magic. Known formats: bed, wig, gtf, gff, maf, psl, bedgraph. So, as the UCSC specification says, it would be nice if, when we convert sqlite to a text format, we add a header at the top of the file, like:
##<format>
e.g: ##bed
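A rough sketch of what that could look like in Python, assuming the header is a single first line made of "##" followed by the format name. The function names and the KNOWN_FORMATS set are hypothetical and only mirror the list of formats above; none of this is existing bbcflib or python-magic code.

```python
import io

# Formats listed above; the set itself is an assumption for this sketch
KNOWN_FORMATS = {"bed", "wig", "gtf", "gff", "maf", "psl", "bedgraph"}


def write_format_header(handle, fmt):
    """Write a '##<format>' line at the current position of the file."""
    if fmt not in KNOWN_FORMATS:
        raise ValueError("unknown format: %r" % fmt)
    handle.write("##%s\n" % fmt)


def read_format_header(handle):
    """Return the format named on the first line, or None if absent."""
    first = handle.readline().strip()
    if first.startswith("##"):
        parts = first[2:].split()
        name = parts[0].lower() if parts else ""
        if name in KNOWN_FORMATS:
            return name
    return None


if __name__ == "__main__":
    buf = io.StringIO()
    write_format_header(buf, "bed")
    buf.write("chr1\t0\t100\n")
    buf.seek(0)
    print(read_format_header(buf))
```

Only the first whitespace-separated word after "##" is compared, so a line such as "##maf version=1" would also be recognized as maf.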