blachlylab / dhtslib

D bindings and OOP wrappers for htslib
MIT License

Other file formats not directly supported by htslib #68

Closed charlesgregory closed 3 years ago

charlesgregory commented 3 years ago

I think we should support BED, GTF, GFF3, and other bioinformatics formats (we already support FASTQ). We are already past the point of being just bindings for htslib. This also aligns well with #55. Conversion from BED coordinates to SAM or VCF coordinates should be supported directly. Potentially also supporting PAF?
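To make the coordinate conversion concrete: BED intervals are 0-based and half-open, while positions in SAM text and VCF records are 1-based and inclusive. Below is a minimal, language-neutral sketch in Python, not dhtslib's D API; the type and function names are hypothetical and exist only for illustration.

```python
from typing import NamedTuple


class ZeroBasedHalfOpen(NamedTuple):
    """BED-style interval: first base is 0, end is exclusive."""
    start: int
    end: int


class OneBasedClosed(NamedTuple):
    """SAM/VCF-style interval: first base is 1, end is inclusive."""
    start: int
    end: int


def bed_to_one_based(iv: ZeroBasedHalfOpen) -> OneBasedClosed:
    # Assumption of this sketch: zero-length intervals are rejected,
    # because they have no natural closed representation.
    if iv.start < 0 or iv.end <= iv.start:
        raise ValueError(f"invalid BED interval: {iv}")
    # Only the start shifts; the exclusive 0-based end equals the
    # inclusive 1-based end.
    return OneBasedClosed(iv.start + 1, iv.end)


def one_based_to_bed(iv: OneBasedClosed) -> ZeroBasedHalfOpen:
    if iv.start < 1 or iv.end < iv.start:
        raise ValueError(f"invalid 1-based interval: {iv}")
    return ZeroBasedHalfOpen(iv.start - 1, iv.end)
```

Keeping the two systems as distinct types means a caller cannot pass a BED interval where a 1-based one is expected without going through an explicit conversion.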

jblachly commented 3 years ago

Agree in principle

I have a repo GFF3D (with a sweet 3D logo 🚀 ) -- this is a good candidate for inclusion, although we need a GFF3 writer as well.

What do you envision for, say, a BED reader? It is pretty simple.

charlesgregory commented 3 years ago

Pretty simple. We should conform to the BED format, at least the 12 columns that UCSC and Ensembl agree on. Or we could build as we go, kinda similar to the way gff3d does: get fields if they exist. Other than that, use a BGZFile for reading (gets us bgzf, gzip, and network support) and parse each line into coordinates in our coordinate system. Other fields are optional. bedGraph could be another format to support, though we could go crazy trying to support all the formats.
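As a rough illustration of the "get fields if they exist" idea, here is a minimal sketch of a BED line parser in which only the first three columns are required and the remaining nine standard columns are filled in when present. It is written in Python as a language-neutral sketch, not as dhtslib's D API; `BedRecord`, `parse_bed_line`, and `read_bed` are hypothetical names, and the sketch assumes tab-delimited input and an integer score column.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional, Tuple


@dataclass
class BedRecord:
    # Required columns (0-based, half-open coordinates).
    chrom: str
    start: int
    end: int
    # Optional columns 4-12; None means the column was absent.
    name: Optional[str] = None
    score: Optional[int] = None
    strand: Optional[str] = None
    thick_start: Optional[int] = None
    thick_end: Optional[int] = None
    item_rgb: Optional[str] = None
    block_count: Optional[int] = None
    block_sizes: Optional[Tuple[int, ...]] = None
    block_starts: Optional[Tuple[int, ...]] = None


def _int_list(field: str) -> Tuple[int, ...]:
    # Block lists are comma-separated and may carry a trailing comma.
    return tuple(int(x) for x in field.split(",") if x)


# One converter per standard column, in column order.
_CONVERTERS = (str, int, int, str, int, str, int, int, str, int,
               _int_list, _int_list)


def parse_bed_line(line: str) -> BedRecord:
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) < 3:
        raise ValueError(f"BED line needs at least 3 columns: {line!r}")
    # zip stops at the shorter sequence, so missing optional columns keep
    # their None default and columns past the 12th are ignored.
    return BedRecord(*(conv(f) for conv, f in zip(_CONVERTERS, fields)))


def read_bed(lines: Iterable[str]) -> Iterator[BedRecord]:
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank line or comment
        if stripped.split(None, 1)[0] in ("track", "browser"):
            continue  # UCSC header lines
        yield parse_bed_line(line)
```

Because `read_bed` accepts any iterable of text lines, the line source stays separate from the parsing. In this Python sketch, `gzip.open(path, "rt")` could supply lines from gzip or BGZF files, since a BGZF file is a series of concatenated gzip members; in dhtslib that role would fall to the BGZFile reader mentioned above.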