Open jmthibault79 opened 10 years ago
Also make sure there are tests for all of the valid combinations.
- `bcf_open` and `bcf_read1` to read a single file line by line.
- `bcf_open` and `bcf_itr_next` to read a single indexed file over a set of intervals.
- `bcf_sr_add_reader` and `bcf_sr_next_line` to read multiple indexed files over a set of intervals.

| | VariantReader | IndexedVR | MultipleVR | SyncedVR |
---|---|---|---|---|
VCF | YES | no | YES | no |
VCF GZ | YES | no | YES | YES |
BCF | YES | YES | YES | YES |
Single | YES | YES | YES | YES |
Multiple | no | no | YES | YES |
Index | no | YES | no | YES |
Interval | no | YES | no | YES |
Requires Index | no | YES | no | YES |
Requires Interval | no | no | no | no (#236) |
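
One way to keep a record of the table above, and to drive one test per valid combination, is to encode it as data. This is a minimal sketch in C, not existing project code: `reader_kind`, `support_row`, `SUPPORT`, and `reader_supports` are hypothetical names introduced here, and the flags are copied row by row from the table (the `Requires Interval` row reflects the state after #236).

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical names: one enum value per reader column of the table. */
typedef enum {
    VARIANT_READER,
    INDEXED_VR,
    MULTIPLE_VR,
    SYNCED_VR,
    READER_COUNT
} reader_kind;

typedef struct {
    const char *feature;           /* row label, as written in the table */
    bool supported[READER_COUNT];  /* true for YES, false for no */
} support_row;

/* Flags copied from the table; column order matches reader_kind. */
static const support_row SUPPORT[] = {
    {"VCF",               {true,  false, true,  false}},
    {"VCF GZ",            {true,  false, true,  true }},
    {"BCF",               {true,  true,  true,  true }},
    {"Single",            {true,  true,  true,  true }},
    {"Multiple",          {false, false, true,  true }},
    {"Index",             {false, true,  false, true }},
    {"Interval",          {false, true,  false, true }},
    {"Requires Index",    {false, true,  false, true }},
    {"Requires Interval", {false, false, false, false}},
};

/* Returns true when the table marks this reader/feature cell as YES.
   Unknown feature labels and out-of-range readers yield false. */
bool reader_supports(reader_kind reader, const char *feature) {
    if ((int)reader < 0 || (int)reader >= (int)READER_COUNT) {
        return false;
    }
    for (size_t i = 0; i < sizeof SUPPORT / sizeof SUPPORT[0]; ++i) {
        if (strcmp(SUPPORT[i].feature, feature) == 0) {
            return SUPPORT[i].supported[reader];
        }
    }
    return false;
}
```

A test suite could loop over `SUPPORT` and exercise every cell that is `true`, so the table and the tests cannot drift apart silently.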
Missing functionality:
Unknown/untested:
SyncedVariantReader works with no intervals after #236
Added ticket #237 for the single indexed BCF question. Added ticket #239 for intervals with VCF files. Added ticket #240 for intervals with unindexed files (related because htslib doesn't handle VCF indices).
I would ignore BCF GZ; that's an abomination because BCFs can be intrinsically gzipped....
There are at least three ways to read variant files in htslib: indexed, unindexed, and synced. Each has advantages and drawbacks. Enable the use of all of these via Variant Reader/Iterators and make a record of when it's appropriate to use each.
Where multiple options are available, run benchmarks to determine which is best.