abdenlab / oxbow

Read specialized NGS formats as data frames in R, Python, and more.
https://lifeinbytes.substack.com/p/breaking-out-of-bioinformatic-data-silos
Apache License 2.0

Refactor fasta reader index usage and allow filelike object use #51

Open GarrettNg opened 11 months ago

GarrettNg commented 11 months ago

The fasta implementation was a little tricky because noodles has several relevant readers upstream: fasta::Reader, fasta::IndexedReader, and fasta::fai::Reader. The core records_to_ipc function relies on the query method of fasta::Reader, so the fasta::Reader is now a field of the FastaReader struct. Because query also needs the fasta index, the index is a FastaReader field as well, and it is now read through a separate fasta::fai::Reader rather than through fasta::IndexedReader, which has access to the index but ~lacks the query method~. (EDIT: IndexedReader does have a query method.)

Filelike object compatibility was also implemented in line with the other readers, and generics have been sprinkled about accordingly.
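For reviewers who want a picture of the layout, here is a minimal std-only sketch of a reader that is generic over any seekable source and keeps the index as a field. It does not use noodles. `FaiRecord`, its field names, and this `query` signature are hypothetical stand-ins for illustration, not the code in this PR.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// One entry of a .fai index (hypothetical, simplified type).
#[derive(Debug, Clone)]
pub struct FaiRecord {
    pub name: String,
    pub length: u64,     // number of bases in the sequence
    pub offset: u64,     // byte offset of the first base in the file
    pub line_bases: u64, // bases per line
    pub line_width: u64, // bytes per line, including the newline
}

/// Hypothetical stand-in for the FastaReader in this PR: the reader and the
/// index are both fields, and `R` can be a File or an in-memory buffer.
pub struct FastaReader<R> {
    reader: R,
    index: Vec<FaiRecord>,
}

impl<R: Read + Seek> FastaReader<R> {
    pub fn new(reader: R, index: Vec<FaiRecord>) -> Self {
        Self { reader, index }
    }

    /// Return bases [start, end) (0-based, half-open) of sequence `name`.
    pub fn query(&mut self, name: &str, start: u64, end: u64) -> io::Result<Vec<u8>> {
        let rec = self
            .index
            .iter()
            .find(|r| r.name == name)
            .cloned()
            .ok_or_else(|| {
                io::Error::new(
                    io::ErrorKind::InvalidInput,
                    format!("sequence '{}' is not in the index", name),
                )
            })?;
        if rec.line_bases == 0 || start > end || end > rec.length {
            return Err(io::Error::new(
                io::ErrorKind::InvalidInput,
                "invalid interval or index record",
            ));
        }

        // Byte position of base `start`: skip whole lines, then the remainder.
        let pos = rec.offset + (start / rec.line_bases) * rec.line_width + start % rec.line_bases;
        self.reader.seek(SeekFrom::Start(pos))?;

        // Read byte by byte, dropping line terminators, until enough bases are collected.
        let wanted = (end - start) as usize;
        let mut seq = Vec::with_capacity(wanted);
        let mut byte = [0u8; 1];
        while seq.len() < wanted {
            if self.reader.read(&mut byte)? == 0 {
                return Err(io::Error::new(
                    io::ErrorKind::UnexpectedEof,
                    "file ended before the requested interval",
                ));
            }
            if byte[0] != b'\n' && byte[0] != b'\r' {
                seq.push(byte[0]);
            }
        }
        Ok(seq)
    }
}
```

Because the only bound is `Read + Seek`, the same struct works with a `std::fs::File` or a `std::io::Cursor` over bytes handed in from a filelike object.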

This also now provides a specific error message when a .fai index file can't be found, instead of a generic "file not found" error that a user might attribute to the fasta file rather than the index file.
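The index error handling could look roughly like the sketch below. It assumes the index sits next to the fasta file with `.fai` appended (the samtools faidx convention). The helper names `fai_path` and `open_fai` and the message wording are hypothetical, not the ones used in this PR.

```rust
use std::fs::File;
use std::io;
use std::path::{Path, PathBuf};

/// Hypothetical helper: "<fasta path>.fai", e.g. "ref.fa" -> "ref.fa.fai".
pub fn fai_path(fasta: &Path) -> PathBuf {
    let mut s = fasta.as_os_str().to_os_string();
    s.push(".fai");
    PathBuf::from(s)
}

/// Hypothetical helper: open the index and, if it is missing, name the index
/// file in the error so it is not mistaken for a missing fasta file.
pub fn open_fai(fasta: &Path) -> io::Result<File> {
    let index_path = fai_path(fasta);
    File::open(&index_path).map_err(|e| {
        if e.kind() == io::ErrorKind::NotFound {
            io::Error::new(
                io::ErrorKind::NotFound,
                format!(
                    "FASTA index file not found: {} (required to query {})",
                    index_path.display(),
                    fasta.display()
                ),
            )
        } else {
            // Other failures (permissions, etc.) are passed through unchanged.
            e
        }
    })
}
```

The error kind stays `NotFound`, so callers that match on the kind keep working; only the message changes.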

GarrettNg commented 11 months ago

I'm going to revisit this to make it use the IndexedReader when possible. Also, the IndexedReader does have a query method.