abdenlab / oxbow

Read specialized NGS formats as data frames in R, Python, and more.
https://lifeinbytes.substack.com/p/breaking-out-of-bioinformatic-data-silos
Apache License 2.0

Add bed12 reader #57

Open GarrettNg opened 11 months ago

GarrettNg commented 11 months ago

This PR handles spec-compliant BED12 files, so as written it won't work with BED3 through BED9 files or with files that carry optional fields. Supporting the other BED variants could require a separate reader struct for each one, each building on the previous one and implementing a different record column count, as noodles does with BedN and Record. There may be a simpler solution. In the interim, it is still easier to read BED files with a CSV reader (tab-delimited, \t), especially given the variety of column arrangements.
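A minimal sketch of the "one record type per column count" idea, assuming Rust const generics. The BedRecord<N> type, its parse method, and ParseError are hypothetical names for illustration only; they are not oxbow's or noodles' API. The point is that putting the number of standard columns in the type lets a BED12 parser reject a BED4 line instead of misreading it.

```rust
/// Hypothetical record type for illustration: `N` is the number of
/// standard BED columns in the file (3 through 12). This is not
/// oxbow's or noodles' API.
#[derive(Debug)]
pub struct BedRecord<const N: usize> {
    pub chrom: String,
    pub start: u64,
    pub end: u64,
    /// Standard columns 4..=N, kept as raw strings in this sketch.
    pub rest: Vec<String>,
}

#[derive(Debug, PartialEq)]
pub enum ParseError {
    UnsupportedWidth(usize),
    WrongColumnCount { expected: usize, found: usize },
    InvalidPosition(String),
}

impl<const N: usize> BedRecord<N> {
    /// Parse one tab-delimited line that must have exactly N columns.
    pub fn parse(line: &str) -> Result<Self, ParseError> {
        if !(3..=12).contains(&N) {
            return Err(ParseError::UnsupportedWidth(N));
        }
        // Drop a trailing newline (and CR for Windows line endings).
        let fields: Vec<&str> = line
            .trim_end_matches(|c: char| c == '\r' || c == '\n')
            .split('\t')
            .collect();
        if fields.len() != N {
            return Err(ParseError::WrongColumnCount { expected: N, found: fields.len() });
        }
        let position = |s: &str| {
            s.parse::<u64>()
                .map_err(|_| ParseError::InvalidPosition(s.to_string()))
        };
        Ok(Self {
            chrom: fields[0].to_string(),
            start: position(fields[1])?,
            end: position(fields[2])?,
            rest: fields[3..].iter().map(|s| s.to_string()).collect(),
        })
    }
}

fn main() {
    let line = "chr1\t100\t200\tgeneA";
    let bed4 = BedRecord::<4>::parse(line).unwrap();
    println!("{} {} {} {:?}", bed4.chrom, bed4.start, bed4.end, bed4.rest);
    // The same BED4 line is rejected by a BED12 parser rather than misread.
    assert!(BedRecord::<12>::parse(line).is_err());
}
```

Keeping N in the type gives the per-variant structure without writing each struct by hand. The trade-off is that the column count must be known before reading, which is why a generic tab-delimited reader remains convenient for files with unusual column arrangements.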

Of note are two types of Arrow builders used in the BedBatchBuilder that have not been used elsewhere in oxbow:

These builders could potentially be used for the nested fields of other readers in oxbow.
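To illustrate what a nested field looks like once it is in Arrow, here is a std-only sketch of the variable-size list layout that a BED12 column such as blockSizes or blockStarts (each a comma-separated list per the BED spec) maps onto: one flat values buffer plus an offsets buffer holding one more entry than there are rows, which is what an Arrow list builder accumulates. ListColumn and append_field are hypothetical names, not the arrow crate's API and not the BedBatchBuilder code in this PR; skipping empty pieces assumes UCSC-style trailing commas.

```rust
use std::num::ParseIntError;

/// Hypothetical, std-only model of Arrow's variable-size list layout:
/// a flat child `values` buffer plus `offsets`, where row i spans
/// values[offsets[i]..offsets[i + 1]]. Not the arrow crate's API.
#[derive(Debug)]
pub struct ListColumn {
    pub values: Vec<u32>,
    pub offsets: Vec<i32>,
}

impl ListColumn {
    pub fn new() -> Self {
        // Arrow list offsets always start with 0.
        Self { values: Vec::new(), offsets: vec![0] }
    }

    /// Append one comma-separated BED12 list field such as "10,20,".
    /// Empty pieces (e.g. from a trailing comma) are skipped.
    /// On a parse error the column is left unchanged.
    pub fn append_field(&mut self, field: &str) -> Result<(), ParseIntError> {
        let parsed: Vec<u32> = field
            .split(',')
            .filter(|piece| !piece.is_empty())
            .map(str::parse::<u32>)
            .collect::<Result<_, _>>()?;
        self.values.extend(parsed);
        self.offsets.push(self.values.len() as i32);
        Ok(())
    }

    /// Number of rows (list entries) appended so far.
    pub fn len(&self) -> usize {
        self.offsets.len() - 1
    }

    /// Borrow the values belonging to row `i`.
    pub fn row(&self, i: usize) -> &[u32] {
        let start = self.offsets[i] as usize;
        let end = self.offsets[i + 1] as usize;
        &self.values[start..end]
    }
}

fn main() {
    let mut block_sizes = ListColumn::new();
    for field in ["10,20,", "5", "7,8,9"] {
        block_sizes.append_field(field).unwrap();
    }
    // prints [10, 20, 5, 7, 8, 9] [0, 2, 3, 6]
    println!("{:?} {:?}", block_sizes.values, block_sizes.offsets);
}
```

Because every row's list lives contiguously in one shared buffer, the same layout serves any reader's list-valued field, not just BED12's.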