jonathanBieler opened this issue 1 year ago
There was a lot of discussion of a similar nature over at FASTX.jl (see e.g. https://github.com/BioJulia/FASTX.jl/issues/76), and I think @jakobnissen has started putting in some work on that in BioGenerics.jl.
In short, you are completely correct :wink:
I'm for FileIO integration, but I think it should be done in a new BEDFiles.jl package.
As a result of @jakobnissen's work, it's possible to load all records with the following.
```julia
records = open(collect, BED.Reader, "data.bed")
```
This approach also closes the reader.
And for completeness, below is a longhand variant using `do`-block syntax.
```julia
records = open(BED.Reader, "data.bed") do reader
    return collect(reader)
end
```
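The closing behaviour of the function-first `open(collect, ...)` form, and of the `do`-block form it is shorthand for, presumably relies on the usual Julia `try`/`finally` pattern. Below is a minimal, self-contained sketch of that pattern. `ToyReader` is a hypothetical stand-in that yields one line per "record"; this is not the actual BED.jl or BioGenerics.jl implementation, only an illustration under that assumption.

```julia
# Hypothetical stand-in for a record reader such as BED.Reader:
# each "record" is simply one line of the underlying file.
struct ToyReader
    io::IOStream
end

# Plain form: open the file and wrap it (mirrors `open(BED.Reader, path)`).
Base.open(::Type{ToyReader}, path::AbstractString) = ToyReader(open(path, "r"))
Base.close(r::ToyReader) = close(r.io)
Base.isopen(r::ToyReader) = isopen(r.io)

# Iteration protocol, so that `collect(reader)` returns a Vector{String}.
Base.IteratorSize(::Type{ToyReader}) = Base.SizeUnknown()
Base.eltype(::Type{ToyReader}) = String
function Base.iterate(r::ToyReader, state = nothing)
    eof(r.io) && return nothing
    return (readline(r.io), nothing)
end

# Function-first form: run `f` on the reader and close it afterwards,
# even if `f` throws. `open(ToyReader, path) do reader ... end` lowers to this.
function Base.open(f::Function, ::Type{ToyReader}, path::AbstractString)
    reader = open(ToyReader, path)
    try
        return f(reader)
    finally
        close(reader)
    end
end
```

With these definitions, `open(collect, ToyReader, "data.bed")` and the equivalent `do` block both return a vector of records, and the file is closed even if the function passed in throws.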
In 99% of my use cases I just want to read the whole BED file and get a vector of records. Doing so requires quite a bit of boilerplate, which every user will have to write (possibly several times). In comparison, in Python (with PyRanges) you can simply do `pr.read_bed(path)`. This seems like an important usability issue. The solution would be either to add an internal `BED.load("file.bed")` function or to integrate the FileIO interface. I don't have a strong preference, but I would also do the same for other "small" file formats (ones that typically fit in memory) such as VCF, so it would be better to be consistent about it. Note that FileIO also has a streaming interface for large files, so it could also be used for BAM and FASTQ files.
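To make the usability point concrete, here is a stdlib-only sketch of what a one-call loader could look like. `BedInterval` and `load_bed` are hypothetical names, not part of BED.jl, and the parser is deliberately minimal: it keeps only the three mandatory columns and skips blank, `#`, `track` and `browser` lines. A real wrapper inside BED.jl would more likely just call `open(collect, BED.Reader, path)`.

```julia
# Hypothetical minimal record: only the three mandatory BED columns.
# BED coordinates are 0-based and half-open: [chromstart, chromend).
struct BedInterval
    chrom::String
    chromstart::Int
    chromend::Int
end

# Hypothetical one-call loader for small BED files that fit in memory.
# Not BED.jl's parser: extra columns are ignored and no validation is done.
function load_bed(path::AbstractString)::Vector{BedInterval}
    records = BedInterval[]
    for line in eachline(path)  # eachline(path) closes the file when iteration ends
        # Skip blank lines plus comment/header lines.
        if isempty(strip(line)) || startswith(line, "#") ||
           startswith(line, "track") || startswith(line, "browser")
            continue
        end
        fields = split(line, '\t')
        push!(records, BedInterval(String(fields[1]),
                                   parse(Int, fields[2]),
                                   parse(Int, fields[3])))
    end
    return records
end
```

Usage is then a single call, `records = load_bed("data.bed")`. A FileIO-based integration would instead expose the same behaviour through FileIO's generic `load` function.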