Would it be much work to allow reads to be streamed in from a connection to a FASTQ file? Something like:
```r
fhandle <- open_some_FASTQ_file("my.fastq")
first.chunk <- readDNAStringSet(fhandle, nrec=1000) # first 1000
second.chunk <- readDNAStringSet(fhandle, nrec=1000) # next 1000
# etc.
close(fhandle)
```
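As a rough illustration of the requested behavior, here is a minimal base-R sketch. It assumes plain four-line FASTQ records (no wrapped sequence or quality lines), and `read_fastq_chunk()` and `process_in_chunks()` are hypothetical helpers, not Biostrings functions. They return plain character columns rather than a `DNAStringSet`. The key point is that an open connection remembers its position, so each `readLines()` call resumes where the previous one stopped.

```r
# Hypothetical helper (not part of Biostrings): read up to `nrec` records
# from an already-open connection. Assumes the simple 4-line FASTQ layout.
read_fastq_chunk <- function(con, nrec = 1000L) {
  lines <- readLines(con, n = 4L * nrec)  # resumes from the current position
  if (length(lines) == 0L) {
    return(data.frame(name = character(0), seq = character(0),
                      qual = character(0), stringsAsFactors = FALSE))
  }
  if (length(lines) %% 4L != 0L) stop("truncated FASTQ record")
  idx <- seq(1L, length(lines), by = 4L)  # header line of each record
  data.frame(name = sub("^@", "", lines[idx]),
             seq  = lines[idx + 1L],
             qual = lines[idx + 3L],
             stringsAsFactors = FALSE)
}

# Hypothetical driver: apply FUN to each block of `nrec` records in turn,
# keeping only one block in memory at a time.
process_in_chunks <- function(path, nrec = 1000L, FUN) {
  con <- file(path, open = "r")
  on.exit(close(con))
  results <- list()
  repeat {
    chunk <- read_fastq_chunk(con, nrec)
    if (nrow(chunk) == 0L) break           # end of file reached
    results[[length(results) + 1L]] <- FUN(chunk)
  }
  results
}
```

If a `DNAStringSet` is needed, the `seq` column of each chunk could be passed to `Biostrings::DNAStringSet()` inside `FUN`.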
Streaming reads this way would let us process them in blocks, which is more memory-friendly than reading the entire FASTQ file into memory and processing it all at once. To achieve this right now, I would need to use the `skip` argument, which is presumably less efficient because each call has to re-read all of the skipped records before reaching the ones it returns.
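To put a rough number on that cost, assume (as presumed above) that a call with `skip = k` has to parse past all `k` earlier records. Reading N records in chunks of n then touches roughly N^2/(2n) records in total, versus N for a streaming reader. The two functions below are hypothetical bookkeeping helpers that only count records under this assumption; they do not read any file.

```r
# Assumed cost model: a skip-based call parses its `skip` records plus the
# `nrec` records it returns. Not a measurement of any real implementation.
records_scanned_with_skip <- function(total, nrec) {
  if (total <= 0) return(0)
  starts <- seq(0, total - 1, by = nrec)        # `skip` value for each chunk
  sum(starts + pmin(nrec, total - starts))      # skipped + returned records
}

# A streaming reader touches each record exactly once.
records_scanned_streaming <- function(total, nrec) total
```

For example, 1e6 records in chunks of 1000 gives `records_scanned_with_skip(1e6, 1000)` = 500,500,000, about 500 times the 1,000,000 records a streaming reader would touch.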