aseyboldt / fastq-rs

MIT License

Encounter `error: "Fastq record is too long"` when parsing Nanopore sequence data #13

Open sagrudd opened 1 year ago

sagrudd commented 1 year ago

What is the maximum sequence length allowed within a record? I am imagining a workflow that should routinely accommodate reads over 100 kb in length (and, with recent ultra-long updates, should occasionally expect multi-Mb reads).

What would be the most sustainable approach to working around this hurdle? Increasing the buffer size (a usize) and forking the project, or reverting to bio::io::fastq? As a developer new to Rust, I'd welcome any comments on, e.g., how performance is going to suffer.

Would welcome some thoughts here - thanks!

iskandr commented 1 year ago

I have the same problem with PacBio reads.

Here's the culprit line that sets the buffer size to 68k: https://docs.rs/fastq/latest/src/fastq/lib.rs.html#133
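
For anyone who needs a stopgap, here is a minimal sketch of a reader with no fixed record-length limit, using only the standard library. It is not fastq-rs code and does not use its API: `Record`, `read_record`, and `trim_line_end` are hypothetical names made up for illustration. It assumes plain four-line FASTQ records (no wrapped sequence lines) and reads each line into a growable `String`, so a multi-Mb read just makes the buffer larger instead of raising an error.

```rust
use std::io::{self, BufRead};

/// One four-line FASTQ record (hypothetical type, not part of fastq-rs).
/// The owned Strings grow as needed, so there is no fixed length limit.
#[derive(Debug, Default)]
struct Record {
    head: String,
    seq: String,
    qual: String,
}

/// Strip a trailing "\n" or "\r\n" (and any other trailing whitespace).
fn trim_line_end(s: &mut String) {
    let n = s.trim_end().len();
    s.truncate(n);
}

/// Read the next record into `rec`, reusing its allocations between calls.
/// Returns Ok(false) at end of input. Assumes unwrapped four-line records.
fn read_record<R: BufRead>(reader: &mut R, rec: &mut Record) -> io::Result<bool> {
    rec.head.clear();
    rec.seq.clear();
    rec.qual.clear();

    // Zero bytes read on the header line means a clean end of input.
    if reader.read_line(&mut rec.head)? == 0 {
        return Ok(false);
    }
    let mut sep = String::new();
    reader.read_line(&mut rec.seq)?;
    reader.read_line(&mut sep)?;
    reader.read_line(&mut rec.qual)?;

    trim_line_end(&mut rec.head);
    trim_line_end(&mut rec.seq);
    trim_line_end(&mut rec.qual);

    // Basic sanity checks on the record layout.
    let well_formed = rec.head.starts_with('@')
        && sep.starts_with('+')
        && rec.seq.len() == rec.qual.len();
    if !well_formed {
        return Err(io::Error::new(
            io::ErrorKind::InvalidData,
            "malformed FASTQ record",
        ));
    }
    Ok(true)
}
```

To use it, wrap the input in a `std::io::BufReader`, create one `Record::default()`, and call `read_record` in a loop until it returns `Ok(false)`. The trade-off is that every line is copied into the record and validated as UTF-8, so I would expect this to be slower than a parser that hands out slices of a single large buffer. I have not benchmarked it.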