Closed Koeng101 closed 5 months ago
Hmm, I think static allocation of bytes might be interesting here.
- (16 byte) read ID (UUIDs can be used directly or a hash of the identifier can be used)
- (8 byte) uint64: start position
- (4 byte) uint32: length
This would allow you to statically allocate the whole index in memory: every record is exactly 28 bytes (16 + 8 + 4), so you can derive the exact number of reads from the byte length of the file (file size / 28) and preallocate anything whose size depends on that count.
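A rough sketch of what that fixed-width layout could look like, in Python with the standard `struct` module. The little-endian byte order, the field order, and the helper names (`pack_record`, `record_count`, `get_record`, `find`) are assumptions for illustration only; nothing here is settled by the proposal.

```python
import struct
import uuid

# Assumed layout: 16-byte read ID, uint64 start position, uint32 length.
# "<" means little-endian with no padding, so the record is exactly 28 bytes.
# Byte order is an assumption; the proposal does not specify one.
RECORD = struct.Struct("<16sQI")


def pack_record(read_id: bytes, start: int, length: int) -> bytes:
    """Serialize one index entry into a fixed 28-byte record."""
    return RECORD.pack(read_id, start, length)


def record_count(index_bytes: bytes) -> int:
    """Derive the number of reads purely from the byte length of the index."""
    if len(index_bytes) % RECORD.size != 0:
        raise ValueError("index length is not a multiple of the record size")
    return len(index_bytes) // RECORD.size


def get_record(index_bytes: bytes, i: int):
    """Random access to the i-th record: its offset is simply i * 28."""
    return RECORD.unpack_from(index_bytes, i * RECORD.size)


def find(index_bytes: bytes, read_id: bytes):
    """Linear scan for a read ID; returns (start, length) or None."""
    for rid, start, length in RECORD.iter_unpack(index_bytes):
        if rid == read_id:
            return start, length
    return None


if __name__ == "__main__":
    ids = [uuid.uuid4().bytes for _ in range(3)]
    index = b"".join(pack_record(rid, i * 100, 50 + i) for i, rid in enumerate(ids))
    print(record_count(index), get_record(index, 1)[1:], find(index, ids[2]))
```

Because every record has the same width, the i-th entry lives at byte offset `i * 28`, so no delimiter scanning or parsing pass is needed before allocating. The linear `find` is only there to keep the sketch short; sorting records by ID would allow a binary search instead.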
I want a binary fastqindex similar to https://hasindu2008.github.io/slow5specs/slow5-v1.0.0.pdf
This would mainly be used when writing a large fastq file to a data store, like S3, while still wanting to seek out specific lines from that fastq file. There would be two modifications: standardization of size,
30 bytes in total per read for a typical run. If a PromethION flow cell returns 10,000,000 reads, the index file will be approximately 286 MiB (300,000,000 bytes).
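For reference, the size estimate is just reads times bytes per record. A quick check in Python, comparing the 30-byte figure quoted here with the 28-byte layout (16 + 8 + 4) from the comment; the function name `index_size_bytes` is made up for this sketch.

```python
MIB = 1024 * 1024  # bytes per mebibyte


def index_size_bytes(n_reads: int, record_size: int) -> int:
    """Total index size for fixed-width records: one record per read."""
    return n_reads * record_size


n_reads = 10_000_000
for record_size in (30, 28):
    size = index_size_bytes(n_reads, record_size)
    print(f"{record_size} bytes/read: {size:,} bytes = {size / MIB:.1f} MiB")
# 30 bytes/read: 300,000,000 bytes = 286.1 MiB
# 28 bytes/read: 280,000,000 bytes = 267.0 MiB
```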