bioinfologics / w2rap-contigger

An Illumina PE genome contig assembler that can handle large (17 Gbp), complex (hexaploid) genomes.
http://bioinfologics.github.io/the-w2rap-contigger/
MIT License

Speeding up loading reads into memory #32

Closed sanjitsbatra closed 6 years ago

sanjitsbatra commented 6 years ago

Hey! I have a dataset with about 1T of reads in FASTQ format. Loading it into memory in the first step takes about a week. Is there any way to speed this up?

bjclavijo commented 6 years ago

We are working on a faster version of step 1, but 1T of reads is probably going to kill other parts of the software anyway. Can you comment on the genome and coverage? Either it is a super challenging project that I would love to hear about, or you can probably just downsample a lot...

sanjitsbatra commented 6 years ago

Is there any way to parallelize this process? It seems one could seek to blocks of the file in parallel and load them into memory, right?
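For illustration, here is a minimal Python sketch of the block-seeking idea described above. It is not how w2rap-contigger loads reads, and all function names (`next_record_start`, `split_fastq`, `read_chunk`, `load_parallel`) are hypothetical. It assumes an uncompressed, well-formed FASTQ file with exactly four lines per record. Because a quality line can also begin with `@`, a record start is accepted only when the line two below it begins with `+`.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def next_record_start(f, offset, size):
    """Return the byte offset of the first record starting at or after `offset`."""
    if offset <= 0:
        return 0
    if offset >= size:
        return size
    f.seek(offset - 1)
    f.readline()  # move to the next line start at or after `offset`
    while True:
        pos = f.tell()
        first = f.readline()
        if not first:
            return size  # reached end of file without finding a record
        f.readline()
        third = f.readline()
        # A header starts with '@' and its '+' separator sits two lines below.
        if first.startswith(b"@") and third.startswith(b"+"):
            return pos
        f.seek(pos + len(first))  # not a header: retry from the next line


def split_fastq(path, n_chunks):
    """Split the file into up to `n_chunks` byte ranges aligned to records."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        starts = sorted(
            {next_record_start(f, i * size // n_chunks, size) for i in range(n_chunks)}
        )
    bounds = starts + [size]
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if a < b]


def read_chunk(path, start, end):
    """Read all records whose header begins in the byte range [start, end)."""
    records = []
    with open(path, "rb") as f:
        f.seek(start)
        while f.tell() < end:
            header = f.readline().rstrip(b"\r\n")
            seq = f.readline().rstrip(b"\r\n")
            f.readline()  # '+' separator line
            qual = f.readline().rstrip(b"\r\n")
            records.append((header[1:], seq, qual))
    return records


def load_parallel(path, n_chunks=4):
    """Load every record, reading the aligned ranges concurrently."""
    ranges = split_fastq(path, n_chunks)
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        chunks = pool.map(lambda r: read_chunk(path, *r), ranges)
        return [rec for chunk in chunks for rec in chunk]
```

Threads are used here only to keep the sketch short. In CPython the parsing is largely serialized by the interpreter lock, so a real speed-up would likely need separate processes or native code, and the gain also depends on whether the storage can serve concurrent reads. Gzip-compressed input cannot be split by byte offset this way.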