HudsonAlpha / fmlrc2

Apache License 2.0

[Question] Split => correct => re-join #24

Closed by tuannguyen8390 2 years ago

tuannguyen8390 commented 2 years ago

Hi there,

I was wondering whether it is feasible to split the long-read file into smaller chunks, correct each chunk in parallel against the same index, and then merge the results back together to speed up fmlrc2?

Many thanks,

Tuan

holtjma commented 2 years ago

Yes, we've actually done this before when we had a very, very large number of long reads (i.e. even multi-processing on a single machine was too slow). You can do this because each long read is corrected independently from all other long reads. You can follow this general process:

1) Create BWT from short reads (no way to further parallelize this step)
2) Split long read FASTX file into multiple smaller FASTX files
3) Run correction on each smaller FASTX file
4) Merge the small FASTX files back into a single result
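
Steps 2 and 4 above could be sketched roughly as follows. This is only an illustration, not part of fmlrc2: the function names (`read_fasta_records`, `split_fasta`, `merge_fasta`) and the `chunk_NNNN.fa` naming are made up for this example, and it assumes plain uncompressed FASTA input (FASTQ would need four-line record handling instead). Step 3, running the correction on each chunk against the shared BWT, is left out on purpose; see the fmlrc2 README for the actual command line.

```python
import itertools
from pathlib import Path


def read_fasta_records(path):
    """Yield each FASTA record (header plus sequence lines) as one string."""
    record = []
    with open(path) as handle:
        for line in handle:
            if not line.endswith("\n"):
                line += "\n"  # normalize a missing final newline
            if line.startswith(">") and record:
                yield "".join(record)
                record = []
            record.append(line)
    if record:
        yield "".join(record)


def split_fasta(path, out_dir, reads_per_chunk):
    """Write consecutive groups of records to chunk_0000.fa, chunk_0001.fa, ..."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    records = read_fasta_records(path)
    chunk_paths = []
    for index in itertools.count():
        batch = list(itertools.islice(records, reads_per_chunk))
        if not batch:
            break
        chunk_path = out_dir / f"chunk_{index:04d}.fa"
        chunk_path.write_text("".join(batch))
        chunk_paths.append(chunk_path)
    return chunk_paths


def merge_fasta(chunk_paths, merged_path):
    """Concatenate (corrected) chunk files, in the given order, into one file."""
    with open(merged_path, "w") as out:
        for chunk_path in chunk_paths:
            with open(chunk_path) as handle:
                for line in handle:
                    out.write(line)
```

Because each read is corrected independently, splitting on record boundaries is all that matters; how many reads go into each chunk is just a tuning choice for however many jobs you can run at once. After correction, pass the corrected chunk files to `merge_fasta` in the same order to keep the original read order.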

tuannguyen8390 commented 2 years ago

Thanks! It's good to have confirmation that this works in a real scenario.