The filtering step uses too much memory

NBISweden / IgDiscover-legacy

Analyze antibody repertoires and discover new V genes from high-throughput sequencing reads

https://www.igdiscover.se

MIT License


The filtering step uses too much memory #72

Closed marcelm closed 7 years ago

marcelm commented 7 years ago

The filtering subcommand (igdiscover filter) reads in the entire input table with IgBLAST assignments before filtering it. This has never been a problem for our usual datasets of around 1 million reads, but on larger datasets, it is too wasteful and can lead to crashes because the process runs out of memory.
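One way to avoid holding the whole table in memory is to stream it row by row and write out passing rows immediately. The following is only a minimal sketch of that idea using Python's standard `csv` module, not the actual `igdiscover filter` code: the function name `filter_rows`, the `keep` predicate, and the column names `name` and `V_errors` are assumptions made for illustration.

```python
import csv
import io


def filter_rows(source, dest, keep):
    """Copy rows of a tab-separated table from source to dest, keeping only
    rows for which keep(row) is true. Only one row is held in memory at a time.

    Returns a tuple (rows_read, rows_kept).
    """
    reader = csv.DictReader(source, delimiter="\t")
    if reader.fieldnames is None:
        # Empty input: there is no header, so nothing to write
        return 0, 0
    writer = csv.DictWriter(
        dest, fieldnames=reader.fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    total = kept = 0
    for row in reader:
        total += 1
        if keep(row):
            writer.writerow(row)
            kept += 1
    return total, kept


# Hypothetical example table; the column names are illustrative only
table = io.StringIO("name\tV_errors\nread1\t0\nread2\t3\nread3\t1\n")
filtered = io.StringIO()
stats = filter_rows(table, filtered, lambda row: int(row["V_errors"]) <= 1)
print(stats)  # (3, 2)
```

With this approach, memory use no longer grows with the number of reads, at the cost of not being able to apply filters that need to see the whole table at once.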

marcelm commented 7 years ago

The current memory usage seems to be around 5000 bytes per sequence. For 25 million reads, this would be 125 GB.
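The estimate can be reproduced with a quick calculation. This is only a back-of-the-envelope sketch that assumes memory grows linearly with the number of reads; the helper name `estimated_memory_gb` is made up for illustration, and GB here means 10^9 bytes.

```python
def estimated_memory_gb(n_reads, bytes_per_read=5000):
    """Rough memory estimate in GB (10**9 bytes), assuming linear scaling."""
    return n_reads * bytes_per_read / 1e9


print(estimated_memory_gb(25_000_000))  # 125.0
print(estimated_memory_gb(1_000_000))   # 5.0
```

Under the same assumption, a dataset of around 1 million reads needs only about 5 GB, which is consistent with this not having been a problem before.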