AfshinLab / BLR

MIT License

Reduce starcode memory use by pre-counting barcodes #56

Closed: HSiga closed this pull request 3 years ago

HSiga commented 3 years ago

This addresses the high RAM requirements caused by large barcode files.
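The idea behind pre-counting can be sketched as follows: many reads carry the same barcode, so collapsing identical sequences before clustering means starcode only has to hold each distinct barcode once. This is a minimal sketch, not the code from this PR. It assumes a gzip-compressed FASTA input and writes tab-separated `sequence<TAB>count` lines, a sequence/count input form that starcode's documentation describes. The function names (`precount_barcodes`, `write_counts`, `precount_fasta_gz`) are hypothetical.

```python
import gzip
import io
from collections import Counter
from typing import Iterable, Iterator, TextIO


def iter_fasta_sequences(lines: Iterable[str]) -> Iterator[str]:
    """Yield sequences from FASTA lines, joining multi-line records."""
    parts: list[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header ends the previous record (if it had any sequence).
            if parts:
                yield "".join(parts)
                parts = []
        else:
            parts.append(line)
    if parts:
        yield "".join(parts)


def precount_barcodes(lines: Iterable[str]) -> Counter:
    """Hypothetical helper: count identical barcodes.

    Memory grows with the number of distinct barcodes, not the number of reads.
    """
    return Counter(iter_fasta_sequences(lines))


def write_counts(counts: Counter, out: TextIO) -> None:
    """Write one 'sequence<TAB>count' line per distinct barcode, most common first."""
    for seq, n in counts.most_common():
        out.write(f"{seq}\t{n}\n")


def precount_fasta_gz(in_path: str, out_path: str) -> int:
    """Hypothetical helper: gzip FASTA in, tab-separated counts out.

    Returns the number of distinct barcodes written.
    """
    with gzip.open(in_path, "rt") as fin:
        counts = precount_barcodes(fin)
    with open(out_path, "w") as fout:
        write_counts(counts, fout)
    return len(counts)


if __name__ == "__main__":
    demo = io.StringIO(">r1\nACGT\n>r2\nACGT\n>r3\nTTTT\n")
    buf = io.StringIO()
    write_counts(precount_barcodes(demo), buf)
    print(buf.getvalue(), end="")  # prints "ACGT\t2" then "TTTT\t1"
```

The `Counter` still keeps every distinct barcode in memory, so the saving depends on how much duplication the barcode file contains. The counted file, rather than the raw reads, would then be passed to starcode.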

pontushojer commented 3 years ago

Relates to https://github.com/FrickTobias/BLR/issues/242

Some more information would be good to include here.

HSiga commented 3 years ago

> Relates to FrickTobias/BLR#242
>
> Some more information would be good to include here:
>
> * Runtime and memory use before and after the change.
> * Compare to running starcode with `-d 0`, which should do the same thing.

Tested with a 3.8G barcode file (fasta.gz).

[Image: comp_starcodes]

Another test counted the barcodes using `-d 0` (note: the ratio parameter was kept at 5 in this test, i.e. `-r 5`).

[Image: old-d0]
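Since clustering at distance 0 can only merge identical sequences, the `-d 0` run and the pre-counting step should report the same sequence/count pairs. A minimal sketch for checking that, assuming both outputs are tab-separated lines whose first two columns are sequence and count (extra columns are ignored). The helper names `parse_counts` and `diff_counts` are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def parse_counts(lines: Iterable[str]) -> Dict[str, int]:
    """Read 'sequence<TAB>count[<TAB>...]' lines into a dict."""
    counts: Dict[str, int] = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        fields = line.split("\t")
        seq, n = fields[0], int(fields[1])
        # Sum rather than overwrite, in case a sequence is listed twice.
        counts[seq] = counts.get(seq, 0) + n
    return counts


def diff_counts(a: Dict[str, int], b: Dict[str, int]) -> List[Tuple[str, int, int]]:
    """Return sorted (sequence, count_in_a, count_in_b) for every mismatch."""
    return sorted(
        (seq, a.get(seq, 0), b.get(seq, 0))
        for seq in a.keys() | b.keys()
        if a.get(seq, 0) != b.get(seq, 0)
    )


if __name__ == "__main__":
    precounted = parse_counts(["ACGT\t2\n", "TTTT\t1\n"])
    d0_output = parse_counts(["TTTT\t1\n", "ACGT\t2\n"])
    print(diff_counts(precounted, d0_output))  # [] means the two runs agree
```

In practice the two line sources would be the opened output files; line order does not matter because both are compared as dictionaries.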

pontushojer commented 3 years ago

I found some additional issues, some of which I had introduced myself. After fixing these, the tests passed locally.

HSiga commented 3 years ago

A new test was run on Bianca with a large dataset (a 15G fastq.gz barcode file).

The plot below shows RAM/CPU usage for each of the runs:

[Image: cluster_dbs_new_run]
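The PR does not state how the RAM/CPU numbers were collected. One possible approach, sketched here as an assumption rather than the method used for these runs, is to launch the command as a child process and read its CPU time and peak resident set size from Python's `resource` module (Unix only). `ru_maxrss` for `RUSAGE_CHILDREN` is the peak of the largest waited-for child. It is reported in kilobytes on Linux and in bytes on macOS. The helper name `run_and_measure` and the demo command are hypothetical placeholders.

```python
import resource
import subprocess
import sys
import time
from typing import List, Tuple


def run_and_measure(cmd: List[str]) -> Tuple[int, float, float, int]:
    """Run cmd to completion; return (exit code, wall s, child CPU s, peak RSS).

    Peak RSS is ru_maxrss from RUSAGE_CHILDREN: the largest waited-for child of
    this process so far, so measure one command per fresh Python process for
    clean numbers. Units: kilobytes on Linux, bytes on macOS.
    """
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    proc = subprocess.run(cmd)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # User + system CPU time consumed by the child(ren) during this call.
    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return proc.returncode, wall, cpu, after.ru_maxrss


if __name__ == "__main__":
    # Placeholder workload: a child Python process that allocates ~50 MB.
    code, wall, cpu, peak = run_and_measure(
        [sys.executable, "-c", "x = b'x' * 50_000_000"]
    )
    print(f"exit={code} wall={wall:.2f}s cpu={cpu:.2f}s peak_rss={peak}")
```

Sampling RSS over time (as needed for a usage-versus-time plot) would require polling the running process instead; this sketch only reports totals and the peak.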

pontushojer commented 3 years ago

Looks great! I will merge