AfshinLab / BLR

MIT License

Synchronise barcode filtration across chunks. #23

Closed. pontushojer closed this 4 years ago.

pontushojer commented 4 years ago

Fix for issue https://github.com/FrickTobias/BLR/issues/215.

buildmolecules.py no longer tags reads with the MN tag; instead, final.molecule_stats.tsv is used for filtration. This introduces a slowdown, because all chunks must finish the buildmolecules step before processing can continue (see DAG below). The TSV is used to generate a list (TXT file) of barcodes that have too many molecules. All chunks then use this list to filter out those barcodes in the filterclusters step.
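The synchronised filtration can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code from this PR: it assumes final.molecule_stats.tsv has one row per molecule with a `Barcode` column, and the column names, the `max_molecules` threshold and all function names are hypothetical. Reads are modelled as `(read_name, barcode)` pairs instead of BAM records.

```python
import csv
import io
from collections import Counter
from typing import Iterable, Iterator, Set, TextIO, Tuple

# Hypothetical column name; the real layout of final.molecule_stats.tsv may differ.
BARCODE_COLUMN = "Barcode"


def barcodes_with_too_many_molecules(stats_tsv: TextIO, max_molecules: int) -> Set[str]:
    """Count molecules per barcode (assuming one TSV row per molecule) and
    return the barcodes whose molecule count exceeds max_molecules."""
    reader = csv.DictReader(stats_tsv, delimiter="\t")
    counts = Counter(row[BARCODE_COLUMN] for row in reader)
    return {bc for bc, n in counts.items() if n > max_molecules}


def write_barcode_list(barcodes: Set[str], out: TextIO) -> None:
    """Write one barcode per line: the shared TXT list that every chunk reads."""
    for bc in sorted(barcodes):
        out.write(bc + "\n")


def read_barcode_list(handle: TextIO) -> Set[str]:
    """Load the shared barcode list, ignoring blank lines."""
    return {line.strip() for line in handle if line.strip()}


def filter_chunk(reads: Iterable[Tuple[str, str]], blacklist: Set[str]) -> Iterator[Tuple[str, str]]:
    """Yield (read_name, barcode) pairs whose barcode is not in the blacklist.
    Since every chunk uses the same list, filtering is consistent across chunks."""
    for name, barcode in reads:
        if barcode not in blacklist:
            yield name, barcode


# Usage: barcode AAA has 3 molecules, which exceeds the hypothetical limit of 2.
stats = io.StringIO("MoleculeID\tBarcode\n1\tAAA\n2\tAAA\n3\tAAA\n4\tCCC\n")
too_many = barcodes_with_too_many_molecules(stats, max_molecules=2)
buf = io.StringIO()
write_barcode_list(too_many, buf)
buf.seek(0)
blacklist = read_barcode_list(buf)
chunk = [("r1", "AAA"), ("r2", "CCC"), ("r3", "GGG")]
print(list(filter_chunk(chunk, blacklist)))  # [('r2', 'CCC'), ('r3', 'GGG')]
```

The barcode list can only be written once the molecule statistics from every chunk are available. This is why the step acts as a synchronisation point in the workflow.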

Changes included:

Comparison of workflow changes based on the testdata run (Old vs. New):
[Images: old and new workflow DAGs]
pontushojer commented 4 years ago

I have started a test run to see what kind of slowdown this introduces.

pontushojer commented 4 years ago

These are the results of the test run. The run used 20 cores and took about 20 hours. I ran on the same dataset as @marcelm in https://github.com/NBISweden/BLR/pull/16, but I noticed that I had used reference variants instead of calling variants. The time for variant calling should therefore be added to this, but it is relatively quick. We have also noticed that phasing called variants takes much longer than phasing the reference ones, so I will do a quick test with variant calling to get a more accurate comparison.

[Image: results of the test run]

So far it seems good, though.

pontushojer commented 4 years ago

I reran the final steps following the mapping step with the latest commit; the result is shown below.

[Image: results of the rerun]

The second spike to the full 20 cores occurs right at the filterclusters step, which demonstrates the bottleneck introduced here. The runtime is not affected too much, however.

pontushojer commented 4 years ago

How does it look now @FrickTobias? I moved the rules as you suggested and commented on the other things.

FrickTobias commented 4 years ago


One last thing to resolve.

pontushojer commented 4 years ago

The last thing has been solved, so I will go ahead and merge this as soon as the tests are done.