caleblareau / mgatk

mgatk: mitochondrial genome analysis toolkit
http://caleblareau.github.io/mgatk
MIT License

Fix umi_barcode value in <name>.parameters.txt output file #28

Closed AustinHartman closed 4 years ago

AustinHartman commented 4 years ago

Hey! I noticed that when running mgatk tenx ... the value of umi_barcode is hard-coded to "MU" once the BAM chunking thread pool (which uses the previous umi_barcode value) has completed. I believe the only impact of this change is that the previous umi_barcode value will be stored in the .parameters.txt output file when running in tenx mode.

caleblareau commented 4 years ago

So the hardcoded modified UMI was a work-around that I implemented for this tenx mode based on the following logic:

1) We want to deduplicate reads within a unique UMI / barcode combination.
2) In the regular bcall mode, I split each single cell into its own BAM and run Picard Deduplicate on each one individually. But when we run the deduplication on a BAM containing several 10x barcodes, we want to somehow encode the cell/UMI/library combination on its own. Thus, I encode this pseudo barcode-UMI as its own entity.
3) I use the MU tag as an indicator later on to assemble this combination. See:

https://github.com/caleblareau/mgatk/blob/master/mgatk/bin/python/chunk_barcoded_bam.py#L59

This isn't particularly elegant (and I think that I should, at a minimum, comment it better), but I think that it is achieving what I want it to achieve.
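
For readers following along, here is a minimal sketch of the idea described above. It is not the code in chunk_barcoded_bam.py. It assumes reads are plain dictionaries rather than BAM records, that the cell barcode and UMI live in the CB and UB tags, and that the two are joined with an underscore; the helper names add_pseudo_umi and deduplicate are hypothetical. It only illustrates why a combined MU value lets one deduplication pass treat the same UMI in two different cells as distinct molecules.

```python
from typing import Any, Dict, List

Read = Dict[str, Any]


def add_pseudo_umi(read: Read, barcode_tag: str = "CB", umi_tag: str = "UB",
                   combined_tag: str = "MU") -> Read:
    """Return a copy of `read` with a combined barcode+UMI tag added.

    The tag names and the "_" separator are assumptions for illustration.
    """
    tags = dict(read["tags"])
    tags[combined_tag] = f"{tags[barcode_tag]}_{tags[umi_tag]}"
    return {**read, "tags": tags}


def deduplicate(reads: List[Read], combined_tag: str = "MU") -> List[Read]:
    """Keep the first read seen for each (chrom, pos, combined tag) key."""
    seen = set()
    kept = []
    for read in reads:
        key = (read["chrom"], read["pos"], read["tags"][combined_tag])
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


# Toy input: two identical reads from one cell, plus a read from a
# second cell that happens to share the same UMI and position.
reads = [
    {"chrom": "chrM", "pos": 100, "tags": {"CB": "AAAC-1", "UB": "GGTT"}},
    {"chrom": "chrM", "pos": 100, "tags": {"CB": "AAAC-1", "UB": "GGTT"}},
    {"chrom": "chrM", "pos": 100, "tags": {"CB": "TTTG-1", "UB": "GGTT"}},
]

tagged = [add_pseudo_umi(r) for r in reads]
kept = deduplicate(tagged)
print(len(kept))  # 2
```

Deduplicating on the UMI alone would collapse all three toy reads into one, which is why the combined value is needed when several barcodes share a BAM.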

AustinHartman commented 4 years ago

Ah, I see. Thanks for the details. I was confused why the command line arg was named --umi-barcode rather than --umi, but this makes sense now.