AustinHartman closed this issue 4 years ago.
So the hard-coded modified UMI was a workaround that I implemented for this tenx mode, based on the following logic:
1) We want to deduplicate reads within each unique UMI/barcode combination.
2) In the regular bcall mode, I split the BAM by single cell, so each cell gets its own BAM, and run Picard deduplication on each one individually. But when we run deduplication on a BAM containing several 10x barcodes, we need to somehow encode the cell/UMI/library combination as its own value... Thus, I encode this pseudo barcode-UMI as its own entity.
3) I use the MU tag later on as the indicator for assembling this combination... See:
https://github.com/caleblareau/mgatk/blob/master/mgatk/bin/python/chunk_barcoded_bam.py#L59
This isn't particularly elegant (and at minimum I should comment it better), but I think it achieves what I want it to achieve.
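The idea in points 1) to 3) can be illustrated with a small sketch. This is not mgatk's code: the Read record, the make_mu and split_pseudo_tag helpers, the "_" separator, and the simplified duplicate key (MU, chromosome, start position, strand) are all hypothetical, and Picard's real duplicate criteria are more involved. It only shows why joining the cell barcode and UMI into one MU-style value lets a single pass over a multi-cell BAM deduplicate within each barcode/UMI combination.

```python
from collections import namedtuple

# Hypothetical read record; real values would come from BAM alignments
# (for 10x data, typically the CB and UB tags).
Read = namedtuple("Read", ["name", "chrom", "pos", "strand", "cb", "ub"])

SEP = "_"  # assumed separator; must not occur in barcodes or UMIs


def make_mu(read):
    """Join cell barcode and UMI into one pseudo barcode-UMI value."""
    return f"{read.cb}{SEP}{read.ub}"


def split_pseudo_tag(mu):
    """Recover (cell barcode, UMI) from a pseudo barcode-UMI value."""
    cb, ub = mu.split(SEP, 1)
    return cb, ub


def deduplicate(reads):
    """Keep the first read seen for each (MU, chrom, pos, strand) key."""
    seen = set()
    kept = []
    for read in reads:
        key = (make_mu(read), read.chrom, read.pos, read.strand)
        if key not in seen:
            seen.add(key)
            kept.append(read)
    return kept


reads = [
    Read("r1", "chrM", 100, "+", "AAACCTGA-1", "GGTTAC"),
    Read("r2", "chrM", 100, "+", "AAACCTGA-1", "GGTTAC"),  # duplicate of r1
    Read("r3", "chrM", 100, "+", "CCCGTTAA-1", "GGTTAC"),  # same UMI, other cell
]
print([r.name for r in deduplicate(reads)])  # ['r1', 'r3']
```

Keying on position and strand alone would be fine within a per-cell BAM, but on this multi-cell input it would also drop r3, which comes from a different cell.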
Ah, I see. Thanks for the details. I was confused about why the command-line argument was named --umi-barcode rather than --umi, but this makes sense now.
Hey! I noticed that when running
mgatk tenx ...
the value of umi_barcode is hard-coded to "MU" after the bam chunking thread pool completes (the chunking itself uses the previous umi_barcode value). I believe the only impact of this change is that the previous umi_barcode value will be stored in the .parameters.txt output file when running tenx mode.
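One way to make it explicit which value each step sees is to keep the user-supplied tag and the internal MU tag in separate variables instead of reassigning umi_barcode. This is a hypothetical sketch: run_tenx, the step labels, and the dedup_tag key are illustrative names, not mgatk's actual code or .parameters.txt format.

```python
def run_tenx(umi_barcode):
    """Hypothetical outline of tenx mode with the two tags kept apart."""
    steps = []
    # Chunking reads the UMI from the tag the user supplied (e.g. "UB").
    steps.append(("chunk", umi_barcode))
    # Downstream deduplication uses the pseudo barcode-UMI tag instead,
    # held in its own variable so umi_barcode is never overwritten.
    dedup_tag = "MU"
    steps.append(("dedup", dedup_tag))
    # The recorded parameters still reflect what the user passed.
    params = {"umi_barcode": umi_barcode, "dedup_tag": dedup_tag}
    return steps, params


steps, params = run_tenx("UB")
print(steps)   # [('chunk', 'UB'), ('dedup', 'MU')]
print(params)  # {'umi_barcode': 'UB', 'dedup_tag': 'MU'}
```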