caleblareau / mgatk

mgatk: mitochondrial genome analysis toolkit
http://caleblareau.github.io/mgatk
MIT License

Clarification on marking duplicates #34

Closed cnk113 closed 3 years ago

cnk113 commented 3 years ago

Hello,

I was wondering whether the Picard duplicate marking is done at cell resolution. I didn't realize until yesterday, with the release of cellranger-atac 2.0, that CR duplicate marking was at bulk resolution. I'm assuming that in mgatk it is at cell resolution?

Thanks, Chang

caleblareau commented 3 years ago

Hi Chang,

Yes, we are operating at cell resolution. I didn't agree with how CR was previously operating, and I'm glad that the new version addresses this.

Caleb
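
To make the distinction concrete, here is a minimal sketch of bulk versus cell-resolution duplicate counting. It is a toy illustration only, not mgatk's or Cell Ranger's actual code; the fragment tuples, barcodes, and the `count_duplicates` helper are all hypothetical.

```python
from collections import Counter

# Hypothetical toy fragments: (cell_barcode, chrom, start, end)
fragments = [
    ("AAAC", "chrM", 100, 250),
    ("AAAC", "chrM", 100, 250),  # same cell, same coordinates
    ("TTTG", "chrM", 100, 250),  # different cell, same coordinates
    ("TTTG", "chrM", 300, 480),
]

def count_duplicates(frags, per_cell):
    """Count fragments beyond the first in each group of identical keys.

    per_cell=True  groups by (barcode, chrom, start, end).
    per_cell=False groups by (chrom, start, end) only, i.e. bulk resolution.
    """
    key = (lambda f: f) if per_cell else (lambda f: f[1:])
    counts = Counter(key(f) for f in frags)
    return sum(n - 1 for n in counts.values())

bulk_dups = count_duplicates(fragments, per_cell=False)
cell_dups = count_duplicates(fragments, per_cell=True)
print(bulk_dups, cell_dups)
```

In this toy input, bulk grouping treats the fragment from the second cell as a duplicate of the first cell's fragments, while per-cell grouping keeps it as an independent observation.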

cnk113 commented 3 years ago

Ah that's great, I didn't want to rerun it. I guess in the future you might not need to run mark duplicates within mgatk or maybe add a parameter to use the existing marked duplicates?

caleblareau commented 3 years ago

Well, the CellRanger v2 update only applies to how they process/produce the fragments file, but the BAM is the same.

cnk113 commented 3 years ago

Hmm, the 10X bioinformatician was saying the read pairs are marked duplicates at CB resolution. I'll follow up with them for further clarification, thanks!

caleblareau commented 3 years ago

Ah, well, mgatk ignores the existing mark-duplicate tags in the BAM and computes them de novo.

cnk113 commented 3 years ago

One more thing: I have a custom library that has overlapping reads with the same UMI. Would the UMI deduplication setting work here? Ideally, the overlapping regions would be picked for the highest quality.

caleblareau commented 3 years ago

When you specify a UMI tag in mgatk, the software deduplicates reads using Picard Tools MarkDuplicates, which retains the read with the highest mean base quality.
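
As a rough sketch of what "retain the best read per UMI" means, the snippet below keeps one whole read per (barcode, UMI, start) group, chosen by mean base quality as described above. This is an assumption-laden toy model, not Picard's implementation: the read tuples and the `pick_representatives` helper are hypothetical, and Picard's real scoring and grouping rules may differ.

```python
from statistics import mean

# Hypothetical reads: (cell_barcode, umi, start, base_qualities)
reads = [
    ("AAAC", "GATTACA", 100, [30, 32, 35, 31]),
    ("AAAC", "GATTACA", 100, [38, 37, 39, 36]),  # same group, higher quality
    ("AAAC", "CCGTTAA", 100, [20, 22, 21, 25]),  # different UMI, kept separately
]

def pick_representatives(reads):
    """Keep the read with the highest mean base quality in each group."""
    best = {}
    for read in reads:
        barcode, umi, start, quals = read
        key = (barcode, umi, start)
        if key not in best or mean(quals) > mean(best[key][3]):
            best[key] = read
    return list(best.values())

kept = pick_representatives(reads)
for read in kept:
    print(read)
```

Note that in this model a whole read is kept or dropped; nothing is merged base by base across overlapping reads.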