Deduplicating bam files without UMIs

CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets

MIT License

Deduplicating bam files without UMIs #475

Closed SPPearce closed 3 years ago

SPPearce commented 3 years ago

This might be a strange request, but is there a way to run umi_tools dedup on bam files that don't have UMIs in the read headers? I have a few older sequencing runs without UMIs, and I'd like to run them through the same pipeline I use for my UMI data, which I process with umi_tools extract and umi_tools dedup. I could hack the headers to give every read an identical fake UMI, but I wondered whether there is another option I have missed, such as a simple flag combination. Or do I have to break out picard for this?

TomSmithCGAT commented 3 years ago

Hi Simon. No built-in option, unfortunately. As you suggest, your options are to use picard or add a pseudo UMI.
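The pseudo-UMI route mentioned above could be sketched roughly as follows. This is a minimal illustration, not something from the umi_tools maintainers: it assumes reads are handled as SAM text and that umi_tools reads the UMI from the end of the read name after an underscore (its default behaviour, as far as I know). The placeholder UMI value and the function names are hypothetical.

```python
import io

# Hypothetical placeholder: every read gets the same constant "UMI",
# so deduplication effectively falls back to mapping position alone.
PSEUDO_UMI = "AAAAAA"


def add_pseudo_umi(sam_line, umi=PSEUDO_UMI, sep="_"):
    """Append sep + umi to the read name (first column) of a SAM record."""
    # SAM header lines start with '@' and are passed through unchanged,
    # as are blank lines.
    if sam_line.startswith("@") or not sam_line.strip():
        return sam_line
    fields = sam_line.rstrip("\n").split("\t")
    fields[0] = f"{fields[0]}{sep}{umi}"
    return "\t".join(fields) + "\n"


def tag_sam_stream(src, dst):
    """Copy SAM text from src to dst, tagging every alignment record."""
    for line in src:
        dst.write(add_pseudo_umi(line))


# Tiny in-memory demonstration with one header line and one read.
demo = io.StringIO(
    "@HD\tVN:1.6\tSO:coordinate\n"
    "read1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tFFFF\n"
)
out = io.StringIO()
tag_sam_stream(demo, out)
print(out.getvalue(), end="")
```

On real data, tag_sam_stream could be pointed at sys.stdin and sys.stdout and placed between samtools view -h input.bam and samtools view -b -o tagged.bam, after which the tagged file would need indexing before dedup.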

IanSudbery commented 3 years ago

There IS an --ignore-umi option, which has been in there from the very earliest days for debugging and benchmarking. I don't know whether it would work here.

SPPearce commented 3 years ago

Perfect, thanks Ian, I'd missed that that option was available. It lets me process these without having to swap to a different tool.

IanSudbery commented 3 years ago

I'm not guaranteeing that this will work, because it might still try to extract the UMI from each read and only then throw it away.

SPPearce commented 3 years ago

It seems to work: it no longer complains that the read headers are missing the UMIs, and it gives me a deduplicated output.
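For anyone wiring this into a pipeline, the invocation discussed above might look something like the sketch below. The file names are placeholders and the wrapper functions are hypothetical. The flags used are --ignore-umi together with -I and -S for input and output. As noted in the thread, the maintainers do not guarantee this behaviour, so it is worth checking the result on your own data. The input is assumed to be coordinate-sorted and indexed, as dedup normally requires.

```python
import shutil
import subprocess


def build_dedup_command(in_bam, out_bam):
    """Build the argument list for position-only deduplication."""
    # --ignore-umi asks dedup to disregard UMIs and use mapping position only.
    return ["umi_tools", "dedup", "--ignore-umi", "-I", in_bam, "-S", out_bam]


def run_dedup(in_bam, out_bam):
    """Run the command, failing early if umi_tools is not installed."""
    if shutil.which("umi_tools") is None:
        raise RuntimeError("umi_tools was not found on PATH")
    # check=True raises CalledProcessError if umi_tools exits non-zero.
    subprocess.run(build_dedup_command(in_bam, out_bam), check=True)


# Print the command rather than running it, so the sketch is safe to execute.
# "input.sorted.bam" and "deduplicated.bam" are placeholder file names.
print(" ".join(build_dedup_command("input.sorted.bam", "deduplicated.bam")))
```

Keeping the command construction separate from execution makes it easy to log or inspect the exact call before committing to a long run.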

IanSudbery commented 3 years ago

Great! Glad it worked.

I'm closing this for now. Feel free to reopen if necessary.