CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets
MIT License

About customized tag in sam file #554

Closed baozijoe closed 2 years ago

baozijoe commented 2 years ago

Hi,

I've recently been using the UMI-tools dedup function to deduplicate my SAM-format data by UMI. However, it seems that the tag columns get replaced or erased. Is there a way to retain my annotated tags?

For example:

A00354:813:H5CJNDSX5:4:1101:14208:9064_ACAACTACCC 0 12S 887 30 10S21M10S * 0 0 AATGAAGAGGTAGTTTGTAGGAAGAATTTTTTTTTTTTTGT :FF:FFFF::,FFF,F,,FFFFFFFF:::FF:FFFF,FF,F NM:i:16 EM:Z:-1;0;-1;0;-2;-1;-1;-3;1;-2;3;1;0;0;

In this read, I added an EM tag, which records the number of Ts inserted/deleted. After UMI-tools dedup, both the NM and EM tags are replaced, but I want to keep them.

Best, Fan

IanSudbery commented 2 years ago

UMI-Tools shouldn't remove any tags from your reads. Are you sure that it is not just selecting reads without these tags as representative of a location?

Also, what command are you using for the dedup?

baozijoe commented 2 years ago

Sorry, I've figured out what was wrong. I was joining the tag columns with a blank space, and samtools erases tags joined that way. After switching to a tab separator, the pipeline works well for me!
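For anyone hitting the same problem, here is a minimal Python sketch of the fix described above: append custom tags with a tab, and flag optional fields that look like two tags joined by a space. The helper names `append_tag` and `space_joined_fields` and the short example read are hypothetical and not part of UMI-tools or samtools; the only assumption about the format is that SAM fields are tab-separated and optional fields (column 12 onward) have the form TAG:TYPE:VALUE.

```python
import re

# A space followed by something shaped like TAG:TYPE: suggests that two
# optional fields were joined with a space instead of a tab.
SPACE_JOINED_RE = re.compile(r" [A-Za-z][A-Za-z0-9]:[AifZHB]:")


def append_tag(sam_line, tag, type_code, value):
    """Append one optional field to a SAM alignment line, tab-separated."""
    return sam_line.rstrip("\n") + "\t" + f"{tag}:{type_code}:{value}"


def space_joined_fields(sam_line):
    """Return optional fields (columns 12+) that appear to hold
    several tags joined by spaces."""
    fields = sam_line.rstrip("\n").split("\t")
    return [f for f in fields[11:] if SPACE_JOINED_RE.search(f)]


# Hypothetical, shortened read with the 11 mandatory SAM columns.
core = "\t".join(
    ["read1_ACAACTACCC", "0", "chr1", "887", "30", "5M", "*", "0", "0",
     "AATGA", "FFFFF"]
)

good = append_tag(append_tag(core, "NM", "i", 1), "EM", "Z", "-1;0;")
bad = core + "\t" + "NM:i:1 EM:Z:-1;0;"  # tags joined by a space

print(space_joined_fields(good))  # no suspicious fields
print(space_joined_fields(bad))   # the space-joined field is reported
```

Running a check like this on the annotated file before samtools or dedup is one way to catch the separator problem early. It is only a heuristic, since a Z-type value may legitimately contain spaces.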
