CGATOxford / UMI-tools

Tools for handling Unique Molecular Identifiers in NGS data sets
MIT License

Will --ignore-read-pair-suffixes turn on or turn off the "ignore"? #497

Closed realzhang closed 2 years ago

realzhang commented 2 years ago

Dear developers:

I use '--ignore-read-pair-suffixes' on the umi_tools command line, but it seems to set 'has_suffix' to True and call "getReadIDSuffix" instead of "getReadIDNoSuffix" in "umi_methods.py". When I use '--ignore-read-pair-suffixes', umi_tools fails with "ValueError: read suffix must be /1 or /2. Observed: 16". The headers in my FASTQ files look like '@A00541:182:HLK5CDSX2:3:1101:1823:1016 1:N:0:GAGATTCC+TAATCTTA' for R1 and '@A00541:182:HLK5CDSX2:3:1101:1823:1016 2:N:0:GAGATTCC+TAATCTTA' for R2.

So I removed '--ignore-read-pair-suffixes' from the command line, and umi_tools now runs smoothly.

This seems strange, so I am reporting it here.

Thanks for your great tools.

TomSmithCGAT commented 2 years ago

Hi @realzhang. Thanks for your report.

The behaviour above is as expected. There is a check that the read names match, where the read name is taken to be the read identifier up to the first whitespace. So, in the case above, that's @A00541:182:HLK5CDSX2:3:1101:1823:1016, which is the same in both reads.
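To make the matching rule concrete, here is a minimal Python sketch using the headers from the original report. `read_name` is a hypothetical helper written for illustration, not UMI-tools' internal function; it assumes the leading `@` is dropped, which does not change the result of the comparison.

```python
def read_name(header: str) -> str:
    """Hypothetical helper (not UMI-tools' API): the read name is the FASTQ
    header up to the first whitespace, with the leading '@' (if any) removed."""
    if header.startswith("@"):
        header = header[1:]
    return header.split(None, 1)[0]


r1 = "@A00541:182:HLK5CDSX2:3:1101:1823:1016 1:N:0:GAGATTCC+TAATCTTA"
r2 = "@A00541:182:HLK5CDSX2:3:1101:1823:1016 2:N:0:GAGATTCC+TAATCTTA"

# The read number and index after the space differ, but the names match.
print(read_name(r1) == read_name(r2))  # True

# With /1 and /2 suffixes before the whitespace, the names no longer match.
print(read_name("@READ123/1 extra") == read_name("@READ123/2 extra"))  # False
```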

With some Illumina sequencing output, each read name is given a /1 or /2 suffix before the first whitespace to denote read 1 or read 2. This makes the read names differ, so UMI-tools errors. In these cases, the switch --ignore-read-pair-suffixes can be used to tell UMI-tools to ignore the suffix when matching the read names. As you note, internally this involves switching the function used to extract the read names. If you use --ignore-read-pair-suffixes when your reads don't have the /1 or /2 suffixes, UMI-tools will error and state that the suffix is incorrect.
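The suffix-stripping behaviour, and why it fails on headers without a suffix, can be sketched as follows. This is an illustrative re-implementation, not the code in umi_methods.py: `read_name` and `read_name_strip_suffix` are hypothetical helpers, and the error text is modelled on the message quoted in the report. Treating the last two characters of the name as the suffix is an assumption, but it is consistent with the reported "Observed: 16", since the reporter's read name ends in `...:1016`.

```python
def read_name(header: str) -> str:
    """Hypothetical helper: FASTQ header up to the first whitespace, minus '@'."""
    if header.startswith("@"):
        header = header[1:]
    return header.split(None, 1)[0]


def read_name_strip_suffix(header: str) -> str:
    """Hypothetical helper: like read_name, but require and drop a /1 or /2 suffix."""
    name = read_name(header)
    suffix = name[-2:]  # assumption: the suffix is the last two characters
    if suffix not in ("/1", "/2"):
        raise ValueError(f"read suffix must be /1 or /2. Observed: {suffix}")
    return name[:-2]


# Suffixed reads: after stripping, the R1 and R2 names match.
print(read_name_strip_suffix("@READ123/1 extra") == read_name_strip_suffix("@READ123/2 extra"))  # True

# Unsuffixed reads (as in this issue): the check fails on the last two characters.
try:
    read_name_strip_suffix("@A00541:182:HLK5CDSX2:3:1101:1823:1016 1:N:0:GAGATTCC+TAATCTTA")
except ValueError as err:
    print(err)  # read suffix must be /1 or /2. Observed: 16
```

In other words, the switch only makes sense when the names really end in /1 or /2; for headers like the ones in this issue, the default whitespace-based matching already pairs the reads correctly.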

realzhang commented 2 years ago

I get it. Your reply is very clear. Thanks a lot.

TomSmithCGAT commented 2 years ago

Great. If you think the documentation is unclear, let me know and I'll take another look at it.

IanSudbery commented 2 years ago

I'm going to close this for now. @realzhang If you've got recommendations for documentation, please feel free to reopen.