realzhang closed this issue 2 years ago
Hi @realzhang. Thanks for your report.
The behaviour above is as expected. There is a check that the read names match, where the read name is taken to be the read identifier up to the first whitespace. So, in the case above, that's @A00541:182:HLK5CDSX2:3:1101:1823:1016, which is the same in both reads.
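As a rough illustration of that check, here is a minimal Python sketch using the two headers from the report. The helper name read_name is hypothetical and this is not the UMI-tools source; it only shows what "identifier up to the first whitespace" means for these reads.

```python
def read_name(header: str) -> str:
    """Hypothetical helper: return the read identifier up to the first whitespace."""
    return header.split(None, 1)[0]


# Headers copied from the report (R1 and R2 of the same pair).
r1 = "@A00541:182:HLK5CDSX2:3:1101:1823:1016 1:N:0:GAGATTCC+TAATCTTA"
r2 = "@A00541:182:HLK5CDSX2:3:1101:1823:1016 2:N:0:GAGATTCC+TAATCTTA"

# The 1:N:0:... and 2:N:0:... parts come after the whitespace, so they are ignored.
print(read_name(r1) == read_name(r2))  # True
```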
With some Illumina sequencing, the read names are given a suffix of /1 or /2 before the first whitespace to denote read 1/read 2. This causes the read names to not match and UMI-tools to error. In these cases, the switch --ignore-read-pair-suffixes can be used to tell UMI-tools to ignore the suffix when matching the read names. As you note, internally this involves switching the function used to extract the read names. If you use --ignore-read-pair-suffixes when your reads don't have the /1 or /2 suffixes, UMI-tools will error and state that the suffix is incorrect.
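To make the suffix case concrete, here is a small Python sketch of the idea. It is an assumption-laden illustration, not the UMI-tools implementation: the helper name read_name_without_pair_suffix and the @SEQ_ID example header are made up, and the error text simply mirrors the message quoted in the report.

```python
def read_name_without_pair_suffix(header: str) -> str:
    """Hypothetical helper: take the read name and drop a trailing /1 or /2."""
    name = header.split(None, 1)[0]  # identifier up to the first whitespace
    if not name.endswith(("/1", "/2")):
        # Wording mirrors the error quoted in the report; not copied from UMI-tools.
        raise ValueError(f"read suffix must be /1 or /2. Observed: {name[-2:]}")
    return name[:-2]


# Made-up headers where the pair suffix is part of the read name.
same = read_name_without_pair_suffix("@SEQ_ID:1:2:3/1") == read_name_without_pair_suffix("@SEQ_ID:1:2:3/2")
print(same)  # True

# The header from the report has no /1 or /2 suffix, so stripping fails.
try:
    read_name_without_pair_suffix(
        "@A00541:182:HLK5CDSX2:3:1101:1823:1016 1:N:0:GAGATTCC+TAATCTTA"
    )
except ValueError as err:
    print(err)  # read suffix must be /1 or /2. Observed: 16
```

In this sketch the "16" is just the last two characters of the read name, which would be consistent with the "Observed: 16" in the reported error.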
I get it. Your reply is very clear. Thanks a lot.
Great. If you think the documentation is unclear, let me know and I'll take another look at it.
I'm going to close this for now. @realzhang If you've got recommendations for documentation, please feel free to reopen.
Dear developers:
I use '--ignore-read-pair-suffixes' on the umi_tools command line, but it seems to set 'has_suffix' to True and run 'getReadIDSuffix' instead of 'getReadIDNoSuffix' in 'umi_methods.py'. When I use '--ignore-read-pair-suffixes', umi_tools fails with "ValueError: read suffix must be /1 or /2. Observed: 16". The headers in my fastq files look like '@A00541:182:HLK5CDSX2:3:1101:1823:1016 1:N:0:GAGATTCC+TAATCTTA' for R1 and '@A00541:182:HLK5CDSX2:3:1101:1823:1016 2:N:0:GAGATTCC+TAATCTTA' for R2.
So now I remove '--ignore-read-pair-suffixes' from the command line, and umi_tools runs smoothly.
This seems strange, so I am reporting it here.
Thanks for your great tools.