Closed SPPearce closed 4 years ago
One solution would be to sort the reads by name at the end of the process and transfer the RX tag from the read1 to the read2. The problem comes when you have two read1s pointing to a single read2, and you might have read2s that are not pointed to by any read1. Perhaps a mode or a tool that did this, but required non-multimapped, primary alignments only?
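A minimal sketch of that idea, working on plain SAM text so it runs without extra dependencies. It assumes records are already grouped by read name, and it only copies the tag in the unambiguous case of exactly one primary read1 and one primary read2 per name. The function names are hypothetical and not part of umi_tools; with pysam, the same logic could presumably be written with get_tag and set_tag on each alignment.

```python
from itertools import groupby

# SAM FLAG bits used below
READ1, READ2 = 0x40, 0x80
SECONDARY, SUPPLEMENTARY = 0x100, 0x800


def rx_tag(fields):
    """Return the RX:Z: field of a split SAM record, or None if absent."""
    for field in fields[11:]:
        if field.startswith("RX:Z:"):
            return field
    return None


def copy_rx_to_mates(sam_lines):
    """Return SAM lines with read1's RX tag copied onto its read2.

    Assumption: alignment records are grouped by read name (name-sorted).
    Header lines (starting with '@') are passed through unchanged.
    """
    lines = [l.rstrip("\n") for l in sam_lines if l.strip()]
    out = [l for l in lines if l.startswith("@")]
    body = [l.split("\t") for l in lines if not l.startswith("@")]

    for _, group in groupby(body, key=lambda rec: rec[0]):
        records = list(group)
        # Ignore secondary and supplementary alignments when pairing mates
        primary = [r for r in records
                   if not int(r[1]) & (SECONDARY | SUPPLEMENTARY)]
        read1 = [r for r in primary if int(r[1]) & READ1]
        read2 = [r for r in primary if int(r[1]) & READ2]
        # Only act when the pairing is unambiguous
        if len(read1) == 1 and len(read2) == 1:
            tag = rx_tag(read1[0])
            if tag is not None and rx_tag(read2[0]) is None:
                read2[0].append(tag)
        out.extend("\t".join(r) for r in records)
    return out
```

Names with several primary read1s, or with no read1 at all, are left untouched here, which sidesteps the ambiguous cases rather than resolving them.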
Yes, although I couldn't find a tool that actually manipulated tags directly, short of manually doing it read by read in pysam etc.
I have, however, found a solution (for my purposes at least). bwa mem has an option, -C, that takes any "comments" from the fastq header and attaches them to the reads in the aligned sam file. So I'm now using sed to move the UMI from the end of the read name into a "comment" after it (e.g. zcat ${R1FASTQ} | sed "s/_\([ACGTN]*\)/ RX:Z:\\1/g"), which makes it a valid sam tag as expected in the sam file. This appears to be working for me at the moment.
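For illustration, the same header rewrite can be sketched in Python so that only the header line of each record is touched. This assumes four-line FASTQ records and a UMI made of A, C, G, T or N appended to the read name after an underscore. The function names are hypothetical. Unlike the sed one-liner, this version drops any existing comment, on the assumption that an Illumina-style comment would not be a valid SAM tag once bwa mem -C copies it across.

```python
import re

# UMI at the very end of the read name, e.g. "@read1_ACGT"
UMI_SUFFIX = re.compile(r"_([ACGTN]+)$")


def umi_to_comment(header):
    """Rewrite '@name_UMI [comment]' as '@name RX:Z:UMI'.

    Headers without a trailing UMI are returned unchanged.
    """
    header = header.rstrip("\n")
    name = header.split()[0]
    match = UMI_SUFFIX.search(name)
    if match is None:
        return header
    # Any existing comment is discarded (assumption: it is not SAM-tag formatted)
    return f"{name[:match.start()]} RX:Z:{match.group(1)}"


def rewrite_fastq(lines):
    """Yield FASTQ lines with the UMI moved into the header comment.

    Assumes strict four-line records: header, sequence, '+', qualities.
    """
    for i, line in enumerate(lines):
        if i % 4 == 0:
            yield umi_to_comment(line)
        else:
            yield line.rstrip("\n")
```

The output could then be piped to bwa mem -C in place of the sed output, for both R1 and R2.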
Yes, I think this would have to be done read by read with pysam (any tool you used would only be doing the same). Of course your solution works fine for adding the uncorrected UMI as a tag, but not a corrected one.
Under normal circumstances I'd be happy to knock something together for this, but I'm currently completely snowed under with teaching.
Sure, that is perfectly understandable. The fgbio toolkit has options to do the correction of the UMIs, so I'll use that for now.
@SPPearce - Can we close this issue?
Hi Tom,
Yes, you can close this. Thanks, Simon
Hi UMI-tools team,
I'm trying to call molecular consensus reads from bam files where I have the UMI on the read name, using the fgbio toolkit. This tool expects the UMI to be given in the RX tag of the bam file, which I am able to do using umi_tools group --umi-group-tag "RX". However, this only puts the RX tag on the R1 of each pair, not on the R2, and fgbio GroupReadsByUmi still fails. Is there a way to add the tag to both reads, rather than just the R1?
Thanks, Simon