Closed asadatae closed 4 months ago
What is your command line for trimming the UMI?
This is what I use:
umi_tools extract --bc-pattern=NNNNNNNNXXX, --stdin fastq2, --stdout umi_extracted_fastq2
When you extract the UMIs from your read2s you need to add those UMIs to the read 1 header, as well as the read2 header:
umi_tools extract --bc-pattern=NNNNNNNNXXX --stdin fastq2 --stdout umi_extracted_fastq2 --read2in=fastq1 --read2out=umi_extracted_fastq1
Thank you! That did the trick!
Hi,
Thank you so much for all the resources and help provided in this community!
My PE150 bulk RNAseq read2 has an 8nt UMI at each fragment start. I first extract the UMIs from read2 using umi_tools with the following parameter:
--bc-pattern=NNNNNNNNXXX
Then I trim adapters from both read1 and read2 with trim_galore using auto detection and remove orphan reads before STAR alignment. Each intermediate file retains the 8nt UMI in the name of the fragment like so:
@A00738:734:HHVHMDSX7:1:1101:1217:1000_CACACATA 2:N:0:ATATGGATAT+TAATACAGGT ACCGGGCGACTCTTAGCGGTGGATCACTCGGCTCGTGCGTCGATGAAGAACGCAGCTAGCTGCGAGAATTAATGTGAATTGCAGGACACATTGATCATCGACACTTCGAACGCACTTGCGGCCCCGGG + FFFFFFFFFFFF:FFFFF:FFFF:FFFFFFFFFFFFFFFFFF,,F:FFF,FFFFFFFFF:FFFFFFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
But when I inspect the STAR output BAM file, the UMIs are lost:
A00738:734:HHVHMDSX7:1:1128:18285:10394 163 chr1 10001 0 26S109M6S = 10022 227 ACCGGCCCTATCCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCTAACCCAGATAGG FFFFFF:FFFFFFFFFFFFFFFF:FFFFFFFFFFFFFFFFFFFFFFFFFFFFFF:FFFFFFFFFF:F,:FFF:FFFF::FFFFF::FFF,FFFFF,FF:FFF:FF,::FF,FFFFF:F:,:FFFF,:::F:,,FF::,,:F NH:i:8 HI:i:1 AS:i:234 nM:i:1
So when I perform umi_tools dedup, it outputs this error, which I suspect is because the UMIs are not there in the BAM file?: AssertionError: not all umis are the same length(!): 38 - 39
Here are my STAR paremeters:
STAR --runThreadN 8 --genomeDir hg38_star_index --readFilesIn read1 read2 --outFileNamePrefix sample1 --outReadsUnmapped Fastx --outSAMtype BAM SortedByCoordinate --sjdbGTFfile /gencode.v44.chr_patch_hapl_scaff.annotation.gtf
What should I change to retain the UMIs in the BAM alignment file? Any insight would be highly appreciated!
Here's the instructions from the library prep kit I'm using:
Sequencing Analysis Guidelines • Read 1 matches the antisense sequence of the input RNA. • If you are performing paired-end sequencing, Read 2 will correspond to the sense strand. • First eight cycles of Read 2 belong to UMIs followed by 3 nucleotides of UMI-linker and 3 nucleotides derived from the kit • Trim 8 nt UMIs + 3 nt UMI linker + 3 nt from Pico v3 SMART UMI Adapter from Read2 prior to mapping.