alexdobin / STAR

RNA-seq aligner
MIT License

Reverse reads not mapping #1017

Open priyanka8590 opened 4 years ago

priyanka8590 commented 4 years ago

Hello,

I am aligning 5210 RNA-seq reads to all the predicted ORFs from Arabidopsis thaliana. STAR finishes successfully without any issues. This is what my command looks like:

STAR --runThreadN 50 --genomeDir /work/LAS/mash lab/bhandary/analysis_regulon_prediction/open_reading_frame/star_index --outSAMtype BAM SortedByCoordinate --outFilterMultimapNmax 500 --outFilterMismatchNmax 5 # For round 1 no mismatches are allowed --> done to capture the most confident junction --alignIntronMax 1 --limitBAMsortRAM 107374182400 --outSJfilterOverhangMin 12 12 12 12 --outSAMattributes NH HI AS nM NM MD jM jI XS --outReadsUnmapped Fastx --genomeLoad LoadAndKeep --outFileNamePrefix /work/LAS/mash-lab/bhandary/analysis_regulon_prediction/open_reading_frame/"+Run+"STAR --readFilesIn /work/LAS/mash-lab/bhandary/analysis_regulon_prediction/open_reading_frame/"+Run+"_1.fastq "+"/work/LAS/mash-lab/bhandary/analysis_regulon_prediction/open_reading_frame/"+Run+"_2.fastq "

STAR finishes successfully without errors:

Aug 29 21:18:41 ..... started STAR run
Aug 29 21:18:41 ..... loading genome
Aug 29 21:18:41 ..... started mapping
Aug 29 21:29:11 ..... finished mapping
Aug 29 21:29:11 ..... started sorting BAM
Aug 29 21:29:33 ..... finished successfully

However, when I look at the "sortedByCoord.out.bam" file, no reverse reads appear to be mapped. When I run salmon on the BAM file, it prints these warnings:

WARNING: Detected suspicious pair --- The names are different: read1 : 680797 read2 : 752763

[2020-08-30 12:52:18.133] [jointLog] [warning]

WARNING: Detected suspicious pair --- The names are different: read1 : 752888 read2 : 1665951

I don't know where things are going wrong. I would appreciate any help you could give me.

Thank you, Priyanka

alexdobin commented 4 years ago

Hi Priyanka,

please check the Log.out file to confirm that the paths to read1 and read2 are correct. I am not sure why there are + characters in the command line; they will not work in standard shells.

Cheers Alex

priyanka8590 commented 4 years ago

Hello Alex,

Thank you for your reply. Sorry for pasting my command with the + characters in it. I copy-pasted it from the Python code that I am using, because I'm mapping 5210 reads to the ORFs. I checked the Log.out file and it does show the correct paths to read1 and read2. Where could I be going wrong?
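Because the command is assembled in Python, one way to keep stray + and quote characters out of it is to build it as an argument list rather than by string concatenation. The following is only a minimal sketch: build_star_command is a hypothetical helper, the run ID and paths are placeholders, and only a subset of the options from the original command is shown.

```python
from pathlib import Path


def build_star_command(run, genome_dir, work_dir, threads=50):
    """Return the STAR argument list for one paired-end run (hypothetical helper)."""
    work_dir = Path(work_dir)
    return [
        "STAR",
        "--runThreadN", str(threads),
        "--genomeDir", str(genome_dir),
        "--outSAMtype", "BAM", "SortedByCoordinate",
        # Prefix follows the original pattern: <dir>/<Run>STAR
        "--outFileNamePrefix", str(work_dir / f"{run}STAR"),
        # Mate 1 and mate 2 are passed as two separate arguments
        "--readFilesIn",
        str(work_dir / f"{run}_1.fastq"),
        str(work_dir / f"{run}_2.fastq"),
    ]


# Placeholder values, not the real run ID or paths from this issue
cmd = build_star_command("RUN_ID", "/path/to/star_index", "/path/to/reads")
print(" ".join(cmd))
# To actually launch STAR: subprocess.run(cmd, check=True)
```

Passing a list to subprocess.run avoids shell quoting altogether, so each path reaches STAR as exactly one argument, even if it contains spaces.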

alexdobin commented 4 years ago

Hi Priyanka,

could you post a few lines of the Aligned.sortedByCoordinate.bam file? Let us see if there is anything wrong with them.

Cheers Alex

priyanka8590 commented 4 years ago

Hi Alex,

Thank you for your reply. Here are a few lines from the BAM file:

2482151 99      1_3     1       255     70S80M  =       14      154     CAGAGAGCGAGAGAGATCGACGGCGAAGCTCTTTACCCGGAAACCATTGAAATCGGACGGTTTAGTGAAAATGGAGGATCAAGTT
2741237 99      1_3     1       255     23S127M =       1       143     TGAAATCGGACGGTTTAGTGAAAATGGAGGATCAAGTTGGGTTTGGGTTCCGTCCGAACGACGAGGAGCTCGTTGGTCACTATCT
2741237 147     1_3     1       255     7S143M  =       1       -143    AGTGAAAATGGAGGATCAAGTTGGGTTTGGGTTCCGTCCGAACGACGAGGAGCTCGTTGGTCACTATCTCCGTAACAAAATCGAA
2742408 99      1_3     1       255     23S127M =       1       143     TGAAATCGGACGGTTTAGTGAAAATGGAGGATCAAGTTGGGTTTGGGTTCCGTCCGAACGACGAGGAGCTCGTTGGTCACTATCT
2742408 147     1_3     1       255     7S143M  =       1       -143    AGTGAAAATGGAGGATCAAGTTGGGTTTGGGTTCCGTCCGAACGACGAGGAGCTCGTTGGTCACTATCTCCGTAACAAAATCGAA
4210868 99      1_3     1       255     23S127M =       73      154     TGAAATCGGACGGTTTAGTGAAAATGGAGGATCAAGTTGGGTTTGGGTTCCGTCCGAACGACGAGGAGCTCGTTGGTCACTATCT
6848169 163     1_3     1       255     70S80M  =       2       151     CAGAGAGCGAGAGAGATCGACGGCGAAGCTCTTTACCCGGAAACCATTGAAATCGGACGGTTTAGTGAAAATGGAGGATCAAGTT
6849145 163     1_3     1       255     70S80M  =       2       151     CAGAGAGCGAGAGAGATCGACGGCGAAGCTCTTTACCCGGAAACCATTGAAATCGGACGGTTTAGTGAAAATGGAGGATCAAGTT
9221183 163     1_3     1       255     87S63M  =       10      154     CGGAGAAATACAGATTACAGAGAGCGAGAGAGATCGACGGCGAAGCTCTTTACCCGGAAACCATTGAAATCGGACGGTTTAGTGA

Thanks again! I hope this can help figure out what's going wrong! Priyanka
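The second column of each SAM line above is the bitwise FLAG field, which records whether a record is mate 1 or mate 2 and whether it is mapped. A small sketch for decoding the three values that appear in the excerpt (99, 147, 163); the bit meanings follow the SAM specification, while the short names and the decode_flag helper are made up for illustration.

```python
# Bit values from the SAM specification; the labels are informal.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse",
    0x20: "mate_reverse",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the labels of all bits set in a SAM FLAG value."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


for flag in (99, 147, 163):
    print(flag, decode_flag(flag))
```

Flags 147 and 163 both carry the second_in_pair bit and none of the three values has the unmapped bit set, so mate 2 records are present and aligned in this excerpt.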

alexdobin commented 4 years ago

Hi Priyanka,

these SAM lines look fine: both mates are mapped. I wonder if salmon needs an "unsorted" BAM file rather than a sorted one, i.e. one where the two mates of a read always appear on consecutive lines. In the sortedByCoordinate BAM the mates often appear separately, because each record is placed according to its own coordinate. Please try --outSAMtype BAM Unsorted and feed the resulting Aligned.out.bam to salmon.

Cheers Alex
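To illustrate why coordinate sorting can trigger the "names are different" warning, here is a rough sketch of a consumer that reads records two at a time and expects each pair to share a read name. This is an assumption about how such a check might behave, not salmon's actual implementation; mismatched_pairs is a hypothetical helper, and the names are taken from the first column of the SAM excerpt above.

```python
def mismatched_pairs(names):
    """Walk read names two at a time and return the pairs whose names differ."""
    pairs = zip(names[0::2], names[1::2])
    return [(first, second) for first, second in pairs if first != second]


# Read names in the order they appear in the coordinate-sorted excerpt
coordinate_sorted = [
    "2482151", "2741237", "2741237", "2742408", "2742408",
    "4210868", "6848169", "6849145", "9221183",
]

# The same kind of data with mates kept on consecutive lines
name_grouped = ["2741237", "2741237", "2742408", "2742408"]

print(mismatched_pairs(coordinate_sorted))
print(mismatched_pairs(name_grouped))
```

In the coordinate-sorted order, every pair taken this way mixes two different reads, whereas the name-grouped order produces no mismatches.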