alexdobin / STAR

RNA-seq aligner
MIT License
1.85k stars 505 forks source link

TranscriptomeSAM : more read mappped in genome #861

Open qingping12 opened 4 years ago

qingping12 commented 4 years ago

I am having an issue with the output from using --quantMode TranscriptomeSAM missing mapped reads relative to the same reads mapped to the corresponding genome. And there is a issue(#315 launch by ghost),but the reason may been different 。

samtools view Aligned.toTranscriptome.out.bam |awk '{print $1}' |sort|uniq |wc -l 1372915 samtools view genome.bam |awk '{print $1}' |sort|uniq |wc -l 3163732 and I check the gtf file which have 208937 transcripts is fine . then I use the read only mapped in genome to blastn with transcripts.fa and the reads were mapped well 。So the questions still is the transcripts?

my star code: STAR --runThreadN 10 --genomeDir {params.genomedir} --outFilterMismatchNoverLmax 0.05 --outSAMunmapped Within --quantMode TranscriptomeSAM\ --outFilterMultimapNmax 20 --limitBAMsortRAM 16000000000 --readFilesCommand 'gunzip -c' --outSAMtype BAM SortedByCoordinate\ --outReadsUnmapped None --chimSegmentMin 12 --chimJunctionOverhangMin 12 --chimOutJunctionFormat 1 --alignSJDBoverhangMin 10 --alignMatesGapMax 100000 --alignIntronMax 100000\ --alignSJstitchMismatchNmax 5 -1 5 5 --outSAMstrandField intronMotif --chimMultimapScoreRange 10 --chimMultimapNmax 10\ --chimNonchimScoreDropMin 10 --peOverlapNbasesMin 12 --peOverlapMMp 0.1 --genomeLoad NoSharedMemory --twopassMode Basic\ --outFileNamePrefix {params.prefix} --sjdbGTFfile {params.gtf} --readFilesIn {input[0]} {input[1]}

alexdobin commented 4 years ago

Hi @qingping12

the Aligned.toTranscriptome.out.bam only contains those reads whose genomic alignments can be converted to transcriptomic, i.e. they agree fully with the transcript exonic (spliced) structure. My guess is that the reads that you have a number of reads that map into intronic and (less likely) intergenic regions.

Cheers Alex