alexdobin / STAR

RNA-seq aligner
MIT License

Some questions about gencode.v43.transcripts.fa and pacbio data #1819

Open 5Tony opened 1 year ago

5Tony commented 1 year ago

Hi Alexander, I want to map gencode.v43.transcripts.fa to GRCh38.primary_assembly.genome.fa, but both STAR and STARlong fail: the generated Aligned.out.bam is unusually small. I don't understand why this happens, so I'm asking for your advice. Thank you very much for your help.

STAR version: 2.7.10b

Here is the code I am using:

index

STARlong --runThreadN 20 \
--runMode genomeGenerate \
--genomeDir ~/index/STAR_1000 \
--sjdbOverhang 999 \
--genomeFastaFiles ~/reference/GRCh38.primary_assembly.genome.fa \
--sjdbGTFfile ~/reference/gencode.v43.annotation.gtf

mapping

STARlong --runThreadN 20 \
--runMode alignReads \
--readNameSeparator space \
--outFilterMultimapScoreRange 1 \
--outFilterMismatchNmax 2000 \
--scoreGapNoncan -20 \
--scoreGapGCAG -4 \
--scoreGapATAC -8 \
--scoreDelOpen -1 \
--scoreDelBase -1 \
--scoreInsOpen -1 \
--scoreInsBase -1 \
--alignEndsType Local \
--seedSearchStartLmax 10 \
--winAnchorMultimapNmax 1000 \
--seedMultimapNmax 100000 \
--seedPerReadNmax 100000 \
--seedPerWindowNmax 1000 \
--alignTranscriptsPerReadNmax 10000 \
--alignTranscriptsPerWindowNmax 10000 \
--outSAMtype BAM Unsorted \
--outFileNamePrefix /home/data/t050326/result/STAR/t1/STAR \
--genomeDir /home/data/t050326/index/STAR_100 \
--readFilesIn /home/data/t050326/reference/gencode.v43.transcripts.fa

STARLog.zip


I also replaced the gencode input with chm13_2_merged_flnc.fastq, and the result was similar to that for gencode.v43.transcripts.fa. This problem is distressing me.

Looking forward to your reply!

alexdobin commented 1 year ago

Hi @5Tony

There are warning messages in Log.out which say that --alignTranscriptsPerReadNmax has to be increased. I would try increasing it to 20000 or 50000.
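
For anyone who wants to check their own run for these warnings, here is a rough sketch of a helper that counts them. The function name `star_log_warnings` is made up for illustration, and the exact wording of the warning text is not assumed: it only counts lines containing "warning" and, of those, how many name --alignTranscriptsPerReadNmax on the same or the following line. It relies on `grep -A`, which GNU and BSD grep support.

```shell
# Hypothetical helper (not part of STAR): summarize warnings in a Log.out file.
# Usage: star_log_warnings path/to/Log.out
star_log_warnings() {
    log_file="$1"
    if [ ! -f "$log_file" ]; then
        echo "no such file: $log_file" >&2
        return 1
    fi
    # Count every line that mentions a warning (case-insensitive).
    total=$(grep -ci 'warning' "$log_file")
    # Take each warning line plus the line after it, then count how many of
    # those lines name the parameter discussed in this thread. Lines elsewhere
    # in the log that merely echo the parameter value are not counted.
    near=$(grep -i -A1 'warning' "$log_file" | grep -c 'alignTranscriptsPerReadNmax')
    echo "warning_lines=$total"
    echo "near_alignTranscriptsPerReadNmax=$near"
}
```

With the --outFileNamePrefix from the command above, the log would be at /home/data/t050326/result/STAR/t1/STARLog.out, so the call would be `star_log_warnings /home/data/t050326/result/STAR/t1/STARLog.out`. If the second count is non-zero, re-running the mapping with --alignTranscriptsPerReadNmax raised to 20000 or 50000, as suggested, is the next thing to try.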