alexdobin / STAR

RNA-seq aligner
MIT License

Only intergenic reads mapped from STAR aligner #2015

Open annepureddy opened 11 months ago

annepureddy commented 11 months ago

Hello,

I am using the STAR aligner to align bulk RNA-seq samples generated with the Smart-seq2 protocol and the Nextera XT DNA library prep kit.

This is how I am generating the genome index and aligning my files:

Genome index:

STAR --runThreadN 40 \
   --runMode genomeGenerate \
   --genomeDir /dartfs-hpc/rc/home/w/f006f9w/Genome_index \
   --genomeFastaFiles /dartfs-hpc/rc/home/w/f006f9w/Homo_sapiens.GRCh38.dna.primary_assembly.fa \
   --sjdbGTFfile /dartfs-hpc/rc/home/w/f006f9w/Homo_sapiens.GRCh38.110.gtf \
   --genomeSAindexNbases 11

Alignment:

for x in /dartfs-hpc/rc/home/w/f006f9w/Raw_data/*R1_001.fastq.gz; do
  # save the sample name (file name without the R1 suffix)
  base_name=$(basename "$x" | sed 's/_R1_001\.fastq\.gz//')
  echo "${base_name}"
  # run STAR for each sample
  STAR --genomeDir /dartfs-hpc/rc/home/w/f006f9w/Genome_index \
    --readFilesIn "/dartfs-hpc/rc/home/w/f006f9w/Raw_data/${base_name}_R1_001.fastq.gz" "/dartfs-hpc/rc/home/w/f006f9w/Raw_data/${base_name}_R2_001.fastq.gz" \
    --readFilesCommand zcat \
    --sjdbGTFfile /dartfs-hpc/rc/home/w/f006f9w/Homo_sapiens.GRCh38.110.gtf \
    --runThreadN 16 \
    --outSAMtype BAM SortedByCoordinate \
    --outFilterType BySJout \
    --outFileNamePrefix "${base_name}."
done

STAR reports no errors, and all the output files are generated.

I then run the Picard CollectRnaSeqMetrics tool:

  picard CollectRnaSeqMetrics \
    I=${sample}.Aligned.sortedByCoord.out.bam \
    O=${sample}.output.RNA_Metrics \
    REF_FLAT="/Users/f006f9w/Dropbox (Dartmouth College)/Goods Lab/Projects/MFGM_NIH_Rimi/Bulk RNA-Seq/Bam_files/refFlat.txt" \
    STRAND=SECOND_READ_TRANSCRIPTION_STRAND \
    RIBOSOMAL_INTERVALS=GRCh38.p5.rRNA.interval_list

However, my .output.RNA_Metrics file reports no coding, UTR, or intronic bases; nearly all aligned bases are counted as intergenic. Could you please let me know what the issue is here?

## METRICS CLASS    picard.analysis.RnaSeqMetrics
PF_BASES    PF_ALIGNED_BASES    RIBOSOMAL_BASES CODING_BASES    UTR_BASES   INTRONIC_BASES  INTERGENIC_BASES    IGNORED_READS   CORRECT_STRAND_READS    INCORRECT_STRAND_READS  NUM_R1_TRANSCRIPT_STRAND_READS  NUM_R2_TRANSCRIPT_STRAND_READS  NUM_UNEXPLAINED_READS   PCT_R1_TRANSCRIPT_STRAND_READS  PCT_R2_TRANSCRIPT_STRAND_READS  PCT_RIBOSOMAL_BASES PCT_CODING_BASES    PCT_UTR_BASES   PCT_INTRONIC_BASES  PCT_INTERGENIC_BASES    PCT_MRNA_BASES  PCT_USABLE_BASES    PCT_CORRECT_STRAND_READS    MEDIAN_CV_COVERAGE  MEDIAN_5PRIME_BIAS  MEDIAN_3PRIME_BIAS  MEDIAN_5PRIME_TO_3PRIME_BIAS    SAMPLE  LIBRARY READ_GROUP
923592354   902491934   1438    0   0   0   902490496   0   0   0   0   0   0   0   0   0.000002    0   0   0   0.999998    0   0   0   0   0   0   0           

Thank you for the help!
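
In the metrics above, PF_ALIGNED_BASES is about 98% of PF_BASES, so the reads did align; almost all aligned bases were classified as intergenic. One possible explanation, which is an assumption and is not confirmed anywhere in this thread, is that the chromosome names in the refFlat file do not match those in the BAM (for example `chr1` in a UCSC-style refFlat versus `1` in the Ensembl FASTA used for the index). A minimal sketch of a check follows; `shared_contigs` and `header.txt` are hypothetical names introduced here for illustration.

```shell
# Sketch: list reference names that appear both in a SAM/BAM header and in a
# refFlat file. An empty result would point to a naming mismatch.
# shared_contigs is a hypothetical helper, not part of STAR or Picard.
#
# Assumed preparation (requires samtools):
#   samtools view -H sample.Aligned.sortedByCoord.out.bam > header.txt
#   shared_contigs header.txt refFlat.txt
shared_contigs() {
  # $1: text file holding the SAM header
  # $2: refFlat file (tab-separated; column 3 is the chromosome name)
  awk -F'\t' '
    FILENAME == ARGV[1] { chrom[$3] = 1; next }   # collect refFlat chromosomes
    $1 == "@SQ" {                                 # header lines describing references
      for (i = 2; i <= NF; i++)
        if ($i ~ /^SN:/) {
          name = substr($i, 4)                    # strip the "SN:" tag prefix
          if (name in chrom) print name
        }
    }
  ' "$2" "$1"
}
```

If the function prints nothing, the annotation and the alignments share no reference names, and no read can overlap an annotated gene.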

alexdobin commented 11 months ago

Hi @annepureddy

Please check the Log.final.out file for mapping statistics. If reads were mapped, then the issue must be with the Picard run.
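
STAR writes its summary mapping statistics to Log.final.out; with the `--outFileNamePrefix` used in the alignment loop, that would be `${base_name}.Log.final.out`. A minimal sketch for pulling out the headline rates is shown here; `mapping_summary` is a hypothetical helper name, and the line labels it matches are the ones STAR prints in that file.

```shell
# Sketch: print the headline mapping rates from a STAR Log.final.out file.
# mapping_summary is a hypothetical helper name.
#
# Example call, assuming the prefix from the alignment loop:
#   mapping_summary "${base_name}.Log.final.out"
mapping_summary() {
  # $1: path to a Log.final.out file
  grep -E 'Uniquely mapped reads %|% of reads mapped to multiple loci|% of reads unmapped' "$1" |
    sed 's/^[[:space:]]*//'   # drop the leading padding STAR uses for alignment
}
```

A high uniquely mapped percentage would indicate that the alignment step worked and that the annotation inputs to Picard are the place to look.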