deweylab / RSEM

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data
http://deweylab.biostat.wisc.edu/rsem/
GNU General Public License v3.0
410 stars 118 forks source link

How to deal with BAM with gapped alignments? #118

Open sammyjava opened 5 years ago

sammyjava commented 5 years ago

I'm having trouble figuring out how to filter my BAM so that I can use RSEM to quantify the transcriptome.

  1. I've used rsem-prepare-reference on my аssembly FASTA and GFF.
    rsem-prepare-reference --gff3 Zm-W22-REFERENCE-NRGENE-2.0.gff \
    Zm-W22-REFERENCE-NRGENE-2.0.fasta \
    ref/Zm-W22-REFERENCE-NRGENE-2.0
  2. I've built a GMAP index on the resulting FASTA:
    gmap_build -d W22-transcripts /home/shokin/genomes/ref/Zm-W22-REFERENCE-NRGENE-2.0.idx.fa
  3. I've mapped my reads against that index using GSNAP:
    gsnap -t 16 -d W22-transcripts -A sam -N 1 --gunzip \
    paired-trimmed/EmbryoSac/FC159_s_1/s_1_1_paired-trimmed.fq.gz  paired-trimmed/EmbryoSac/FC159_s_1/s_1_2_paired-trimmed.fq.gz \
    paired-trimmed/EmbryoSac/FC159_s_2/s_2_1_paired-trimmed.fq.gz  paired-trimmed/EmbryoSac/FC159_s_2/s_2_2_paired-trimmed.fq.gz \
    paired-trimmed/EmbryoSac/FC162_s_4/s_4_1_paired-trimmed.fq.gz  paired-trimmed/EmbryoSac/FC162_s_4/s_4_2_paired-trimmed.fq.gz > EmbryoSac.W22-transcripts.sam
  4. I've kept only paired reads that were mapped:
    samtools view -b -f 2 EmbryoSac.W22-transcripts.sam > EmbryoSac.W22-transcripts.f2.bam
  5. When I try to quantify those reads with RSEM I get the following:
    rsem-calculate-expression --num-threads 16 --paired-end --alignments \
    EmbryoSac.W22-transcripts.f2.bam \
    /home/shokin/genomes/ref/Zm-W22-REFERENCE-NRGENE-2.0 \
    EmbryoSac.W22-transcripts
    rsem-parse-alignments /home/shokin/genomes/ref/Zm-W22-REFERENCE-NRGENE-2.0 EmbryoSac.W22-transcripts.temp/EmbryoSac.W22-transcripts EmbryoSac.W22-transcripts.stat/EmbryoSac.W22-transcripts EmbryoSac.W22-transcripts.f2.bam 3 -tag XM
    Read HWI-EAS121:1:1:75:731#0: RSEM currently does not support gapped alignments, sorry!
    "rsem-parse-alignments /home/shokin/genomes/ref/Zm-W22-REFERENCE-NRGENE-2.0 EmbryoSac.W22-transcripts.temp/EmbryoSac.W22-transcripts EmbryoSac.W22-transcripts.stat/EmbryoSac.W22-transcripts EmbryoSac.W22-transcripts.f2.bam 3 -tag XM" failed! Plase check if you provide correct parameters/options for the pipeline!

    Any suggestions? (I've tried mapping/quantifying with the --star option, but it crashed during mapping. I could try with bowtie but I'd like to use GSNAP if I can, for other reasons.)

pliu55 commented 5 years ago

Hi @sammyjava,

Just in case if you would like to keep trying the --star option, could you please tell a little bit more about how it crashed?

Best, Peng

YANWF2020 commented 2 years ago

hello, @sammyjava, this problem was solved? I also encountered the same problem. I run : rsem-calculate-expression --alignments -p 15 --paired-end H1.bam all_gene.byrsem 4.rsem/2.rsem_cal/H1; and then "Warning: Detected a read pair whose two mates have different names--A00133:505:H32GVDSX3:1:2126:25672:24439 and A00133:505:H32GVDSX3:1:2608:13078:18004! Read A00133:505:H32GVDSX3:1:2126:25672:24439: RSEM currently does not support gapped alignments, sorry!" @pliu55 could you give me suggestions to solve it? thank you so much!

sammyjava commented 2 years ago

I think I gave up, @YANWF2020 and switched to a different method. But it's been a long time. :)