etrh opened this issue 2 years ago (status: Open)
Hi @etrh,
It is possible to run STAR manually and provide the transcriptome BAM to rsem-calculate-expression. You can use the --bam option of rsem-calculate-expression for this.
The ENCODE RNA-seq pipeline has an example with more details:
rsem-calculate-expression --bam --estimate-rspd --calc-ci --seed ${rnd_seed} -p $ncpus \
--no-bam-output --ci-memory 30000 ${extra_flags} $anno_bam ${index_prefix} ${bam_root}_rsem
Here, $anno_bam is the transcriptome BAM you got from STAR.
Sorry @etrh, I can't participate in this discussion; I'm a little short on time.
Thank you @pliu55 (and also @RamRS, I completely understand, no worries :-) )
That makes sense. However, my question is specifically about the STAR index and the proper way to build it if I want to pass the resulting transcriptome BAM to RSEM.
Can I build the STAR index from the genome file that I downloaded from Ensembl/GENCODE? Or should I first build the RSEM index, then take the reference_name.idx.fa that rsem-prepare-reference creates and build my STAR index from that file? The latter seems to be what the documentation suggests (https://github.com/deweylab/RSEM#using-an-alternative-aligner).
Hi @etrh,
I am not sure I understand your question correctly. In principle, the STAR and RSEM references can be prepared independently, as long as the same gene annotation and genome sequence files are used for both. I don't think reference_name.idx.fa from RSEM is required to build the STAR reference.
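As a minimal sketch of the independent setup described above, the two references can be built separately from the same genome FASTA and GTF. The file names, output locations, thread count, and the --sjdbOverhang value of 100 are placeholders assumed for illustration, not values taken from this thread. The script only prints the commands by default; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch: build STAR and RSEM references independently from the SAME inputs.
# All paths below are hypothetical placeholders.
set -eu

GENOME_FA="${GENOME_FA:-genome.fa}"                    # genome FASTA (assumed name)
ANNOTATION_GTF="${ANNOTATION_GTF:-annotation.gtf}"     # matching annotation (assumed name)
STAR_DIR="${STAR_DIR:-star_index}"                     # STAR index directory (assumed)
RSEM_PREFIX="${RSEM_PREFIX:-rsem_ref/reference_name}"  # RSEM reference prefix (assumed)
THREADS="${THREADS:-8}"

# Print the command in dry-run mode (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

build_star_index() {
  run mkdir -p "$STAR_DIR"
  run STAR --runMode genomeGenerate \
      --genomeDir "$STAR_DIR" \
      --genomeFastaFiles "$GENOME_FA" \
      --sjdbGTFfile "$ANNOTATION_GTF" \
      --sjdbOverhang 100 \
      --runThreadN "$THREADS"
}

build_rsem_reference() {
  run mkdir -p "$(dirname "$RSEM_PREFIX")"
  # Same FASTA and GTF as the STAR index above.
  run rsem-prepare-reference --gtf "$ANNOTATION_GTF" "$GENOME_FA" "$RSEM_PREFIX"
}

build_star_index
build_rsem_reference
```

Both functions read the same GENOME_FA and ANNOTATION_GTF variables, so the two references cannot silently drift onto different inputs.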
@pliu55 I'm specifically referring to this text from the manual:
To use an alternative alignment program, align the input reads against the file reference_name.idx.fa generated by rsem-prepare-reference
Am I misunderstanding something here? To me it sounds like the text above specifically expects the STAR index to be generated from the FASTA that rsem-prepare-reference generates.
@bli25 / @alexdobin Would you happen to know the correct approach here when using STAR + RSEM? Any help would be greatly appreciated.
I have gone through several pipelines and tutorials online but I haven't been able to figure out whether the genome should first go through rsem-prepare-reference and then we should use the resulting reference_name.idx.fa to generate the STAR index.
Hi @etrh
Not sure if I can help here. I use a STAR+RSEM pipeline without calling STAR from RSEM. Instead, I generate a genome index and map with STAR, and then use STAR's BAM as RSEM's input. This allows more flexibility.
Cheers Alex
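The workflow described above could be sketched roughly as follows: map with STAR while requesting transcriptome output (--quantMode TranscriptomeSAM), then hand that BAM to RSEM. This is an illustration under assumptions, not the exact commands used by anyone in this thread: the sample names, paths, thread count, and the choice of gzipped paired-end reads are all hypothetical. It uses --alignments, which current RSEM documentation describes for alignment-file input; the ENCODE example quoted earlier in the thread uses --bam instead, so check which one your RSEM version accepts. Commands are printed by default; set DRY_RUN=0 to execute them.

```shell
#!/usr/bin/env bash
# Sketch: align with STAR, then quantify with RSEM from STAR's transcriptome BAM.
# All paths and sample names are hypothetical placeholders.
set -eu

STAR_DIR="${STAR_DIR:-star_index}"                     # STAR genome index directory
RSEM_PREFIX="${RSEM_PREFIX:-rsem_ref/reference_name}"  # prefix given to rsem-prepare-reference
READ1="${READ1:-sample_R1.fastq.gz}"                   # assumed paired-end, gzipped reads
READ2="${READ2:-sample_R2.fastq.gz}"
SAMPLE="${SAMPLE:-sample_}"                            # STAR output file name prefix
THREADS="${THREADS:-8}"

# Print the command in dry-run mode (the default); execute it when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

align_with_star() {
  run STAR --runThreadN "$THREADS" \
      --genomeDir "$STAR_DIR" \
      --readFilesIn "$READ1" "$READ2" \
      --readFilesCommand zcat \
      --outSAMtype BAM SortedByCoordinate \
      --quantMode TranscriptomeSAM \
      --outFileNamePrefix "$SAMPLE"
}

quantify_with_rsem() {
  # STAR writes transcriptome alignments to <prefix>Aligned.toTranscriptome.out.bam.
  run rsem-calculate-expression --alignments --paired-end -p "$THREADS" \
      --no-bam-output \
      "${SAMPLE}Aligned.toTranscriptome.out.bam" "$RSEM_PREFIX" "${SAMPLE}rsem"
}

align_with_star
quantify_with_rsem
```

For single-end data, drop --paired-end and the second read file.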
Hi again @alexdobin
Thank you! This is extremely helpful information.
Incidentally, do you know whether RSEM needs the BAI file alongside the transcriptome.bam? Or does RSEM not use the BAM index (BAI) at all?
Hi @etrh
The transcriptome.bam file that RSEM uses is not sorted by coordinate, so a .bai file is not needed.
Best, Alex
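If you want to check this for your own file, one option is to look at the sort order declared in the BAM header, since a .bai index can only be built for a coordinate-sorted BAM. The helper below is a hypothetical sketch: it reads header text (for example from samtools view -H) and reports the SO tag of the @HD line. The demonstration line uses a synthetic header, not real STAR output.

```shell
#!/usr/bin/env bash
# Sketch: report the sort order (SO tag) declared in a SAM/BAM header.
set -eu

# Reads header text on stdin and prints the SO value of the @HD line,
# or "unknown" when there is no @HD line or it carries no SO tag.
sort_order_from_header() {
  awk -F'\t' '
    $1 == "@HD" {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^SO:/) { print substr($i, 4); found = 1; exit }
      }
    }
    END { if (!found) print "unknown" }
  '
}

# Demonstration on a synthetic header line (not real STAR output):
printf '@HD\tVN:1.4\tSO:coordinate\n' | sort_order_from_header
```

With samtools installed, the same helper can be fed a real header, for example: samtools view -H sample_Aligned.toTranscriptome.out.bam | sort_order_from_header (the file name here is a placeholder).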
Hi @alexdobin, I used the same pipeline as you and set the SortedByCoordinate option. I got two types of BAM files, named .toTranscriptome.out.bam and .sortedByCoord.out.bam. Which one do you use as the input file for RSEM? Eagerly looking forward to your help! Thank you!
Hi @edceeyuchen
RSEM needs the *.toTranscriptome.out.bam file.
Sorry for my late reply!
And thank you for your timely help, @alexdobin!
I am trying to run STAR manually and then provide the transcriptome BAM to RSEM Calculate Expression. I just find the documentation a bit confusing and I am not sure if I'm doing everything correctly.
Here is what the documentation says:
Does this mean that I need to generate my STAR index using the reference_name.idx.fa that rsem-prepare-reference returns (instead of using the same genome file that I downloaded from Ensembl or GENCODE and provided directly to rsem-prepare-reference)?
cc: @pliu55 @RamRS