deweylab / RSEM

RSEM: accurate quantification of gene and isoform expression from RNA-Seq data
http://deweylab.biostat.wisc.edu/rsem/
GNU General Public License v3.0

Transcript-to-gene-map and NgVector #5

Closed: standage closed this issue 8 years ago

standage commented 11 years ago

The rsem-prepare-reference script allows you to specify the mapping of transcripts to genes. The rsem-run-ebseq script utilizes this mapping information to do an isoform-level gene expression analysis, but requires you to provide this information as an "ngvector" file. I looked at the files generated by the rsem-prepare-reference script, but none of them look like an ngvector file.

I see the documentation for the rsem-generate-ngvector script, but it seems to compute its own mapping rather than use the one provided with the --transcript-to-gene-map option in rsem-prepare-reference.

Am I missing something? If I already have the mapping of transcripts to genes, how do I provide this information to rsem-run-ebseq?

standage commented 10 years ago

I just ran rsem-generate-ngvector on a dummy set of transcripts. The format looks similar to the reference_name.grp file generated by rsem-prepare-reference. Is this the ngvector file?

bli25wisc commented 10 years ago

Hi Daniel,

Sorry for my late reply. I'm pretty busy these days.

To answer your question: use 'rsem-generate-ngvector' to compute an ngvector and feed it into 'rsem-run-ebseq'. 'rsem-generate-ngvector' groups isoforms according to their sequence unmappability (the fraction of positions that cannot be mapped uniquely). Ning performed some simulation experiments suggesting that the ngvector produced by 'rsem-generate-ngvector' is at least as good as one derived from the transcript-to-gene mapping.

To get an ngvector, just follow the documentation for 'rsem-generate-ngvector'. Alternatively, you can use the transcript-to-gene mapping to generate an ngvector yourself (refer to EBSeq's manual for how to do this).
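
As a rough illustration of the second option, here is a minimal Python sketch, not part of RSEM or EBSeq. It assumes the tab-separated `gene_id<TAB>transcript_id` layout that --transcript-to-gene-map expects, and the grouping rule as I recall it from the EBSeq manual: each isoform is labelled with the number of isoforms of its parent gene, capped at 3. The function name `ng_from_gene_map`, the `cap` parameter, and the one-integer-per-line output are assumptions for illustration; check them against the EBSeq manual and a file produced by 'rsem-generate-ngvector' before relying on them.

```python
from collections import Counter


def ng_from_gene_map(gene_map_text, transcript_order, cap=3):
    """Return one Ng label per transcript, in the order given.

    gene_map_text: lines of "gene_id<TAB>transcript_id" (assumed layout).
    transcript_order: transcript IDs in the same order as the rows of the
        isoform expression matrix that will be passed to EBSeq.
    cap: genes with this many isoforms or more share one group (assumed 3).
    """
    gene_of = {}
    for line in gene_map_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        gene_id, transcript_id = line.split("\t")
        gene_of[transcript_id] = gene_id

    # Number of isoforms annotated for each gene.
    isoforms_per_gene = Counter(gene_of.values())

    return [min(isoforms_per_gene[gene_of[tx]], cap) for tx in transcript_order]


# Hypothetical toy data: geneA has 1 isoform, geneB has 2, geneC has 4.
example_map = (
    "geneA\ttx1\n"
    "geneB\ttx2\ngeneB\ttx3\n"
    "geneC\ttx4\ngeneC\ttx5\ngeneC\ttx6\ngeneC\ttx7\n"
)
order = ["tx1", "tx2", "tx3", "tx4", "tx5", "tx6", "tx7"]
ng = ng_from_gene_map(example_map, order)

# One integer per line (assumed file layout).
print("\n".join(str(value) for value in ng))
```

The important detail is the ordering: the labels must line up with the transcripts in the expression matrix, which is why the transcript order is passed in explicitly rather than taken from the mapping file.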

Hope it helps, Bo

standage commented 10 years ago

So what information is stored in the reference_name.grp file?

bli25wisc commented 8 years ago

RSEM sorts transcripts first by their parent gene's name and then by their own name. The *.grp file stores the cumulative sum of the number of transcripts per gene. From it we can determine how many transcripts belong to each gene and their start and end positions in the reference.
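
To make the cumulative-sum idea concrete, here is a small Python sketch. It is not RSEM code, and it assumes the file is simply a whitespace-separated list of integers; whether the list starts with a leading 0 is something I have not verified, so the hypothetical helper `gene_ranges` accepts both forms. Positions are reported as 0-based, half-open ranges, which is a choice made for this example only.

```python
def gene_ranges(grp_text):
    """Turn cumulative transcript counts into per-gene (start, end) ranges.

    grp_text: contents of a *.grp file, assumed to be whitespace-separated
        integers forming a running total of transcripts per gene.
    Returns a list of (start, end) pairs, 0-based and half-open, so gene i
    owns transcripts start .. end-1 in the sorted reference.
    """
    sums = [int(token) for token in grp_text.split()]
    # Accept files with or without a leading 0 (assumption, see above).
    if not sums or sums[0] != 0:
        sums = [0] + sums
    return list(zip(sums, sums[1:]))


# Hypothetical toy file: three genes with 1, 2 and 3 transcripts.
example_grp = "0\n1\n3\n6\n"
ranges = gene_ranges(example_grp)
counts = [end - start for start, end in ranges]

for index, (start, end) in enumerate(ranges):
    print(f"gene {index}: transcripts {start}..{end - 1} ({end - start} total)")
```

Taking differences between consecutive entries recovers the number of transcripts per gene, and the entries themselves give the boundaries of each gene's block of transcripts.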