Closed TBrownmiller closed 3 months ago
Hi @TBrownmiller, thanks for your request. RENEE outputs both gene and isoform counts -- the isoform count matrix is DEG_ALL/RSEM.isoforms.expected_count.all_samples.txt. Is this what you're looking for?
Sort of. I think it's a difference in the formatting of the outputs. One of the R packages I use (EBSeq), which is usually directly compatible with RSEM outputs, asks for a matrix file (file extension ".MATRIX"), but the RENEE-generated outputs are in either txt or tsv format, which isn't directly compatible.
Gotcha. We'll make this available in the next release of RENEE -- v2.6.
Hello @TBrownmiller,
Just to follow up on your enquiry, I was wondering what error you receive when you try using the RSEM.isoforms.expected_count.all_samples.txt file in EBSeq?
From the package vignette: "The object Data should be a G-by-S matrix containing the expression values for each gene and each sample, where G is the number of genes and S is the number of samples. These values should exhibit raw counts, without normalization across samples."
And the RSEM.isoforms.expected_count.all_samples.txt output file looks like:
gene_id GeneName transcript_id sample1 sample2
ENSG00000277411.1 5S_rRNA ENST00000614916.1 0.0 0.0
ENSG00000273730.1 5_8S_rRNA ENST00000619779.1 11.88 12.62
...
ENSG00000268895.6 A1BG-AS1 ENST00000595302.1 33.26 12.7
ENSG00000268895.6 A1BG-AS1 ENST00000594950.5 0.0 0.0
ENSG00000268895.6 A1BG-AS1 ENST00000593960.6 21.17 10.29
You can create a new data matrix with just rownames and expression values. Try the following code:
library(dplyr)
library(tibble)
library(EBSeq)
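# Read the isoform-level expected counts written by RENEE
df <- read.table("RSEM.isoforms.expected_count.all_samples.txt", header=TRUE)
# Build one unique row name per transcript, drop the ID columns, and convert to a matrix
gene.matrix <- df %>%
  mutate(gene=paste(gene_id, GeneName, transcript_id, sep="_")) %>%
  select(-c(gene_id, GeneName, transcript_id)) %>%
  column_to_rownames("gene") %>%
  as.matrix()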
The gene.matrix should work with EBSeq.
Let us know if that works.
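If it helps, here is a minimal sketch for checking that a matrix built this way has the shape the vignette describes (numeric, G rows by S columns, unique row names) and for writing it out as a tab-delimited file. It uses base R only and a toy table whose values are copied from the preview above. The helper name check_count_matrix and the output file name isoforms.counts.matrix are hypothetical, chosen just for illustration, and the written file is plain tab-delimited text rather than a guaranteed match for any RSEM-produced layout.

```r
# Toy table with the same columns as the preview above (values copied from it).
df <- data.frame(
  gene_id       = c("ENSG00000277411.1", "ENSG00000273730.1", "ENSG00000268895.6"),
  GeneName      = c("5S_rRNA", "5_8S_rRNA", "A1BG-AS1"),
  transcript_id = c("ENST00000614916.1", "ENST00000619779.1", "ENST00000595302.1"),
  sample1       = c(0.0, 11.88, 33.26),
  sample2       = c(0.0, 12.62, 12.7),
  stringsAsFactors = FALSE
)

id_cols <- c("gene_id", "GeneName", "transcript_id")

# Base-R version of the pipeline: keep only the count columns as a matrix,
# then label each row with a unique gene/transcript identifier.
iso.matrix <- as.matrix(df[, setdiff(names(df), id_cols), drop = FALSE])
rownames(iso.matrix) <- paste(df$gene_id, df$GeneName, df$transcript_id, sep = "_")

# Hypothetical helper: stops with an error if the matrix is not a
# numeric G-by-S matrix of non-negative values with unique row names.
check_count_matrix <- function(m) {
  stopifnot(
    is.matrix(m),
    is.numeric(m),
    !anyNA(m),
    all(m >= 0),
    !is.null(rownames(m)),
    !anyDuplicated(rownames(m))
  )
  invisible(dim(m))  # c(G, S)
}

dims <- check_count_matrix(iso.matrix)

# Write a tab-delimited matrix file (hypothetical file name, in a temp dir).
# col.names = NA leaves the header cell above the row names empty.
out <- file.path(tempdir(), "isoforms.counts.matrix")
write.table(iso.matrix, out, sep = "\t", quote = FALSE, col.names = NA)
```

On your real data you would call check_count_matrix(gene.matrix) on the matrix from the code above before passing it to EBSeq.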
@samarth8392 thanks for posting the R code to transform the count table into a matrix.
Vishal and I discussed this issue and decided to go ahead and add a rule to create the matrix with RSEM -- it runs very quickly and doesn't add much overhead at all. This way our users won't have to transform the other output themselves. See https://github.com/CCBR/RENEE/pull/149
Hello,
Would it be possible to add RSEM's feature for generating a counts matrix for the isoform data? The RSEM documentation states that it can do this with the following command: rsem-generate-data-matrix sampleA.[genes/isoforms].results sampleB.[genes/isoforms].results ... > output_name.counts.matrix
I was able to do this manually by loading RSEM as a module on Biowulf, so there is no rush for this, but I thought it would be useful since many downstream tools use count matrices.
Thanks!