Closed jamesdalg closed 6 years ago
@jamesdalg, thanks for liking RSEM!
Yes, you can find some of these statistics in the 'sample_name.stat/sample_name.cnt' file. Here is the description of that file (you can also find this description in the RSEM folder, in cnt_file_description.txt):
N0 N1 N2 N_tot # N0, number of unalignable reads; N1, number of alignable reads; N2, number of filtered reads due to too many alignments; N_tot = N0 + N1 + N2
nUnique nMulti nUncertain # nUnique, number of reads aligned uniquely to a gene; nMulti, number of reads aligned to multiple genes; nUnique + nMulti = N1;
nHits read_type # nHits, number of total alignments.
0 N0
...
number_of_alignments number_of_reads_with_that_many_alignments
...
Inf N2
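For anyone who wants percentages rather than raw counts, here is a minimal sketch of how the layout described above could be read in Python. The function names parse_cnt and summarize, the dictionary keys, and the example numbers are made up for illustration; none of them are part of RSEM, and the sketch assumes the file follows exactly the layout given in the description.

```python
# Hypothetical example content; the numbers are invented, not from a real run.
EXAMPLE_CNT = """\
100 850 50 1000
600 250 400
1250 1
0 100
1 450
2 400
Inf 50
"""


def parse_cnt(text):
    """Parse the text of a .cnt file laid out as in the description above."""
    # Drop anything after '#' and skip blank lines.
    rows = [line.split("#", 1)[0].split() for line in text.splitlines()]
    rows = [r for r in rows if r]

    n0, n1, n2, n_tot = (int(x) for x in rows[0])
    n_unique, n_multi, n_uncertain = (int(x) for x in rows[1])
    n_hits, read_type = (int(x) for x in rows[2])

    # Remaining rows map number_of_alignments (or "Inf") to a read count.
    histogram = {}
    for key, count in rows[3:]:
        histogram[key] = int(count)

    return {
        "N0": n0, "N1": n1, "N2": n2, "N_tot": n_tot,
        "nUnique": n_unique, "nMulti": n_multi, "nUncertain": n_uncertain,
        "nHits": n_hits, "read_type": read_type,
        "alignment_histogram": histogram,
    }


def summarize(stats):
    """Turn the raw counts into percentages; returns {} if there are no reads."""
    n_tot, n1 = stats["N_tot"], stats["N1"]
    if n_tot == 0:
        return {}
    summary = {
        "pct_unalignable": 100.0 * stats["N0"] / n_tot,
        "pct_alignable": 100.0 * n1 / n_tot,
        "pct_filtered": 100.0 * stats["N2"] / n_tot,
    }
    if n1 > 0:
        # Shares of the alignable reads that hit one gene vs. several genes.
        summary["pct_unique_of_alignable"] = 100.0 * stats["nUnique"] / n1
        summary["pct_multi_of_alignable"] = 100.0 * stats["nMulti"] / n1
    return summary


if __name__ == "__main__":
    for name, value in summarize(parse_cnt(EXAMPLE_CNT)).items():
        print(f"{name}: {value:.2f}")
```

To use it on a real run, read 'sample_name.stat/sample_name.cnt' into a string and pass it to parse_cnt. Note that these numbers describe the alignments RSEM received, so they depend on what the aligner and any filtering left in the input.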
Hello,
Thanks for this thread. May I ask whether the program takes the mapping percentages or multi-mapping numbers into account when calculating the count matrix or the normalized matrix? I couldn't find any clues about how RSEM infers and uses the number of genes each read maps to. To be more specific, I used bowtie2 for alignment and then cleaned the SAM file because we used UMI barcodes. If RSEM calculates that value and uses it, then I should probably keep the unmapped reads, etc. Thanks so much!
Best, Jie
Just to add a comment. The reason I'm struggling with this is that the expression levels of a large number of genes are correlated with the mapping percentages. I wonder whether that is because of the normalization method, or whether the counts for some genes are lower than the real values because that file has more multi-mapping reads. Looking forward to your reply!
Thanks, Jie
Question: is there a way to get statistics on how RSEM performed on a given dataset in terms of assignment percentage? I was having trouble with a particular dataset in featureCounts with the assignment of reads to genes. Is there a way to get basic stats about RSEM performance (rather than aligner performance) for individual runs or experiments? Below is a basic example of just such a set of stats, produced after alignment using featureCounts:
Assigned 696763
Unassigned_Ambiguity 11448
Unassigned_MultiMapping 13953741
Unassigned_NoFeatures 17772725
Unassigned_Unmapped 0
Unassigned_MappingQuality 0
Unassigned_FragementLength 13813566
Unassigned_Chimera 0
Unassigned_Secondary 0
Unassigned_Nonjunction 0
Unassigned_Duplicate 0
I really like RSEM and what it can do (very powerful!), but I'd really like to know how it performed (for example, whether there were reads that just couldn't be assigned).
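To make the "assignment percentage" in the stats above concrete, here is a minimal sketch that turns such a summary into percentages. The function name assignment_percentages is hypothetical, and the sketch assumes each data line is a category name followed by a single integer count; any line that does not fit that shape is skipped.

```python
# Counts copied from the featureCounts summary quoted in the post.
SUMMARY = """\
Assigned 696763
Unassigned_Ambiguity 11448
Unassigned_MultiMapping 13953741
Unassigned_NoFeatures 17772725
Unassigned_Unmapped 0
Unassigned_MappingQuality 0
Unassigned_FragementLength 13813566
Unassigned_Chimera 0
Unassigned_Secondary 0
Unassigned_Nonjunction 0
Unassigned_Duplicate 0
"""


def assignment_percentages(summary_text):
    """Return {category: percent of all counted reads} for a summary table."""
    counts = {}
    for line in summary_text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # not a "name count" line
        try:
            counts[parts[0]] = int(parts[1])
        except ValueError:
            continue  # e.g. a header line whose second field is not a number
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


if __name__ == "__main__":
    for name, pct in assignment_percentages(SUMMARY).items():
        print(f"{name}\t{pct:.2f}%")
```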