lncRNA quantification - Githubissues

radio1988 commented 4 years ago

There is a new paper criticizing featureCounts for not being able to quantify lncRNA expression effectively. Here is my summary of their paper:

featureCounts underestimated lncRNA expression compared with kallisto, salmon and RSEM, especially when unstranded reads were used
in their test data set, the lncRNA sequences had some similarity to protein-coding genes, which produced ambiguous reads
they did not use the -M option in featureCounts

Do you think featureCounts can work for lncRNA quantification? Thanks!

Benchmark of long non-coding RNA quantification for RNA sequencing of cancer samples

alexdobin commented 4 years ago

Hi @radio1988

I have to read the paper more carefully, but I am not convinced by the arguments made in the paper. First, the authors do not seem to appreciate the difference between alignment methods and quantification methods. RSEM is an alignment-based method, and it performs as well as the pseudoalignment methods. So the differences they are observing are actually owing to the quantification methods - featureCounts simply counts unique mappers, while kallisto and salmon use MLE to deconvolve multimappers.
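To make the contrast between the two quantification strategies concrete, here is a minimal toy sketch in Python. It is not the implementation used by featureCounts, kallisto, salmon or RSEM: the function names, the gene names `lncA` and `pcB`, and the read counts are all made up for illustration, and the EM ignores effective transcript length, fragment models and bias correction, all of which real tools account for.

```python
from collections import Counter

def unique_counts(reads):
    """Count only reads compatible with exactly one gene (unique mappers)."""
    counts = Counter()
    for genes in reads:
        if len(genes) == 1:
            counts[genes[0]] += 1
    return dict(counts)

def em_counts(reads, n_iter=100):
    """Toy EM: split each multimapper in proportion to current abundances.

    Assumes every read lists at least one compatible gene.
    Simplification: no effective-length normalisation.
    """
    genes = sorted({g for r in reads for g in r})
    abundance = {g: 1.0 / len(genes) for g in genes}
    expected = {g: 0.0 for g in genes}
    for _ in range(n_iter):
        # E-step: fractional assignment of each read to its compatible genes
        expected = {g: 0.0 for g in genes}
        for r in reads:
            total = sum(abundance[g] for g in r)
            for g in r:
                expected[g] += abundance[g] / total
        # M-step: abundances are the normalised expected counts
        abundance = {g: expected[g] / len(reads) for g in genes}
    return expected

# Hypothetical data: 2 reads unique to a lncRNA, 8 unique to a
# protein-coding gene, and 10 reads compatible with both.
reads = [("lncA",)] * 2 + [("pcB",)] * 8 + [("lncA", "pcB")] * 10

unique = unique_counts(reads)
em = em_counts(reads)
print(unique)
print({g: round(c, 3) for g, c in em.items()})
```

In this toy case unique counting discards the ten ambiguous reads entirely, while the EM distributes them between the two genes according to the evidence from the unique reads.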

I agree that for unstranded reads and antisense lncRNA, simple counting is not going to work well. Note that even for unstranded reads, the strand of spliced reads can be determined from the motif of the splice junctions - some quantification tools such as Cufflinks or StringTie actually require that. I imagine featureCounts performance could also be improved by using this spliced-strand information.
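As an illustration of how a splice junction motif implies a strand, here is a small Python sketch. It is only a sketch under stated assumptions: the function `junction_strand`, the 0-based half-open intron coordinates, and the toy sequence are hypothetical and do not reproduce any tool's code. The motif pairs are the usual canonical and semi-canonical intron boundaries (GT/AG, GC/AG, AT/AC) and their reverse complements as seen on the forward reference.

```python
# Intron boundary dinucleotides as read on the forward reference strand.
PLUS_MOTIFS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}
# The same motifs for a minus-strand transcript, reverse-complemented.
MINUS_MOTIFS = {("CT", "AC"), ("CT", "GC"), ("GT", "AT")}

def junction_strand(genome, intron_start, intron_end):
    """Infer transcript strand from the intron's first and last two bases.

    intron_start/intron_end are 0-based, half-open coordinates on the
    forward reference (an assumption of this sketch).
    Returns "+", "-", or "." when the motif is non-canonical.
    """
    left = genome[intron_start:intron_start + 2].upper()
    right = genome[intron_end - 2:intron_end].upper()
    if (left, right) in PLUS_MOTIFS:
        return "+"
    if (left, right) in MINUS_MOTIFS:
        return "-"
    return "."

def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# Toy reference: exon, a GT...AG intron at positions 4..16, exon.
genome = "AAAC" + "GTAAGTTTTCAG" + "GGGG"
plus_call = junction_strand(genome, 4, 16)
# The same locus seen from the other strand looks like CT...AC.
minus_call = junction_strand(revcomp(genome), 4, 16)
print(plus_call, minus_call)
```

A spliced read from an unstranded library could then be assigned to the sense or antisense gene based on this call, whereas unspliced reads remain ambiguous.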

I am also not sure why lncRNA should be affected more by the multimappers than protein coding genes. It would be interesting to check if featureCounts performance can be improved with the -M option.

Cheers Alex

DarioS commented 4 years ago

The link to the journal article doesn't work. I think the essence of your question is "Can STAR's abundance estimates be changed to use expectation-maximisation, so that a tool like RSEM would be redundant?"

Do researchers still do unstranded RNA sequencing? I thought that was something done a long time ago and is no longer an issue.

radio1988 commented 4 years ago

Hello DarioS,

I agree with you. I felt very uncomfortable reading the paper, since the strongest differences were observed in unstranded RNA-seq data. However, they reasoned that TCGA has many unstranded RNA-seq datasets and that lots of researchers use featureCounts to look into lncRNA. So the topic still seems relevant in this sense.

BTW I've updated the link

alexdobin / STAR

lncRNA quantification #848