MikeAxtell / ShortStack

ShortStack: Comprehensive annotation and quantification of small RNA genes
MIT License

Degraded mRNA #92

Closed signor-molevol closed 1 year ago

signor-molevol commented 5 years ago

This may not be an appropriate question for the GitHub issues section; however, the ShortStack paper says that in their mouse small RNA library many of the sRNAs had N Dicer calls, and that this was likely due to contamination from degraded RNA.

My short RNA libraries from Drosophila are 87% N dicer calls using ShortStack.

Other papers recommend detecting contamination from degraded mRNA by looking for unusual coverage of an abundantly expressed gene such as GAPDH. While 'unusual coverage' is not defined, I did that and do not see what I would think of as unusual coverage.

(image included of reads from the bam file in the GAPDH region)

What other issues could cause mostly N calls? Is the metric that ShortStack uses to differentiate N from non-N calls robust, and is there some way of evaluating it? Has it been evaluated anywhere?


MikeAxtell commented 3 years ago

So, sorry to have ignored this for over a year! Basically, the N call is made when a cluster has less than 80% of its reads in the size range defined by the dicermin and dicermax options. These are set to 20-24 nts by default, which makes sense for plant small RNAs. In flies, especially in reproductive tissues, you will have a lot of piRNAs, which are longer, right? Perhaps these are piRNA clusters. Did you try adjusting the dicermin and dicermax settings?

The metric is robust, but it really depends on the a priori assumptions about which RNA sizes are 'valid', as set by the dicermin and dicermax options.
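
To make the rule above concrete, here is a minimal Python sketch of the 80% check as described in this comment. It is not ShortStack's actual code: the function name `dicer_call`, the `threshold` parameter, the choice to report the most abundant in-range size when the check passes, and the example read lengths are all assumptions made for illustration.

```python
from collections import Counter

def dicer_call(read_lengths, dicermin=20, dicermax=24, threshold=0.80):
    """Illustrative only: return 'N' when less than `threshold` of a
    cluster's reads fall within [dicermin, dicermax]; otherwise return
    the most abundant in-range read length as a string (an assumption)."""
    counts = Counter(read_lengths)
    total = sum(counts.values())
    if total == 0:
        return "N"
    # Keep only the read sizes considered 'valid' under the current settings.
    in_range = {size: n for size, n in counts.items()
                if dicermin <= size <= dicermax}
    if sum(in_range.values()) / total < threshold:
        return "N"
    return str(max(in_range, key=in_range.get))

# Hypothetical piRNA-like cluster: mostly 25-27 nt reads, a few 22 nt reads.
pirna_like = [26] * 50 + [27] * 30 + [25] * 10 + [22] * 10

print(dicer_call(pirna_like))                           # default 20-24 nt window
print(dicer_call(pirna_like, dicermin=23, dicermax=29)) # wider, piRNA-sized window
```

With the default 20-24 nt window only 10% of these made-up reads are in range, so the cluster is called N. Widening the window to 23-29 nt puts 90% of the reads in range, and the cluster gets a size call instead. This is why a library rich in longer small RNAs could come out as mostly N under the plant-oriented defaults.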