Open rmr74370 opened 1 week ago
Hi Rachel, thanks for your message. It's hard to tell, really. I definitely urge you to upgrade to the latest ShortStack. It has much better all-around performance. 3.8.5 was mature code but was very conservative in calling loci true microRNA loci.
Running on super-deep data may cause strange things. I developed ShortStack largely with smaller-scale experiments in mind, like 3-20 sRNA-seq libraries.
But your sampling indicates that read depth is not the only variable. Here's one idea: some microRNA libraries are low-quality in that they contain a lot of degraded bits of RNA, which mostly fall outside the 21-24 nt size range. It could be that some of your libraries have large numbers of reads from degraded RNAs. These will affect ShortStack's calling: any cluster in which fewer than 80% of all alignments come from reads 21-24 nt long is automatically tagged as "DicerCall N" and cannot be annotated as a microRNA.
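One rough way to check this idea is to measure, per library, what share of reads falls within 21-24 nt. The following is a minimal sketch, not ShortStack's own code: the function names are hypothetical, it assumes adapter-trimmed FASTQ files with four lines per record, and it applies the 80% cutoff mentioned above to a whole library, whereas ShortStack applies its rule per cluster.

```python
import gzip
from collections import Counter


def fastq_read_lengths(path):
    """Yield the length of each read in a 4-line-per-record FASTQ file.

    Hypothetical helper: assumes reads are already adapter-trimmed.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # the sequence is the second line of each record
                yield len(line.strip())


def size_fraction(read_lengths, dicer_min=21, dicer_max=24):
    """Return the fraction of reads with dicer_min <= length <= dicer_max."""
    counts = Counter(read_lengths)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    in_range = sum(
        n for length, n in counts.items() if dicer_min <= length <= dicer_max
    )
    return in_range / total


def mostly_dicer_sized(read_lengths, threshold=0.8):
    """True if at least `threshold` of reads are 21-24 nt (assumed cutoff)."""
    return size_fraction(read_lengths) >= threshold


if __name__ == "__main__":
    # Made-up toy lengths, for illustration only.
    clean = [21] * 50 + [24] * 35 + [30] * 15
    degraded = [21] * 70 + [18] * 30
    print(mostly_dicer_sized(clean), mostly_dicer_sized(degraded))
```

Running `size_fraction(fastq_read_lengths(path))` on each library and comparing the values between pools would show whether some libraries are dominated by off-size reads.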
Anyway my best advice is this:
Thank you so much for the feedback! I'll definitely try out your suggestions.
In case you are upgrading, be advised that I am about to drop a new version, 4.1.0, within the next few days. The new one has many improvements over the current release, especially in speed. So maybe wait until 4.1.0 drops to upgrade.
Sounds good, thank you!
Version 4.1.0 was just released. If using Bioconda, wait a day or two for their system to catch up to the new release.
Hello! I’m trying to use ShortStack for miRNA identification from sorghum root samples. To test the efficiency of our small RNA library protocol, we initially sequenced a small pool of 20 samples (which I will call Pool 1). When I ran ShortStack on them, it seemed to work fine and 37 miRNAs were identified. We later wanted to test more samples, so we sequenced Pool 2 (consisting of 80 samples). However, this pool only yielded 12 miRNAs according to ShortStack. So I’m trying to figure out what might be causing this drastic difference between Pool 1 and Pool 2, especially since Pool 2 has more samples and I was expecting an equal or greater number of miRNAs to be identified.
Issue: Pool 2 has far fewer miRNAs identified than Pool 1 (12 miRNAs vs. 37) despite having more samples (80 samples vs. 20). Why?
Pool 1: 37 miRNAs
Pool 2: 12 miRNAs
Sequencing read depth?
Pool 1&2, more than 5 million reads: 15 miRNAs
Pool 1&2, more than 5 million reads, no outlier: 10 miRNAs
Pool 1&2, more than 1 million reads, no outlier: 12 miRNAs
Bad sample interfering with algorithm? Sample size?
Pool 2, samples 1-20: 36 miRNAs
Pool 2, samples 21-40: 0 miRNAs
Pool 2, samples 41-60: 18 miRNAs
Pool 2, samples 61-80: 10 miRNAs
Pool 2, samples 21-40, subset 1 (5 samples): 33 miRNAs
Pool 2, samples 21-40, subset 2 (5 samples): 30 miRNAs
Pool 2, samples 21-40, subset 3 (5 samples): 30 miRNAs
Pool 2, samples 21-40, subset 4 (5 samples): 28 miRNAs
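The batch-and-subset comparison above can be scripted so that every group is defined the same way. This is only a minimal sketch: the sample names are hypothetical placeholders, and the code builds the groups without running ShortStack itself.

```python
def chunk(samples, size):
    """Split a list of samples into consecutive groups of at most `size`."""
    if size <= 0:
        raise ValueError("size must be positive")
    return [samples[i:i + size] for i in range(0, len(samples), size)]


# Hypothetical sample names standing in for the 80 Pool 2 libraries.
pool2 = [f"sample_{n:02d}" for n in range(1, 81)]

# Four batches of 20: samples 1-20, 21-40, 41-60, 61-80.
batches = chunk(pool2, 20)

# Split the batch of samples 21-40 further into four subsets of 5.
subsets = chunk(batches[1], 5)

if __name__ == "__main__":
    for number, group in enumerate(subsets, start=1):
        print(f"subset {number}: {group[0]} to {group[-1]}")
```

Each group can then be passed to the same run script, so that only the input list changes between runs.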
Additional Notes: I’ve been using version 3.8.5 since a labmate of mine used that version and I wanted to keep my results comparable to his. But I could switch to the most recent version if you think that would help. I’ve also been using all the defaults, though I have considered changing the --mincov to be something like 0.5 to increase sensitivity. I’ve also been using a Conda environment as well as the same script (just modifying which input samples) for all of the runs.
Do you have any ideas on why Pool 2 doesn’t seem to be working normally? Any help would be greatly appreciated. Thanks!