MikeAxtell / ShortStack

ShortStack: Comprehensive annotation and quantification of small RNA genes
MIT License

Why are some miRNA members missing when MIRNA search is enabled? #65

Closed DiegoZavallo closed 6 years ago

DiegoZavallo commented 6 years ago

Hi Mike,
I'm using ShortStack in a project with two conditions (infected and mock), which could differ in accumulation and in the miRNA/miRNA* ratio, as well as in some precursor misprocessing. That is why the MIRNA folder, which is generated when I run without the --nohp option, came in very handy. However, I noticed that for some reason it didn't generate all the miRNA families (nor all the members), even though I find them in the Results.txt files. For example, for miRNA156 I found these counts in the Results.txt file:

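[image: https://user-images.githubusercontent.com/29485786/32060461-2d84ce14-ba45-11e7-9414-d5d35d6a6739.png]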

but when I look for all the members of that family in the MIRNA folder, I only find miRNA a, b, c, e and g, but not d or f. I thought it might be because of low read counts, but other families have much higher read counts and are still missing.

Is there a threshold that can be modified so that all the miRNAs appear in the MIRNA folder? Or is it something else that I'm not seeing?

Thanks

Best Diego

MikeAxtell commented 6 years ago

Hello Diego, thanks for your message.

According to the output you sent me, none of the loci would be in the MIRNA folder. It looks like the image you sent comes from a --nohp run, where MIRNAs are not called.

ShortStack only creates output in the MIRNA directory for loci that pass all of its filters and have a value of 'Y' in the 'MIRNA' column of Results.txt. Probably d and f didn't pass all the checks in your run with your sRNA-seq data.
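One way to see which loci made the cut is to group the rows of Results.txt by their 'MIRNA' value. The snippet below is only a minimal sketch: it assumes Results.txt is tab-delimited with a header row, and the 'Name' column label, the non-'Y' value 'N', and the example rows are made up for illustration rather than taken from a real run.

```python
import csv
import io
from collections import defaultdict


def group_loci_by_mirna_call(handle, call_column="MIRNA", name_column="Name"):
    """Group locus names by the value found in the MIRNA column.

    Both column names are assumptions about the Results.txt header;
    adjust them to match the header of your own file.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    groups = defaultdict(list)
    for row in reader:
        groups[row[call_column]].append(row[name_column])
    return dict(groups)


# Hypothetical rows, invented for illustration only (not real output).
example = (
    "#Locus\tName\tMIRNA\n"
    "Chr1:100-200\tMIR156a\tY\n"
    "Chr2:300-400\tMIR156d\tN\n"
    "Chr3:500-600\tMIR156f\tN\n"
)

groups = group_loci_by_mirna_call(io.StringIO(example))
for call, names in sorted(groups.items()):
    print(call, names)
```

With a real run, the same function can be called on `open("Results.txt")`; only the loci listed under 'Y' would be expected to have output in the MIRNA directory.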

Best, Mike

-- Michael J. Axtell, Ph.D. Professor of Biology Penn State University http://sites.psu.edu/axtell

DiegoZavallo commented 6 years ago

Mike, thanks for your response. You are right, that screenshot corresponds to a previous analysis, but afterwards I ran it without --nohp. What are the filters that a feature must pass to get a 'Y'? Is there a way to make those checkpoints less stringent? Don't you think it's strange that some isoforms pass the filters and some don't?

Best,
Diego

MikeAxtell commented 6 years ago

Nope, the settings for MIRNA discovery are hard-coded, and deliberately strict, to prevent false positives.

No, it's not strange to me that some are missed. Some, despite being annotated, may not be real. Others may not be expressed in a particular tissue. And there is the issue of multi-mapped reads, because the same mature miRNA sequence can be assigned to many different loci during alignment.

Hope that helps.
