MikeAxtell / ShortStack

ShortStack: Comprehensive annotation and quantification of small RNA genes
MIT License

Strandness of PhasiRNAs #100

Closed Stefano192 closed 1 year ago

Stefano192 commented 4 years ago

Dear Mike,

I am having an issue with how ShortStack assigns strandedness to loci. I would like to plot sRNA loci by strand, but most of them show a "." instead of a "+" or "-". I first used the default cutoff of 0.8 and then tried a cutoff of 0.6, but all of the loci with high phase scores still had a "." strand. How can this be fixed?

Best wishes, Stefano Amantia

MikeAxtell commented 4 years ago

Hello Stefano,

Sorry for slow response. Just back from vacation.

A strand of ‘.’ is a valid entry in GFF3 format and denotes an unstranded feature. When it appears in a ShortStack-generated annotation, it is because the alignments to the two genomic strands are about equal, subject to the cutoff that you mention. I can’t recall exactly, but it may be that phasing scores are only generated for features with a strand of ‘.’, because phasing is usually thought of as a property of siRNA loci, which should have a strand of ‘.’.

Michael J. Axtell, Ph.D. Professor of Biology The Pennsylvania State University https://sites.psu.edu/axtell https://plantsmallrnagenes.science.psu.edu

Stefano192 commented 4 years ago

Dear Mike,

Sorry for the slow response as well. I have only just started working again myself.

The only issue is that I am trying to plot the distribution of the PhasiRNAs with the highest phase scores by "+" and "-" strand. I lowered the strandedness cutoff, but that brought no improvement. So, if phasing scores are only generated for features with a strand of ‘.’, is there no way to get information about the strandedness of PHAS loci?

Best wishes, Stefano Amantia

MikeAxtell commented 3 years ago

The Results.txt file has a column called 'FracTop', which lists the fraction of alignments mapped to the + genomic strand. This will be a float between 0 and 1. You could filter on this, I think. Note that this is a different issue than calling the 'Strand' of a locus, which is ultimately based on user-defined thresholds.
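
As a rough illustration of filtering on 'FracTop', here is a minimal Python sketch. It is not part of ShortStack. It assumes Results.txt is tab-delimited with a header row containing a 'FracTop' column; the function names, the added 'FracTopStrand' key, and the toy rows are all made up for illustration. The labeling rule (at or above the cutoff is "+", at or below 1 minus the cutoff is "-", otherwise ".") is an assumed reading of how a strandedness cutoff could be applied, so check it against the ShortStack documentation before relying on it.

```python
import csv
import io


def assign_strand(frac_top, cutoff=0.8):
    """Label a locus from the fraction of alignments on the + strand.

    Assumed rule (hypothetical, not taken from ShortStack's code):
    frac_top >= cutoff -> "+", frac_top <= 1 - cutoff -> "-", else ".".
    """
    if not 0.5 <= cutoff < 1:
        raise ValueError("cutoff must be >= 0.5 and < 1")
    if frac_top >= cutoff:
        return "+"
    if frac_top <= 1 - cutoff:
        return "-"
    return "."


def label_loci(handle, cutoff=0.8, frac_col="FracTop"):
    """Yield each row of a tab-delimited table with an added strand label."""
    reader = csv.DictReader(handle, delimiter="\t")
    for row in reader:
        # 'FracTopStrand' is a new, hypothetical key added by this sketch.
        row["FracTopStrand"] = assign_strand(float(row[frac_col]), cutoff)
        yield row


# Toy data for illustration only; a real Results.txt has many more columns.
toy = io.StringIO(
    "#Locus\tFracTop\tStrand\n"
    "chr1:100-300\t0.95\t.\n"
    "chr1:900-1200\t0.50\t.\n"
    "chr2:40-260\t0.10\t.\n"
)
for locus in label_loci(toy, cutoff=0.8):
    print(locus["#Locus"], locus["FracTop"], locus["FracTopStrand"])
```

To use this on real output, replace the toy `io.StringIO` object with `open("Results.txt", newline="")`. Passing `cutoff=0.5` turns the rule into a simple majority call, with an exact tie labeled "+".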

Stefano192 commented 3 years ago

Thank you so much Mike for your answer. I will also look into the FracTop column to check how many alignments are mapped based on strand.

Thank you for your availability Stefano Amantia