MikeAxtell / ShortStack

ShortStack: Comprehensive annotation and quantification of small RNA genes
MIT License

phasing-phasiRNA prediction #68

Closed: Maja-nib closed this issue 6 years ago

Maja-nib commented 6 years ago (Dec 20, 2017)

Dear Dr Axtell, you have developed a very useful tool for identifying sRNA genes. I am using it for miRNA and phasiRNA prediction. Is there any way to find out which phasiRNA sequences originated from a specific predicted PHAS locus? So far I have extracted sequences from the BAM file and compared them with the GFF3 file (based on the mapped start position). However, I am not sure whether all sequences from loci with a DicerCall of 21 or 24 nt listed in the ShortStack_D.gff3 file (with --nohp used to exclude miRNA prediction) can be considered phasiRNAs, or whether non-phased small RNAs could also map to the same PHAS locus. Thank you for your answer. Kind regards, Maja
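
A minimal Python sketch of the first step of the approach described above (pulling the 21- and 24-nt loci out of the GFF3) is shown here. It assumes standard nine-column GFF3 lines with a `DicerCall=<value>` entry in the attributes column; the names `Locus`, `parse_attributes` and `read_loci` are hypothetical, and the attribute layout should be checked against an actual ShortStack_D.gff3 file before relying on it.

```python
from typing import Dict, Iterable, List, NamedTuple


class Locus(NamedTuple):
    seqid: str
    start: int  # 1-based, inclusive (GFF3 convention)
    end: int    # 1-based, inclusive
    strand: str
    attrs: Dict[str, str]


def parse_attributes(field: str) -> Dict[str, str]:
    """Turn 'ID=Cluster_1;DicerCall=21' into {'ID': 'Cluster_1', 'DicerCall': '21'}."""
    attrs: Dict[str, str] = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key] = value
    return attrs


def read_loci(lines: Iterable[str], dicer_calls=("21", "24")) -> List[Locus]:
    """Keep GFF3 features whose (assumed) DicerCall attribute is in dicer_calls."""
    loci: List[Locus] = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and the ##gff-version header
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue  # not a feature line
        attrs = parse_attributes(cols[8])
        if attrs.get("DicerCall") in dicer_calls:
            loci.append(Locus(cols[0], int(cols[3]), int(cols[4]), cols[6], attrs))
    return loci

# Usage (hypothetical path):
#     with open("ShortStack_D.gff3") as fh:
#         loci = read_loci(fh)
```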

MikeAxtell commented 6 years ago

Hello Maja,

I don't understand your question, I'm afraid, but I'll try to answer: yes, if a locus is significantly phased, most people would consider all siRNAs that arise from it to be 'phasiRNAs'.

-- Michael J. Axtell, Ph.D. Professor of Biology Penn State University http://sites.psu.edu/axtell
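
Whether non-phased sRNAs also map to a phased locus (the second part of the original question) can be roughly checked by grouping reads by phase register. The sketch below is not ShortStack's phasing test. It assumes a 21-nt period, takes each read as (pos, is_reverse) with pos the 1-based leftmost SAM coordinate, and adds 2 nt to reverse-strand positions on the assumption that Dicer leaves 2-nt 3' overhangs, so both strands of one duplex fall in the same register. The names `register` and `split_by_phase` are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Read = Tuple[int, bool]  # (1-based leftmost position, mapped to reverse strand?)


def register(pos: int, is_reverse: bool, period: int = 21) -> int:
    """Phase register of a read (rough illustration, not ShortStack's method)."""
    # With 2-nt 3' overhangs, the reverse-strand partner of a duplex starts
    # 2 nt to the left of the forward strand, so shift it right by 2.
    offset = 2 if is_reverse else 0
    return (pos + offset) % period


def split_by_phase(reads: Iterable[Read], period: int = 21) -> Tuple[List[Read], List[Read]]:
    """Return (in_phase, out_of_phase) reads relative to the most common register."""
    reads = list(reads)
    if not reads:
        return [], []
    counts = Counter(register(p, r, period) for p, r in reads)
    dominant = counts.most_common(1)[0][0]
    in_phase = [(p, r) for p, r in reads if register(p, r, period) == dominant]
    out_phase = [(p, r) for p, r in reads if register(p, r, period) != dominant]
    return in_phase, out_phase
```

Reads left in the out-of-phase list are candidates for the non-phased sRNAs the question asks about; whether they matter is a biological judgment, not something this count decides.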

Maja-nib commented 6 years ago (Dec 20, 2017)

Thank you, and sorry that I didn't ask clearly. After a ShortStack run, one can obtain information about the PHAS loci identified from a specific sRNA library (for example Solyc00g005080.2.1:1-217; I used a transcriptome as the reference), but there is no information about the phasiRNAs (their sequences) produced from each locus. In some cases this information is also valuable. Is there any option to obtain it?

MikeAxtell commented 6 years ago

Hello again.

ShortStack outputs the 'MajorRNA' (the most abundant sRNA) from each locus. To get all the sRNAs in a given region, use samtools view, e.g. 'samtools view [youralignments.bam] Solyc00g005080.2.1:1-217', and parse the SAM output from there.

Best, Mike
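
The samtools approach can be finished with a small parser. This minimal Python sketch assumes SAM text as input (for example the output of the samtools command above; region queries need a coordinate-sorted, indexed BAM). It skips unmapped, secondary and supplementary alignments and reverse-complements reverse-strand reads, because SAM stores those as the reverse complement of the original read. The names `tally_srnas` and `revcomp` are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional, Set

FLAG_UNMAPPED = 0x4
FLAG_REVERSE = 0x10
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def tally_srnas(sam_lines: Iterable[str], lengths: Optional[Set[int]] = None) -> Counter:
    """Count each sRNA sequence in SAM lines, restored to its original 5'->3' orientation.

    If `lengths` is given, only sequences of those lengths are counted.
    """
    counts: Counter = Counter()
    skip = FLAG_UNMAPPED | FLAG_SECONDARY | FLAG_SUPPLEMENTARY
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # header or blank line
        cols = line.rstrip("\n").split("\t")
        flag, seq = int(cols[1]), cols[9]
        if flag & skip or seq == "*":
            continue
        # SAM stores reverse-strand reads as the reverse complement; undo that.
        if flag & FLAG_REVERSE:
            seq = revcomp(seq)
        if lengths is None or len(seq) in lengths:
            counts[seq] += 1
    return counts
```

For example, piping 'samtools view youralignments.bam Solyc00g005080.2.1:1-217' into a script that calls tally_srnas(sys.stdin, {21, 24}) would count each 21- and 24-nt sequence from that locus.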

Maja-nib commented 6 years ago

Great, then I was thinking in the right direction. I also filter out sequences that are not in loci listed in the output file ShortStack_D.gff3, since according to your manual the loci listed there are considered RNAi-related.

Thank you for your time. All the best in your future work.

Maja
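
The filtering step described in the last comment (keeping only sequences that fall inside loci from ShortStack_D.gff3) could be sketched as follows. Assumptions: loci are (seqid, start, end, locus_id) tuples in 1-based inclusive GFF3 coordinates that do not overlap each other; reads are (rname, pos, length) tuples with pos the 1-based leftmost SAM coordinate; and a read is assigned only if it lies entirely within a locus (matching on start position alone, as in the original approach, would be a one-line change). The names `LocusIndex` and `group_reads` are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

LocusRec = Tuple[str, int, int, str]  # (seqid, start, end, locus_id)
ReadRec = Tuple[str, int, int]        # (rname, pos, length)


class LocusIndex:
    """Per-sequence sorted loci for fast lookup (assumes loci do not overlap)."""

    def __init__(self, loci: Iterable[LocusRec]):
        by_seq: Dict[str, List[Tuple[int, int, str]]] = defaultdict(list)
        for seqid, start, end, locus_id in loci:
            by_seq[seqid].append((start, end, locus_id))
        self._loci = {s: sorted(v) for s, v in by_seq.items()}
        self._starts = {s: [start for start, _, _ in v] for s, v in self._loci.items()}

    def locus_of(self, rname: str, pos: int, length: int) -> Optional[str]:
        """Return the ID of the locus containing the whole read, or None."""
        starts = self._starts.get(rname)
        if not starts:
            return None
        i = bisect_right(starts, pos) - 1  # last locus starting at or before pos
        if i < 0:
            return None
        _, end, locus_id = self._loci[rname][i]
        return locus_id if pos + length - 1 <= end else None


def group_reads(index: LocusIndex, reads: Iterable[ReadRec]) -> Dict[str, List[ReadRec]]:
    """Group reads by locus ID, dropping reads outside every listed locus."""
    groups: Dict[str, List[ReadRec]] = defaultdict(list)
    for read in reads:
        locus_id = index.locus_of(*read)
        if locus_id is not None:
            groups[locus_id].append(read)
    return dict(groups)
```

Sorting the loci per sequence and using bisect keeps each lookup logarithmic, which helps when a BAM holds millions of reads.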