BioinfoUNIBA / REDItools

REDItools are Python scripts for investigating RNA editing at genomic scale.
MIT License

How to infer strand properly for stranded RNA library? #10

Open. wendy517 opened this issue 3 years ago.

wendy517 commented 3 years ago

I am struggling to infer strand information with REDItools using the -s parameter for stranded RNA-seq. Choosing a different -s value (0, 1, 2 or 12) for the same sample naturally shifts the substitution results. But across replicate samples run with the same -s value, the reference base and the strand call come out differently, and I don't know why.
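A practical first step when replicates disagree is to list exactly which shared sites differ in reference base or strand. A minimal sketch, assuming tab-separated REDItools output tables whose header includes `Region`, `Position`, `Reference` and `Strand` columns (check the header of your own files); `load_sites` and `discordant_sites` are hypothetical helpers, and the toy tables are made-up values.

```python
import csv
import io

# Assumed REDItools output columns; check the header line of your own tables.
KEY_COLS = ("Region", "Position")
COMPARE_COLS = ("Reference", "Strand")


def load_sites(handle):
    """Index a tab-separated REDItools table by (Region, Position)."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {tuple(row[c] for c in KEY_COLS): row for row in reader}


def discordant_sites(table_a, table_b):
    """Yield sites present in both tables whose reference base or strand differs."""
    shared = table_a.keys() & table_b.keys()
    for key in sorted(shared, key=lambda k: (k[0], int(k[1]))):
        a, b = table_a[key], table_b[key]
        diffs = {c: (a[c], b[c]) for c in COMPARE_COLS if a[c] != b[c]}
        if diffs:
            yield key, diffs


# Toy tables standing in for two replicates (made-up values).
rep1 = io.StringIO("Region\tPosition\tReference\tStrand\nchr1\t100\tA\t1\nchr1\t200\tA\t1\n")
rep2 = io.StringIO("Region\tPosition\tReference\tStrand\nchr1\t100\tT\t0\nchr1\t200\tA\t1\n")
for site, diffs in discordant_sites(load_sites(rep1), load_sites(rep2)):
    print(site, diffs)
```

Only positions reported in both replicates are compared, so sites missing from one replicate are not listed here.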

flde commented 11 months ago

Dear all,

Many thanks for the great tool! I want to second @wendy517's questions. I ran REDItoolDnaRna.py on RNA-seq data with the Ensembl reference genome and the options -s 2 -g 2 -S, as described in the published protocol. Everything ran fine, but the results contain +, -, and undetermined-strand sites. Two questions: (i) In theory, transcripts on opposite strands can overlap. If both mRNAs carry an editing site at the same position, just on opposite strands, would REDItools be able to distinguish them? (ii) What is the strand set to when the -g 2 inference fails?
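To make question (ii) concrete, here is a minimal sketch of per-site strand inference from per-read strand calls. It swaps in a simple threshold (majority) rule and does not reproduce REDItools' own -g options. The 0.7 threshold and the encoding 1 = "+", 0 = "-", 2 = undetermined are assumptions, and `infer_site_strand` is a hypothetical helper.

```python
from collections import Counter

# Assumed encoding for the strand column: 1 = "+", 0 = "-", 2 = undetermined.
PLUS, MINUS, UNKNOWN = 1, 0, 2


def infer_site_strand(read_strands, confidence=0.7):
    """Call a site's strand only if enough reads agree on it.

    read_strands holds "+" or "-" for each read covering the site,
    already corrected for library type. Returns PLUS, MINUS or UNKNOWN.
    """
    counts = Counter(read_strands)
    total = counts["+"] + counts["-"]
    if total == 0:
        return UNKNOWN
    if counts["+"] / total >= confidence:
        return PLUS
    if counts["-"] / total >= confidence:
        return MINUS
    # Neither strand reaches the threshold: leave the site undetermined.
    return UNKNOWN


print(infer_site_strand(["+"] * 8 + ["-"] * 2))
print(infer_site_strand(["+"] * 5 + ["-"] * 5))
```

Under a rule like this, a position where antisense transcripts overlap mixes reads from both strands and tends to end up undetermined rather than being split into two sites, because each position receives a single call.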

I used the NEBNext Ultra II Directional RNA Library Prep Kit, where, I believe, the reads map to the strand opposite to the one they originate from. In that case, do I also have to reverse the strand of the resulting editing sites?
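For dUTP-style directional kits, the common convention is that read 1 is antisense and read 2 is sense to the transcript. Once each read is corrected this way, site-level strand calls already refer to the transcript. A minimal sketch of that per-read correction from the SAM FLAG field. The FLAG bits come from the SAM specification. The library labels are hypothetical and are not REDItools' -s values, whose exact meaning should be checked in the script's help. `transcript_strand` is likewise hypothetical.

```python
# SAM FLAG bits (from the SAM specification).
FLAG_PAIRED = 0x1
FLAG_REVERSE = 0x10
FLAG_READ2 = 0x80

LIBRARIES = {"firststrand", "secondstrand", "unstranded"}


def transcript_strand(flag, library="firststrand"):
    """Return "+" or "-" for the transcript a read came from, or None if unknown.

    library (hypothetical labels):
      "firststrand"  - read 1 is antisense to the transcript (dUTP-style kits)
      "secondstrand" - read 1 is sense to the transcript
      "unstranded"   - no strand information
    Single-end reads are treated like read 1.
    """
    if library not in LIBRARIES:
        raise ValueError(f"unknown library type: {library}")
    if library == "unstranded":
        return None
    read_strand = "-" if flag & FLAG_REVERSE else "+"
    is_read2 = bool(flag & FLAG_PAIRED) and bool(flag & FLAG_READ2)
    # First-strand: flip read 1, keep read 2. Second-strand: the opposite.
    flip = (library == "firststrand") != is_read2
    if flip:
        return "+" if read_strand == "-" else "-"
    return read_strand


# FLAG 99 = paired, proper pair, mate reverse, read 1 on the forward strand.
print(transcript_strand(99))
print(transcript_strand(99, "secondstrand"))
```

Both mates of a pair resolve to the same transcript strand (e.g. FLAG 99 and its mate 147 both give "-" for a first-strand library), which is a quick sanity check on the chosen library type.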

I would greatly appreciate any insights from your side.

Many thanks!

Best wishes, Florian

flde commented 11 months ago

Sorry, but I still can't sort this out. From what I can see, the REDItools options fit my RNA-seq assay and yield stranded editing information. However, to detect mRNA A-to-I editing, should I select A-to-G events on both the "+" and "-" strands, or T-to-C events on the "-" strand?

And how are the reference base and the editing event determined when the strand was not inferred? I find entries with an undecided strand that still report a reference base and a substitution.
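The strand bookkeeping behind these two questions: if reference and alternative bases are written on the genome's forward strand, A-to-I on a "+" strand transcript appears as A-to-G, and on a "-" strand transcript as T-to-C (the reverse complement). When the strand is undetermined, both readings stay possible. A minimal sketch under that assumption. If a tool already reports complemented bases for "-" strand sites, the complement step must be skipped. `transcript_substitution` and `is_a_to_i` are hypothetical helpers.

```python
# Base complement table; assumes ref/alt are given on the genome's forward strand.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def transcript_substitution(ref, alt, strand):
    """Express a genome-forward substitution on the transcript strand.

    strand is "+", "-" or "?" (undetermined). Returns a list of
    (strand, substitution) interpretations, e.g. [("+", "AG")].
    """
    plus = ("+", ref + alt)
    minus = ("-", (ref + alt).translate(COMPLEMENT))
    if strand == "+":
        return [plus]
    if strand == "-":
        return [minus]
    # Undetermined strand: both readings remain possible.
    return [plus, minus]


def is_a_to_i(ref, alt, strand):
    """True if any interpretation reads as A-to-G (A-to-I) on its transcript."""
    return any(sub == "AG" for _, sub in transcript_substitution(ref, alt, strand))


print(transcript_substitution("T", "C", "-"))
print(transcript_substitution("A", "G", "?"))
```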

I would really appreciate your help here.

Best wishes, Florian

evchambers commented 7 months ago

Hi All,

I am also having trouble inferring the correct strand when using REDItoolKnown on some RNA-seq data. I am using the REDIportal database as the reference, and I see odd cases where an editing site is identified at

20 4819100 A -

in my RNA-seq data, but the rediportal database has the same site as

20 4819100 T -

I presume the strandedness was identified incorrectly in the RNA-seq data, since I would expect the A-to-G edit to be on the + strand. I used infer_experiment.py to determine the strandedness of my RNA-seq data and ran with the flags -s 2 -g 2 -S, as recommended. Any further insight would be greatly appreciated.
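Before concluding that the strand is wrong, it may be worth ruling out a simpler explanation: the two records could use different conventions for the reference base at "-" strand sites. On the opposite strand, A and T are complements, so "A -" and "T -" can describe the same position. A minimal sketch, assuming the database record gives the genome-forward base while the RNA-seq call gives the transcript-strand base. Both conventions are assumptions to verify against each tool's documentation, and `Site`, `to_genomic` and `same_site` are hypothetical.

```python
from typing import NamedTuple

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


class Site(NamedTuple):
    chrom: str
    pos: int
    ref: str
    strand: str  # "+" or "-"


def to_genomic(site, transcript_oriented):
    """Return the site with its reference base on the genome's forward strand.

    transcript_oriented=True means ref was reported on the transcript strand,
    so it is complemented for "-" strand sites (an assumption to verify).
    """
    if transcript_oriented and site.strand == "-":
        return site._replace(ref=COMPLEMENT[site.ref])
    return site


def same_site(rna_call, db_record, rna_transcript_oriented=True):
    """Compare two records after putting both in genome-forward orientation."""
    a = to_genomic(rna_call, rna_transcript_oriented)
    b = to_genomic(db_record, transcript_oriented=False)
    return a == b


rna = Site("20", 4819100, "A", "-")
db = Site("20", 4819100, "T", "-")
print(same_site(rna, db))
print(same_site(rna, db, rna_transcript_oriented=False))
```

If the records only agree under the transcript-oriented reading, the mismatch reflects notation rather than a wrong strand call. If they disagree under both readings, the strand inference itself is the next thing to check.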