dincarnato / RNAFramework

RNA structure probing and post-transcriptional modifications mapping high-throughput data analysis
http://www.rnaframework.com
GNU General Public License v3.0

Performing the Siegfried scoring method on RT stop data #31

Closed MeganSylvia closed 8 months ago

MeganSylvia commented 1 year ago

Hi Danny Incarnato,

First off, I love your RNAFramework tool! I just have two questions:

I have been using RNAFramework recently with reverse transcription truncation type RNA structure probing data and have been assessing different scoring and normalization methods. I was wondering if it is possible to perform the Siegfried et al. (2014) normalization method with RT stop count data instead of only with mutational profiling data. I would love any advice you have on this topic.

Secondly, I am interested in using the rf-combine tool to combine reactivity profiles for two different chemical probes (DMS: A/C and EDC: G/T). However, since I set my nt specificity to AC for DMS and GT for EDC when performing the rf-norm step, I am unable to use this script to combine these two reactivity profiles, as all the values simply get set to NaN. I was wondering if you had any advice on combining these types of libraries.

I look forward to hearing back from you! Please let me know if I need to provide any additional details.

Megan

dincarnato commented 1 year ago

Hi MeganSylvia,

thanks a lot, always happy to hear that people enjoy using the framework. Concerning your questions:

  1. Off the top of my head, I cannot think of a reason why the Siegfried method should not work with RT-stop-based readouts; however, I have never systematically evaluated this. My suggestion is to take a "positive control dataset" and see whether it improves or worsens the agreement with a reference structure (both in terms of reactivities and structure prediction).
  2. You can definitely use the rf-combine tool that way; however, I must strongly advise you against it. Most likely the two reagents will require different sets of slope/intercept values in order to be converted into pseudo-free energy contributions by rf-fold, so combining them won't yield anything good. Unfortunately, there are no methods to combine different sets of reactivities so far, with the exception of the IPANEMAP tool (https://academic.oup.com/nar/article/48/15/8276/5879432)... you might want to look into that. Also, while DMS only informs on base-pairing, EDC efficiently reacts with G and U bases in G:U wobbles, which is non-trivial to interpret during structure modeling.
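The comparison suggested in point 1 could be sketched as follows. This is only an illustration, not something RNAFramework provides: the function names are hypothetical, and the choice of ROC AUC as the agreement metric is an assumption (any metric that rewards high reactivity at unpaired bases would serve). Positions without a value (NaN, e.g. G/U in a DMS profile) are skipped.

```python
import math

def unpaired_mask(dot_bracket):
    """True where the reference structure marks a base as unpaired ('.')."""
    return [ch == "." for ch in dot_bracket]

def auc_unpaired_vs_paired(reactivities, dot_bracket):
    """ROC AUC for reactivity separating unpaired from paired bases.

    Hypothetical helper: 1.0 means every unpaired base scores higher than
    every paired base, 0.5 means no separation.
    """
    if len(reactivities) != len(dot_bracket):
        raise ValueError("profile and structure must have the same length")
    unpaired, paired = [], []
    for value, is_unpaired in zip(reactivities, unpaired_mask(dot_bracket)):
        if math.isnan(value):
            continue  # base not covered by this reagent
        (unpaired if is_unpaired else paired).append(value)
    if not unpaired or not paired:
        raise ValueError("need values for both paired and unpaired bases")
    # Mann-Whitney formulation: fraction of (unpaired, paired) pairs that are
    # ranked correctly, with ties counted as half.
    wins = sum(
        1.0 if u > p else 0.5 if u == p else 0.0
        for u in unpaired
        for p in paired
    )
    return wins / (len(unpaired) * len(paired))
```

Running the same function on profiles produced with each scoring method, against the same reference structure, gives one number per method to compare.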

Hope this helps. All the best,

Danny

MeganSylvia commented 1 year ago

Hi Danny,

I really appreciate your feedback! 1) Regarding the Siegfried scoring method, I will double-check my data to ensure that it processed the RT stop counts correctly, and 2) I am definitely going to look more into this IPANEMAP tool! However, I am also interested in processing the normalized reactivities without predicted folding, to assess the reactivity patterns themselves. I did try to combine my DMS and EDC normalized reactivities with rf-combine and it did not work. No file was generated by the program, due to the NaN values in the DMS data set for G and U and the NaN values in the EDC data set for A and C. This is the output I received:

$ rf-combine -o A_LSU_DMS+EDC_WW -ow A_LSU_DMS_WW/ A_LSU_EDC_WW/

[+] Importing input XML directories/files... 2 common transcripts.
[+] Making output directory...
[+] Combining reactivities [Last: none]
[+] Combination statistics:

  [] Combined transcripts: 0
  [] Discarded transcripts: 2 total
                            0 XML parsing failed
                            2 not enough values for correlation calculation
                            0 correlation too low
                            0 mismatch between analysis tools
                            0 mismatch between transcript sequences
                            0 mismatch between scoring methods
                            0 mismatch between normalization methods
                            0 mismatch between window sizes
                            0 mismatch between window offsets

[+] All done.
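The "not enough values for correlation calculation" count is what one would expect if the correlation is computed only over positions that have a value in both profiles. That is an assumption about rf-combine's behaviour, not something taken from its source. A minimal sketch with made-up reactivities (the helper name and the numbers are hypothetical) shows that an A/C-only profile and a G/U-only profile share no such positions:

```python
import math

def shared_positions(profile_a, profile_b):
    """Indices where both profiles carry a numeric (non-NaN) reactivity."""
    return [
        i
        for i, (a, b) in enumerate(zip(profile_a, profile_b))
        if not math.isnan(a) and not math.isnan(b)
    ]

nan = float("nan")
# Toy transcript "ACGUAC": DMS has values on A/C only, EDC on G/U only.
dms = [0.8, 0.1, nan, nan, 0.5, 0.3]
edc = [nan, nan, 0.4, 0.9, nan, nan]

print(len(shared_positions(dms, edc)))  # 0 shared positions
```

With zero shared positions there is nothing to correlate, so both transcripts end up in the discarded tally.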

3) Additionally, I do have a third question now (again, thank you for all your help!). I was trying to use the rf-jackknife command and received the error message below. I am less familiar with Perl and was unsure whether this was an error on my end, such as failing to install a dependency:

$ rf-jackknife -r ../d.23.e.O.sativa.bracket.txt -p 7 -x -kn -kl -ow -o A_LSU_DMS_DS_jackknife A_LSU_DMS_DS

[+] Making output directory...
[+] Checking input reference structures and probing data [0 imported]
[!] Warning [Data::IO::Sequence->new()]: Unable to guess file format. Falling back to generic text file.
    -> Caught at /usr/local/biocore/Bio_Programs/RNAFramework/lib/Core/Base.pm line 117.
Can't locate object method "new" via package "Data::IO::Sequence::" (perhaps you forgot to load "Data::IO::Sequence::"?) at /usr/local/biocore/Bio_Programs/RNAFramework/lib/Data/IO/Sequence.pm line 56.

Again thank you for all your help!

Megan

dincarnato commented 1 year ago

Hi Megan,

you are right, I forgot that we changed the logic of the rf-combine algorithm some time ago. There is no immediate way to do this unfortunately, but I have modified the rf-combine module for you by adding the "-i" parameter. This should allow you to combine XML files with different sets of reactive bases. I am attaching it here and I will release it in the next release of RNAFramework.

rf-combine.zip
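For inspecting reactivity patterns outside of RNAFramework, a merged track can also be assembled by hand. The sketch below is not how the "-i" parameter is implemented; it is a hypothetical helper that assumes the two profiles are complementary (each position has a value in at most one of them). Because the two reagents are on different scales, as noted earlier in the thread, the merged values are best treated as a viewing aid and not as input for folding.

```python
import math

def merge_complementary(profile_a, profile_b):
    """Merge two per-base profiles that cover disjoint sets of bases.

    Hypothetical helper: takes whichever value is present at each position,
    keeps NaN where neither has one, and refuses to guess where both do.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must have the same length")
    merged = []
    for i, (a, b) in enumerate(zip(profile_a, profile_b)):
        a_missing, b_missing = math.isnan(a), math.isnan(b)
        if a_missing and b_missing:
            merged.append(float("nan"))
        elif a_missing:
            merged.append(b)
        elif b_missing:
            merged.append(a)
        else:
            raise ValueError(f"both profiles have a value at position {i}")
    return merged

nan = float("nan")
# Toy transcript "ACGUAC" with made-up values.
dms = [0.8, 0.1, nan, nan, 0.5, 0.3]   # A/C only
edc = [nan, nan, 0.4, 0.9, nan, nan]   # G/U only
print(merge_complementary(dms, edc))   # [0.8, 0.1, 0.4, 0.9, 0.5, 0.3]
```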

Concerning the rf-jackknife module, it looks like the d.23.e.O.sativa.bracket.txt is not properly formatted, which makes RNAFramework guess the wrong file format. Can you please share that file with me so that I can check what's wrong with it?

Thanks! All the best,

Danny

dincarnato commented 1 year ago

Hi Megan,

did you solve this? Do you want to share the dot-bracket file?

Best, Danny

MeganSylvia commented 1 year ago

Hi Danny,

I believe I did. I switch back and forth between my Linux and Windows computers, and sometimes my text files get encoded incorrectly. I'll let you know if I have any further issues, and thank you so much for your help!!!

Megan

dincarnato commented 1 year ago

Yes, non-Linux line endings can cause the issue.
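A quick way to check a dot-bracket file for this before passing it to rf-jackknife is to look for carriage returns and rewrite the file with Unix line endings. The following is a minimal sketch with hypothetical function names; it is not part of RNAFramework.

```python
from pathlib import Path

def has_non_unix_newlines(data: bytes) -> bool:
    """True if the content contains a carriage return (Windows or old Mac endings)."""
    return b"\r" in data

def normalize_newlines(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")

def fix_file(path) -> bool:
    """Rewrite the file in place with LF endings; return True if it was changed."""
    file_path = Path(path)
    data = file_path.read_bytes()
    if not has_non_unix_newlines(data):
        return False
    file_path.write_bytes(normalize_newlines(data))
    return True

# Usage, with the file name from this thread:
#   fix_file("d.23.e.O.sativa.bracket.txt")
```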