eternagame / EternaFold

Improving RNA structure prediction through multitask learning on diverse crowdsourced data.

BPP2SEQ format, what are "values of the unpairedness potential (derived from probing data)" #5

Open Psirving opened 1 week ago

Psirving commented 1 week ago

Describe your feedback

In the description of the BPP2SEQ file format, columns 4 through 4+N are the "values of the unpairedness potential (derived from probing data)". I'm having trouble finding info on how these are derived. Are these just normalized reactivities or pseudo-ΔGs or some other value?

Also, I'd like to use DMS-MaP data, although my understanding is that this model is trained on SHAPE-based RT-stop data. Do you have any recommendations for doing this?

Thanks! Patrick

HWaymentSteele commented 3 days ago

I’m pretty sure those columns are simply reactivities, normalized however you would normally do it in your usual data workup. There are multiple columns so that values from different experiments can be included.
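For illustration, here is a minimal sketch of how per-experiment reactivities could be normalized and arranged into one set of columns per nucleotide. The percentile scaling is only one common workup, chosen here as an assumption and not confirmed as what EternaFold expects. The function names (`normalize`, `reactivity_columns`, `format_values`) are hypothetical and not part of the EternaFold code. The sketch produces only the reactivity columns; the leading columns of a BPP2SEQ row are not covered in this thread and are left out.

```python
import math
from typing import List, Sequence


def normalize(raw: Sequence[float], percentile: float = 95.0) -> List[float]:
    """Scale reactivities so the chosen percentile maps to 1.0.

    Assumption: a nearest-rank percentile scaling stands in for
    "whatever would be done for usual data workup".
    """
    if not raw:
        raise ValueError("no reactivities given")
    ordered = sorted(raw)
    # Nearest-rank percentile: smallest value with at least
    # `percentile` percent of the data at or below it.
    rank = max(1, math.ceil(percentile / 100 * len(ordered)))
    scale = ordered[rank - 1]
    if scale <= 0:
        raise ValueError("scaling value must be positive")
    return [value / scale for value in raw]


def reactivity_columns(experiments: Sequence[Sequence[float]]) -> List[List[float]]:
    """Turn one list per experiment into one row of values per nucleotide."""
    if len({len(e) for e in experiments}) != 1:
        raise ValueError("all experiments must cover the same positions")
    return [list(row) for row in zip(*experiments)]


def format_values(row: Sequence[float]) -> str:
    """Render one nucleotide's values as whitespace-separated text."""
    return " ".join(f"{value:.4f}" for value in row)


if __name__ == "__main__":
    # Two hypothetical experiments over the same four nucleotides.
    exp_a = normalize([0.0, 1.0, 2.0, 4.0])
    exp_b = normalize([0.5, 0.5, 1.0, 2.0])
    for row in reactivity_columns([exp_a, exp_b]):
        print(format_values(row))
```

Each printed line holds the values for one nucleotide, one value per experiment, which is the shape the multiple reactivity columns would take.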

My recommendation for DMS-MaP data would be to try using it anyway. The Nature Methods paper evaluates the same model with DMS data on independent datasets, and if I recall correctly it performed reasonably. Hope this helps!

Psirving commented 3 days ago

Awesome, thanks!