Lucy-Forrest-Lab / HDXer

HDXer is a package to compute Hydrogen-Deuterium exchange data from biomolecular simulations, compare to experiment, and perform ensemble refinement to fit a structural ensemble to the experimental data.
BSD 3-Clause "New" or "Revised" License

Std. Dev. across trajectory blocks #1

Open rupeshagarwal opened 2 years ago

rupeshagarwal commented 2 years ago

Hey, I am using multiple simulation trajectories as input but I don't see the "Std. Dev. across trajectory blocks" in the output 'Deuteration fraction vs. Time' plots?

rtb1c13 commented 2 years ago

Hi @rupeshagarwal, it's been a while since I looked at the plotting code but I believe the default script behaviour with multiple trajectories is to concatenate them together, which doesn't calculate error bars for the D_f plots as block averages. You can switch on the block averaging using the -c flag to calc_hdx.py, with whatever block size you choose. For example, if each trajectory is an independent repeat and 10000 frames long, you could run calc_hdx.py ... -c 10000 ...

Does that solve your problem? Apologies if the plot legend is confusing, I can work on removing that legend entry in the event the block averages aren't calculated.
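For readers unfamiliar with block averaging, the idea behind the -c option can be sketched as follows. This is a minimal illustration, not HDXer's own code: the function name `block_stats`, the toy per-frame values, and the use of the sample standard deviation are all assumptions made for the example.

```python
from statistics import mean, stdev

def block_stats(per_frame_values, block_size):
    """Split a concatenated per-frame series into consecutive blocks and
    return (mean of block means, sample std. dev. across block means).
    Hypothetical helper for illustration; not part of HDXer."""
    n_blocks = len(per_frame_values) // block_size
    if n_blocks < 2:
        raise ValueError("need at least two full blocks to estimate a std. dev.")
    block_means = [
        mean(per_frame_values[i * block_size:(i + 1) * block_size])
        for i in range(n_blocks)
    ]
    return mean(block_means), stdev(block_means)

# Toy data: three independent "trajectories" of 4 frames each, concatenated.
frames = [0.2, 0.2, 0.2, 0.2,
          0.4, 0.4, 0.4, 0.4,
          0.6, 0.6, 0.6, 0.6]

# Choosing block_size equal to the trajectory length makes each block
# correspond to one independent repeat.
avg, sd = block_stats(frames, block_size=4)
print(f"mean = {avg:.3f}, std. dev. across blocks = {sd:.3f}")
```

Without a block size there is only one block (the whole concatenated trajectory), so no spread across blocks can be estimated, which is consistent with the missing error bars described above.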

rupeshagarwal commented 2 years ago

Thanks. It worked. What kind of correlation with the experiment (R-value in seg_curves.pdf) should be considered good?

rtb1c13 commented 2 years ago

Glad to hear that worked. As you might expect the correlation of predicted and experimental HDX can be very varied, depending on the predictive model, the structures in the simulation ensemble, and of course the experiments themselves - hence the rationale for developing ensemble reweighting approaches. I can only suggest looking over published studies on similar systems to yours to know what to expect.

You can also extend the (rather basic) analyses & plots that we provide to investigate the data further - the output .dat files provide the intrinsic rates, protection factors, predicted deuterated fractions per timepoints, etc. etc. Hope that helps.

rupeshagarwal commented 2 years ago

Thanks. Another related question: what is the difference between the "Predicted fraction, R" value reported in seg_curves.pdf and the R-squared calculated using linregress? For the BPTI example, the "Predicted fraction, R" values in seg_curves.pdf are very high except 0.166, but the calculated R-squared is 0.59.

rtb1c13 commented 2 years ago

The R^2 in the BPTI example notebook is calculated across all the timepoints combined, whereas the r-values in seg_curves.pdf are the Pearson's correlation between predicted and experimental deuteration for each individual timepoint separately. The former can be helpful as a simple description of the predictive model quality as a whole, while the latter can help identify if any individual timepoint predictions exhibit better or worse correlation with experiment.
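The distinction can be illustrated with a small sketch. The data below are invented for the example (they are not the BPTI values), and `pearson_r` is a hypothetical helper written here in plain Python; for a simple linear regression, R^2 is the square of Pearson's r, so squaring the pooled r plays the role of the combined R^2.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Hypothetical deuterated fractions: keys are timepoints, lists are peptides.
predicted = {0.5: [0.10, 0.20, 0.30], 5.0: [0.60, 0.70, 0.80]}
experiment = {0.5: [0.30, 0.40, 0.50], 5.0: [0.50, 0.60, 0.70]}

# One r-value per timepoint (the seg_curves.pdf style of summary).
per_timepoint_r = {t: pearson_r(predicted[t], experiment[t]) for t in predicted}

# One R^2 over all timepoints pooled together (the notebook style of summary).
all_pred = [v for t in predicted for v in predicted[t]]
all_exp = [v for t in predicted for v in experiment[t]]
combined_r2 = pearson_r(all_pred, all_exp) ** 2

print(per_timepoint_r)
print(combined_r2)
```

In this toy case each timepoint is perfectly correlated on its own, but the predictions are offset in opposite directions at the two timepoints, so the pooled R^2 is lower than either per-timepoint value.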

Again, they're very basic descriptors of the data though, feel free to explore the predictions however you wish! 🙂

rupeshagarwal commented 2 years ago

Hi Richard,

Thanks for the response. I appreciate it very much. I have one more question. The HDX experiment was performed on a complex, and my MD simulation is also of the complex. However, because of the system size, I would like to extract just one subunit from the original trajectory and run HDXer on that alone. Will my results be affected compared with using the trajectory containing both subunits?

Rupesh

rtb1c13 commented 2 years ago

Potentially yes - the residues at the interface won't be protected by the contacts & H-bonds from the neighbouring protomer, so their predicted protection factors will be lower, and exchange faster, than in the full complex. That might be a big or small effect, it really depends on the size of the interface & inter-protein contacts!

If you're not interested in the interface however, or have no HDX-MS coverage there experimentally, you could simply exclude those experimental peptides from your predictions. The contacts definition in the Best-Vendruscolo predictive model is fairly short (default 6.5 Å), so beyond that distance from the interface the predicted HDX won't be affected by the neighbouring protein. Hope that helps.
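To make the interface effect concrete, here is a rough sketch of the Best-Vendruscolo form, in which the log protection factor of an amide is a weighted sum of its heavy-atom contacts and H-bonds. Everything numerical here is an assumption for illustration: the scaling factors 0.35 and 2.0 are the commonly quoted values and may differ from the settings used in a given run, and the contact counts, H-bond counts, and intrinsic rate are invented.

```python
from math import exp

BETA_C = 0.35  # assumed scaling factor for heavy-atom contacts
BETA_H = 2.0   # assumed scaling factor for H-bonds

def ln_protection_factor(n_contacts, n_hbonds):
    """ln(PF) = BETA_C * N_contacts + BETA_H * N_hbonds for one amide."""
    return BETA_C * n_contacts + BETA_H * n_hbonds

def deuterated_fraction(k_int, ln_pf, t):
    """Single-exponential uptake with the intrinsic rate slowed by PF."""
    return 1.0 - exp(-(k_int / exp(ln_pf)) * t)

# Hypothetical interface residue: some contacts and one H-bond come from
# the neighbouring protomer and vanish when that subunit is stripped out.
ln_pf_complex = ln_protection_factor(n_contacts=12, n_hbonds=1)
ln_pf_isolated = ln_protection_factor(n_contacts=7, n_hbonds=0)

k_int = 10.0  # hypothetical intrinsic rate, per minute
t = 1.0       # labelling time, minutes
d_complex = deuterated_fraction(k_int, ln_pf_complex, t)
d_isolated = deuterated_fraction(k_int, ln_pf_isolated, t)

print(f"complex:  ln(PF) = {ln_pf_complex:.2f}, D(t) = {d_complex:.3f}")
print(f"isolated: ln(PF) = {ln_pf_isolated:.2f}, D(t) = {d_isolated:.3f}")
```

A residue whose counted contacts and H-bonds all come from its own subunit would give identical values in both cases, which is why only residues near the interface are expected to change.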