Lucy-Forrest-Lab / HDXer

HDXer is a package to compute Hydrogen-Deuterium exchange data from biomolecular simulations, compare to experiment, and perform ensemble refinement to fit a structrual ensemble to the experimental data
BSD 3-Clause "New" or "Revised" License
17 stars 6 forks source link

Issues with NaNs When Re-weighting Deuterated Fractions from MD #9

Closed yabmtm closed 11 months ago

yabmtm commented 11 months ago

I want to preface in saying that I'm much more familiar with MD than HDX, but I've recently been interested in determining structures from MD ensemble that match with HDX observables.

I've managed to create a script from the tutorial notebooks that work well for my purposes. The predicted DF from MD look good and correlate decently with experiment, but when I perform re-weighting the resulting final_segment_fractions contain a lot of NaN values.

I'm attaching the script I'm using, as well as plots of the MD vs EXPT deuterated fractions before and after re-weighting. Hoping you can help me understand where I'm going wrong. Many thanks.

Screen Shot 2023-10-13 at 1 25 17 PM Screen Shot 2023-10-13 at 1 25 25 PM

hdxer_test.txt

fabsugar commented 11 months ago

Hi, some question to clarify the issue: are the NaNs present at every value of gamma? Might be helpful if you can provide output (e.g. periteration* and other files) and input files. All the best, Fabrizio

yabmtm commented 11 months ago

Thanks for your response. Yes, the NaNs are present at every value of gamma I've tested: 0.001, 0.01, and 0.1. Here is an example output periteration* file for 0.001 where the weighted (R)MSE to target is NaN at every step:

reweighting_gamma_0.001_per_iteration_output.txt

Are there any specific input files you'd like me to attach? Otherwise I can compress and upload the directory of input/output files and attach that, removing the restart.pkl files since they total nearly 300MB

fabsugar commented 11 months ago

Does your protein have multiple chains? Currently the reweighting routine works only with single chain proteins (unless chains are identical). If this is not the issue please provide your input/output files excluding *.pkl.

yabmtm commented 11 months ago

The system is a single chain protein. I processed the trajectory to remove all non-protein atoms ahead of running calc_hdx.py.

The tarball I've attached contains the directory hdxer_test, which has all Hbonds, Contacts, and other. output of calc_hdx.py.

Within hdxer_test is also 20231012_reweighting_segments, which contains the output data from the re-weighting script.

I have removed per_iteration_output.dat files for gammas 0.1 and 1, since they were very large from the smaller step-size.

hdxer_test also contains the experimental data used for re-weighting in expt.txt.

Let me know if there's anything else I can add and thank you again. hdxer_test.tar.gz

fabsugar commented 11 months ago

Hi, I noticed that your experimental data file (expt.txt) is in % deuteration while the reweighting code takes fractions, that could be the issue. Please try to divide that by 100 and do the reweghting again, see if that removes the NaNs.

yabmtm commented 11 months ago

I divided deuteration values by 100 and find that I get much faster convergence during re-weighting. I'm still getting NaNs when using segment data, but plots look much better when using peptide data, which seems to have longer, overlapping sequences of residues.

I was surprised to see that there wasn't much difference at all between gamma values, but maybe that could just be due to undersampling in MD?

Screen Shot 2023-10-16 at 9 09 46 AM Screen Shot 2023-10-16 at 9 08 56 AM
fabsugar commented 11 months ago

Overall HDXer does not require segments, you can use the original peptide data with overlaps (which would foster a direct interpretation of the HDX data). Other than the % deuteration instead of fraction there is another issue in your expt file. HDXer ignores by default the first residue (as in HDX-MS it exchanges back to hydrogen quickly) hence it will not be able to read correctly some lines of your data file, for instance: 3 3 21.23 28.88 34.32 40.67 46.48 4 4 21.43 29.25 35.73 41.81 48.24 If you want to read a single residue e.g. residue 3 you should put "2 3 ....", please check the formats of the BPTI tutorial.
Hopefully that should get rid of the NaNs. I recommend exploring other gamma values (e.g. much bigger and much smaller). Most explanations are also in our tutorial paper: https://livecomsjournal.org/index.php/livecoms/article/view/v3i1e1521

yabmtm commented 11 months ago

Thank you very much! I think I'll consider this issue resolved and work on updating my expt file as you noted and trying a wider range of gammas.