Deuterium uptake per residue rather than deltaG

Jhsmit / PyHDX

Derive ΔG for single residues from HDX-MS data

http://pyhdx.readthedocs.io

MIT License

Deuterium uptake per residue rather than deltaG #295

Closed ococrook closed 2 years ago

ococrook commented 2 years ago

Hi @Jhsmit,

Is it possible to export the deuterium uptake per peptide or residue rather than delta G?

Thanks! Olly

Jhsmit commented 2 years ago

Hi @ococrook

Yes, it's possible. Do you want raw D-uptake values, FD-corrected D-uptake values, or Relative Fractional Uptake (RFU)?

The D-uptake values per peptide are the input data.

And would you like to obtain the values from the web interface or from Python scripts?

ococrook commented 2 years ago

Raw D-uptake would be great; then I can normalise it myself. Happy to use Python scripts.

If I understand the model correctly, the D-uptake values per peptide are the input data; however, the model would also predict this quantity by summing the uptake values across the exchangeable amides within that peptide? It would be good to get an idea of how well these predictions match the input data. I can do this myself, though, once we have residue-level uptake values.

Thanks!
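The forward model described here (predicted peptide D-uptake as the sum of residue-level uptake over the peptide's exchangeable amides) amounts to a matrix-vector product. The sketch below is illustrative only: `coverage_matrix` is a hypothetical helper, not part of PyHDX, and it assumes 0-based, end-exclusive peptide ranges, that the first residue of each peptide does not retain deuterium, and that prolines carry no exchangeable backbone amide.

```python
import numpy as np

def coverage_matrix(peptides, sequence, drop_first=1):
    """Hypothetical helper: build an (n_peptides, n_residues) 0/1 matrix.

    peptides: list of (start, end) residue indices, 0-based, end exclusive.
    Assumes the first `drop_first` residues of each peptide and all prolines
    do not contribute to the measured uptake.
    """
    X = np.zeros((len(peptides), len(sequence)))
    for row, (start, end) in enumerate(peptides):
        for i in range(start + drop_first, end):
            if sequence[i] != "P":  # proline has no backbone amide hydrogen
                X[row, i] = 1.0
    return X

# Toy example with made-up residue-level uptake values.
sequence = "MKPLVAGE"
peptides = [(0, 5), (2, 8), (4, 8)]
X = coverage_matrix(peptides, sequence)
residue_uptake = np.array([0.0, 0.9, 0.0, 0.8, 0.5, 0.4, 0.2, 0.1])

# Predicted D-uptake per peptide: sum of residue uptake over covered amides.
peptide_uptake = X @ residue_uptake
```

Comparing `peptide_uptake` with the measured peptide values gives the kind of goodness-of-fit check mentioned above.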

Jhsmit commented 2 years ago

Hi, I've updated the docs and merged a PR addressing this (#296), although I've just realized that in the second part I'm using FD-corrected D-uptake. Some of the discussion should apply to raw D-uptake as well, and you should be able to adjust things accordingly.

The updates to the docs can be found here: https://pyhdx.readthedocs.io/en/latest/examples/02_coverage_objects.html

Please let me know if you run into any problems.
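To normalise raw D-uptake yourself, one widely used scheme is sketched below: relative fractional uptake, RFU = (D - D_ND) / (D_FD - D_ND), which is then scaled by the number of exchangeable amides to give an FD-corrected D-uptake. This is a generic sketch with hypothetical function names and made-up values; PyHDX's own correction may differ in details such as how exchangeable residues are counted.

```python
import numpy as np

def rfu(uptake, nd_control, fd_control):
    """Relative fractional uptake: 0 at the ND control, 1 at the FD control."""
    return (uptake - nd_control) / (fd_control - nd_control)

def fd_corrected_uptake(uptake, nd_control, fd_control, n_exchangeable):
    """FD-corrected D-uptake: RFU scaled by the number of exchangeable amides."""
    return rfu(uptake, nd_control, fd_control) * n_exchangeable

# Hypothetical values for two peptides at a single exposure time (Da).
uptake = np.array([1.2, 3.0])
nd = np.array([0.0, 0.0])
fd = np.array([2.4, 4.0])
n_ex = np.array([4, 6])

corrected = fd_corrected_uptake(uptake, nd, fd, n_ex)
```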

ococrook commented 2 years ago

Thanks @Jhsmit

Can I check which version of Python this needs? Does it need 3.8?

Jhsmit commented 2 years ago

Yes, only 3.8 and 3.9 are supported. I recommend 3.8, as that is the version I develop on.

ococrook commented 2 years ago

Awesome, thank you.

I also wanted to check whether the smoothing penalty you apply is the same as in the manuscript. Do you include a fusion penalty between the uptake values (or ΔGs) of neighbouring residues? I just want to make sure I get the maths right so I understand what's happening.

ococrook commented 2 years ago

I'm just going through the docs related to this. In the final chunk you write the following, but I don't think I can get the dimensions correct:

d_calc = hdx_t.X.dot(fit_result_1.output.mean(axis=0))

Do you actually want something like the following:

import numpy as np

out = fit_result_1.d_uptake.mean(axis=0)
out[np.isnan(out)] = 0
d_calc = hdx_t.X.dot(out)

If I understand correctly, you've calculated the uptakes across 20 different initialisations and are then taking the average?

Jhsmit commented 2 years ago

Yes, that is correct. The final result shown is the average; in the web application I also show the 5/25/75/95 percentiles across all 20 (or another user-specified number of) repeats.

Thanks for highlighting the shapes issue. I'm currently away, so I'll reopen the issue and try to look into this in more detail after September 14th.
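Summarising repeated fits in the way described (the mean as the final result, plus percentile bands) can be sketched with NumPy. The array `repeats` is synthetic and hypothetical, assumed to have one row per repeat and one column per residue; it is not the layout of any specific PyHDX object.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in for fit output: 20 repeats x 50 residues.
repeats = rng.normal(loc=2.0, scale=0.3, size=(20, 50))

# Final per-residue result: the mean across repeats.
mean_uptake = repeats.mean(axis=0)  # shape (50,)

# Spread across repeats: one row per requested percentile.
percentiles = np.percentile(repeats, [5, 25, 75, 95], axis=0)  # shape (4, 50)
```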

Jhsmit commented 2 years ago

> I also wanted to check whether the smoothing penalty you apply is the same as in the manuscript. Do you include a fusion penalty between the uptake values (or ΔGs) of neighbouring residues? I just want to make sure I get the maths right so I understand what's happening.

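Yes, that is correct. The smoothing term is the mean of the absolute values of all differences between subsequent D-uptake values:

$$\frac{1}{N-1}\sum_{i=1}^{N-1}\left|D_{i+1}-D_i\right|$$

where $D_i$ is the D-uptake of residue $i$ and $N$ is the number of residues.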
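The smoothing term described above (mean absolute difference between subsequent D-uptake values) can be written as a short NumPy function. This is a sketch only, not PyHDX's implementation: which array axis holds residues, and how the term is weighted against the data-fit term in the loss, are assumptions left to the caller (hence the `axis` parameter).

```python
import numpy as np

def smoothing_penalty(d_uptake, axis=-1):
    """Mean absolute difference between neighbouring values along `axis`.

    `axis` is assumed to be the residue axis; any other axes (e.g. time
    points) are simply included in the mean.
    """
    return np.mean(np.abs(np.diff(d_uptake, axis=axis)))

# 1D: differences are 0.5, -0.5, 2.0, so the mean absolute difference is 1.0.
penalty_1d = smoothing_penalty(np.array([1.0, 1.5, 1.0, 3.0]))

# 2D, residues along axis 1: differences [[1, 2], [0, 0]] give 0.75.
penalty_2d = smoothing_penalty(np.array([[0.0, 1.0, 3.0], [1.0, 1.0, 1.0]]), axis=1)
```

Because it penalises absolute rather than squared differences, this kind of fused penalty tolerates occasional sharp steps between residues while still flattening small fluctuations.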

Jhsmit commented 2 years ago

> In the final chunk you write the following, but I don't think I can get the dimensions correct:
> d_calc = hdx_t.X.dot(fit_result_1.output.mean(axis=0))

Thanks for spotting this. The issue was that I updated the fit result object without updating the docs (this still needs automated tests).

The fix is:

d_calc = hdx_t.X.dot(fit_result_1.result.mean(axis=0))

Here, result holds the raw D-uptake values as obtained by the fit; these also include guessed or interpolated values in regions without coverage.
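The shape logic behind the fix, and the NaN problem that motivated the zero-filling workaround, can be checked on synthetic arrays. The sketch assumes, as the thread implies, that `X` has shape (n_peptides, n_residues) and that the fit output has shape (n_repeats, n_residues); the arrays are random stand-ins rather than PyHDX objects, and `d_measured` is a hypothetical measured vector used only to show how predictions could be compared with the input data.

```python
import numpy as np

rng = np.random.default_rng(1)
n_repeats, n_peptides, n_residues = 20, 6, 30

# Hypothetical stand-ins (synthetic data, not PyHDX objects):
# X: 0/1 coverage matrix, shape (n_peptides, n_residues)
# result: residue-level D-uptake per repeat, shape (n_repeats, n_residues)
X = (rng.random((n_peptides, n_residues)) > 0.6).astype(float)
X[:, :3] = 0.0  # pretend residues 0-2 are not covered by any peptide
result = rng.random((n_repeats, n_residues))

# Averaging over repeats (axis 0) leaves one value per residue, so the
# matrix-vector product gives one predicted D-uptake value per peptide.
d_calc = X.dot(result.mean(axis=0))  # shape (n_peptides,)

# If uncovered residues hold NaN instead of interpolated values, every entry
# of X.dot(...) becomes NaN, because 0 * NaN is NaN in floating point.
with_gaps = result.mean(axis=0)
with_gaps[:3] = np.nan
d_nan = X.dot(with_gaps)

# Zero-filling is safe here only because those columns of X are all zero.
d_filled = X.dot(np.nan_to_num(with_gaps, nan=0.0))

# Comparing predictions with (hypothetical) measured peptide D-uptake.
d_measured = d_calc + rng.normal(scale=0.1, size=n_peptides)
rmse = np.sqrt(np.mean((d_measured - d_filled) ** 2))
```

Since the averaged result already contains interpolated values where there is no coverage, it has no NaNs, which is why the fixed line works without any zero-filling.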