Jhsmit / PyHDX

Derive ΔG for single residues from HDX-MS data
http://pyhdx.readthedocs.io
MIT License

rfu_residues_sd error #348

Closed tuttlelm closed 2 weeks ago

tuttlelm commented 3 weeks ago

I have generated an HDX Measurement object from my data. (Naively) Looking at the code it seems like there should be both a .rfu_residues field and .rfu_residues_sd field for this object, but trying to access hdxm.rfu_residues_sd generates an error (summarized below). The rfu_residues values are generated and are sensible. Is there a way to generate the error bars on the rfu_residues values?

Thanks for your work on this repository!

```
...
File ~/.local/lib/python3.10/site-packages/pyhdx/models.py:625, in HDXTimepoint.rfu_residues_sd(self)
    621 @property
    622 def rfu_residues_sd(self) -> pd.Series:
    623     """Error propagated standard deviations of RFU per residue."""
--> 625     return self.propagate_errors("rfu_sd")
...
KeyError: 'rfu_sd'
```
Jhsmit commented 2 weeks ago

Hi, thanks for opening the issue

Normally you do have sd values; they are calculated from the sd values associated with the d-uptake values ('uptake_sd'). That typically happens in the apply_control function here

Depending on how you've created the HDXMeasurement object, you might not have this field. This template shows the typical steps: https://github.com/Jhsmit/PyHDX/blob/master/templates/01_load_secb_data.py
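To make the idea concrete, here is a small sketch of the error propagation involved. This is not PyHDX's actual `apply_control` code, just the standard first-order (quadrature) rule for the ratio RFU = uptake / FD-control uptake, using illustrative column names that mirror the 'uptake'/'uptake_sd' fields:

```python
import numpy as np
import pandas as pd

# Toy peptide tables: measured D-uptake with standard deviations
# (column names are illustrative, not PyHDX's exact schema)
peptides = pd.DataFrame({"uptake": [2.0, 4.5], "uptake_sd": [0.1, 0.2]})
fd_control = pd.DataFrame({"uptake": [5.0, 6.0], "uptake_sd": [0.15, 0.1]})

# RFU = uptake / full-deuteration control uptake
rfu = peptides["uptake"] / fd_control["uptake"]

# For a ratio, relative errors add in quadrature (first-order propagation)
rfu_sd = rfu * np.sqrt(
    (peptides["uptake_sd"] / peptides["uptake"]) ** 2
    + (fd_control["uptake_sd"] / fd_control["uptake"]) ** 2
)
```

If the 'uptake_sd' column is missing (or misnamed) at this stage, there is nothing to propagate downstream, which is why `rfu_residues_sd` fails with a `KeyError`.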

Hope that helps! Please reach out if you have additional issues

tuttlelm commented 2 weeks ago

Thanks so much for your response. I've got a patchwork conversion script for starting from HDExaminer outputs and I had used "rfu sd" instead of "rfu_sd". It is working now.
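For anyone else writing a similar conversion script, normalizing the column names up front catches this kind of space-vs-underscore mismatch (the frame below is a made-up illustration, not real HDExaminer output):

```python
import pandas as pd

# Hypothetical converted frame with a space instead of an underscore
df = pd.DataFrame({"rfu": [0.4, 0.7], "rfu sd": [0.02, 0.03]})

# Normalize all column names: lowercase, spaces -> underscores
df.columns = df.columns.str.lower().str.replace(" ", "_")
```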

Related to coming from HDExaminer data (I can open a separate issue for that topic if that would be more appropriate): PyHDX currently does not allow duplicate measurements when creating the HDXMeasurement object. As far as I can tell, having replicates isn't a problem for any of the downstream calculations, but I wondered if you had thoughts on that. I was able to make some simple modifications to models.py so that I can leave replicates in my data rather than replicate-averaging it first (basically just calling data.reset_index() in the __init__() function and adding "index" as a column wherever the data is sorted or pivoted on those columns).
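For reference, the alternative of collapsing replicates before constructing the HDXMeasurement can be done in pandas. A hedged sketch (the grouping columns are illustrative, and using the sample standard deviation as 'uptake_sd' assumes independent replicates):

```python
import numpy as np
import pandas as pd

# Two replicates per peptide/exposure row (illustrative columns)
df = pd.DataFrame({
    "start": [10, 10, 20, 20],
    "end": [18, 18, 30, 30],
    "exposure": [0.5, 0.5, 0.5, 0.5],
    "uptake": [2.0, 2.2, 3.0, 3.4],
})

# Average replicate uptake per peptide/exposure and estimate its spread
agg = (
    df.groupby(["start", "end", "exposure"], as_index=False)
    .agg(uptake=("uptake", "mean"), uptake_sd=("uptake", "std"))
)
```

This keeps one row per peptide/exposure, with the replicate scatter captured in 'uptake_sd'.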

Thanks again

Jhsmit commented 2 weeks ago

Yes, let's discuss on #349