KosinskiLab / AlphaPulldown

https://doi.org/10.1093/bioinformatics/btac749
GNU General Public License v3.0

predictions_with_good_interpae.csv #323

Closed DimaMolod closed 2 months ago

DimaMolod commented 2 months ago

Table layout

Currently the table has these headings:

jobs | iptm_ptm | iptm | pDockQ/mpDockQ | average_interface_pae | average_interface_plddt | binding_energy | interface | Num_intf_residues | Polar | Hydrophobhic | Charged | contact_pairs | sc | hb | sb | int_solv_en | int_area | pi_score | pdb | pvalue
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --

However, the last columns, which are derived from CCP4 programs, are always empty:

interface | Num_intf_residues | Polar | Hydrophobhic | Charged | contact_pairs | sc | hb | sb | int_solv_en | int_area | pi_score | pdb | pvalue
-- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --

We must fix this and (partially) remove the CCP4 values that are not informative (Polar, Hydrophobhic, Charged, etc.?). We may also consider removing pDockQ/mpDockQ.
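As a rough sketch of the clean-up, assuming the table is a plain CSV and that "empty" means a blank cell in every row, the always-empty columns could be detected and dropped as below. The helper name `drop_empty_columns` and the sample rows are hypothetical and not part of AlphaPulldown.

```python
import csv
import io


def drop_empty_columns(csv_text):
    """Return CSV text without columns whose cells are blank in every row."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if len(rows) < 2:
        # Header only (or nothing at all): there is no data to judge by.
        return csv_text
    header, data = rows[0], rows[1:]
    # Keep a column if at least one data row has a non-blank value in it.
    keep = [
        i for i in range(len(header))
        if any(i < len(row) and row[i].strip() for row in data)
    ]
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow([row[i] if i < len(row) else "" for i in keep])
    return out.getvalue()


# Made-up example: the two CCP4-derived columns are blank in every row.
sample = (
    "jobs,iptm,binding_energy,sc,pi_score\n"
    "A_and_B,0.81,-1520.3,,\n"
    "A_and_C,0.42,-310.7,,\n"
)
cleaned = drop_empty_columns(sample)
print(cleaned)
```

This only hides the symptom. If the CCP4 programs are failing silently, the columns would stay empty for that reason, so the sketch is no substitute for fixing the underlying call.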

PyRosetta

The typical binding energies from PyRosetta are large negative numbers expressed in arbitrary Rosetta units. These units correlate with Gibbs free energy, but converting them to more standard energy units such as J/mol or kcal/mol requires calibration against, e.g., experimental data. Since a full calibration might be too complicated and beyond the scope of AP, we can instead calibrate using a few complexes with well-known strong affinity (e.g. https://www.rcsb.org/structure/1STP). Then we can rescale the Rosetta units to the range [0,1].
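One possible reading of this rescaling, shown only as a sketch: divide each binding energy by the energy of a reference strong binder and clamp the result to [0,1]. The linear form, the clamping, the function name `scale_binding_energy`, and the reference value are all assumptions made for illustration. In particular, the number below is a placeholder, not a computed energy for 1STP.

```python
def scale_binding_energy(energy, reference_energy):
    """Map a Rosetta binding energy onto [0, 1] relative to a strong binder.

    Assumption: linear scaling, where 0 means no favorable binding and
    1 means at least as favorable as the reference complex.
    """
    if reference_energy >= 0:
        raise ValueError("reference_energy must be negative (favorable)")
    ratio = energy / reference_energy
    # Clamp so that weaker-than-zero and stronger-than-reference values
    # stay inside the [0, 1] range.
    return min(1.0, max(0.0, ratio))


# Placeholder value only, NOT a measured or computed energy for 1STP.
REFERENCE_ENERGY = -2000.0

for e in (-2500.0, -500.0, 150.0):
    print(e, scale_binding_energy(e, REFERENCE_ENERGY))
```

A single reference point fixes only the upper end of the scale. Using several complexes, as suggested above, would allow a check of whether a linear mapping is reasonable at all.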