chaidiscovery / chai-lab

Chai-1, SOTA model for biomolecular structure prediction
https://www.chaidiscovery.com
Apache License 2.0

How to understand output json/npz file? What are pTM, ipTM, clashes? #76

Open WillHua127 opened 2 months ago

WillHua127 commented 2 months ago

How do I interpret the scores in the JSON file, such as aggregate_score, ptm, and iptm?

{
  "aggregate_score": 0.4387286305427551,
  "ptm": 0.7986587285995483,
  "iptm": 0.34874609112739563,
  "per_chain_ptm": [
    [
      0.8791998624801636,
      0.4463137984275818
    ]
  ],
  "per_chain_pair_iptm": [
    [
      [
        0.8791998624801636,
        0.34874609112739563
      ],
      [
        0.10360758751630783,
        0.4463137984275818
      ]
    ]
  ],
  "has_inter_chain_clashes": false,
  "chain_intra_clashes": [
    [
      0,
      0
    ]
  ],
  "chain_chain_inter_clashes": [
    [
      [
        0,
        0
      ],
      [
        0,
        0
      ]
    ]
  ]
}
arogozhnikov commented 2 months ago

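pTM, ipTM: these correspond to the scores of the same names in AlphaFold (AF); see its documentation. Both are the model's predictions of the TM-score, made without access to a reference structure. ipTM is the interface variant, restricted to residue pairs from different chains.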
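For readers unfamiliar with the quantity being predicted, here is a minimal sketch of the TM-score itself, computed for one fixed superposition. The function name and inputs are hypothetical. The real TM-score also maximizes over superpositions, and pTM estimates it from the model's predicted error distribution rather than from true distances. The fallback `d0 = 0.5` for short targets follows common implementations and is an assumption here.

```python
def tm_score(distances, target_length):
    """TM-score for one fixed superposition.

    distances: per-residue distances (in Angstroms) between aligned
               predicted and reference residues (hypothetical input).
    target_length: number of residues in the reference structure.
    """
    # Length-dependent scale d0 from the standard TM-score definition.
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5  # assumption: floor used by common implementations
    d0 = max(d0, 0.5)
    # Each aligned residue contributes a value in (0, 1]; unaligned
    # residues contribute 0 because we divide by the full target length.
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


perfect = tm_score([0.0] * 100, 100)   # identical structures
shifted = tm_score([4.0] * 100, 100)   # every residue off by 4 Angstroms
print(perfect, shifted)
```

A score of 1 means a perfect match, and the score drops towards 0 as per-residue distances grow relative to `d0`.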

clashes: cases where predicted atom positions are too close to each other (i.e. the atoms clash). Your input had two chains (or a chain and a ligand; the ligand is treated as a chain in this case). Clashes are counted within each chain (chain_intra_clashes) and between each pair of chains (chain_chain_inter_clashes).
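As a rough illustration of what counting clashes means, the sketch below counts atom pairs closer than a distance threshold, either within one chain or between two chains. The function name, the 1.1 Å threshold, and the toy coordinates are assumptions for illustration, not taken from chai-lab. A real implementation may use a different cutoff and handle bonded atoms differently.

```python
import math
from itertools import combinations


def count_clashes(coords_a, coords_b=None, threshold=1.1):
    """Count atom pairs closer than `threshold` Angstroms.

    With only coords_a, counts pairs within that chain (intra-chain).
    With coords_b as well, counts pairs across the two chains (inter-chain).
    The threshold value is a hypothetical choice.
    """
    if coords_b is None:
        pairs = combinations(coords_a, 2)
    else:
        pairs = ((a, b) for a in coords_a for b in coords_b)
    return sum(1 for a, b in pairs if math.dist(a, b) < threshold)


# Toy coordinates (x, y, z), made up for the example.
chain_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0)]
chain_b = [(3.5, 0.0, 0.0), (10.0, 0.0, 0.0)]

intra_a = count_clashes(chain_a)           # atoms are 1.5 A apart or more
inter_ab = count_clashes(chain_a, chain_b) # one atom pair is 0.5 A apart
print(intra_a, inter_ab)
```

In the JSON from the question, all of these counts are 0, so the prediction has no clashes by this kind of criterion.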

aggregate_score: see the discussion at https://github.com/chaidiscovery/chai-lab/discussions/70
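As a quick sanity check, the values in the JSON from the question are consistent with a weighted sum of 0.2 * ptm + 0.8 * iptm. The sketch below recomputes it. The weights and the clash penalty of 100 are assumptions that are not stated in this thread. The weights are used because they reproduce the reported number, and the penalty term cannot be checked against this example since has_inter_chain_clashes is false.

```python
import json


def aggregate_score(ptm, iptm, has_inter_chain_clashes, clash_penalty=100.0):
    """Assumed ranking formula: weighted pTM/ipTM minus a clash penalty."""
    return 0.2 * ptm + 0.8 * iptm - clash_penalty * float(has_inter_chain_clashes)


# Relevant fields copied from the JSON in the question.
scores = json.loads("""
{
  "aggregate_score": 0.4387286305427551,
  "ptm": 0.7986587285995483,
  "iptm": 0.34874609112739563,
  "has_inter_chain_clashes": false
}
""")

recomputed = aggregate_score(
    scores["ptm"], scores["iptm"], scores["has_inter_chain_clashes"]
)
# The two values agree up to single-precision rounding.
print(recomputed, scores["aggregate_score"])
```

Because ipTM carries most of the weight under this assumption, the low ipTM (about 0.35) pulls the aggregate score down even though pTM is fairly high (about 0.80).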

arogozhnikov commented 1 month ago

I'll leave this issue open so that other users can find it easily.

stianale commented 1 month ago

I notice the output does not include the chain pair PAE min (CPPM). It would perhaps have been helpful.

arogozhnikov commented 1 month ago

Can you give an example of when it is helpful, and why you would prefer it over other metrics?

stianale commented 1 month ago

CPPM can help distinguish binding partners from molecules that do not bind.

arogozhnikov commented 1 month ago

Can you give me a reference for this? Also, do you mean PAE min or PAE mean?

stianale commented 1 month ago

I saw it in the FAQ of the AlphaFold Server, in the section on how to interpret their metrics. I am not sure what source they used for this, and I am by no means an expert here. It is chain pair PAE min; the lower the value, the better.

stianale commented 1 month ago

Sorry, it is chain pair pae min: "chain_pair_pae_min: A [num_chains, num_chains] array. Element (i, j) of the array contains the lowest PAE value across rows restricted to chain i and columns restricted to chain j. This has been found to correlate with whether two chains interact or not, and in some cases can be used to distinguish binders from non-binders." - https://alphafoldserver.com/faq#how-do-i-interpret-all-the-outputs-in-the-downloaded-json-files
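To make the quoted definition concrete, here is a minimal sketch that computes such an array from a PAE matrix and a per-token chain assignment. The function name, the nested-list inputs, and the toy numbers are hypothetical. Whether Chai-1's output files expose a PAE matrix in this form is not established in this thread.

```python
def chain_pair_pae_min(pae, chain_ids):
    """Return a [num_chains][num_chains] table of minimum PAE values.

    pae: square matrix (list of lists); pae[i][j] is the predicted
         aligned error for token pair (i, j).
    chain_ids: chain label of each token, same length as pae.
    Element (a, b) is the lowest PAE over rows in chain a and
    columns in chain b.
    """
    # Preserve the order in which chains first appear.
    chains = list(dict.fromkeys(chain_ids))
    members = {
        c: [i for i, cid in enumerate(chain_ids) if cid == c] for c in chains
    }
    return [
        [min(pae[i][j] for i in members[a] for j in members[b]) for b in chains]
        for a in chains
    ]


# Toy example: two tokens in chain "A", one token in chain "B".
chain_ids = ["A", "A", "B"]
pae = [
    [0.5, 1.0, 12.0],
    [1.2, 0.4, 9.0],
    [15.0, 7.0, 0.3],
]
result = chain_pair_pae_min(pae, chain_ids)
print(result)  # [[0.4, 9.0], [7.0, 0.3]]
```

The off-diagonal entries are the ones of interest for binding. Note that the table is not symmetric in general, because PAE(i, j) and PAE(j, i) can differ.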