chaidiscovery / chai-lab

Chai-1, SOTA model for biomolecular structure prediction
https://www.chaidiscovery.com

Making CIF metrics usable #52

Open arogozhnikov opened 4 days ago

arogozhnikov commented 4 days ago

We recently switched from PDB to CIF with two main motivations:

We followed the recommendation to use ModelCIF, and specifically _ma_qa_metric, but I don't see a way to color by this field in PyMOL, and I don't see pairwise metrics in PyMOL either.

I need help figuring out what the right pipeline for users is.

arogozhnikov commented 4 days ago

@aozalevsky @brindakv recommendations are very welcome

aozalevsky commented 3 days ago

afaik, direct support of _ma_qa_metric is implemented in Mol* (which you already know since you're using it in your web service) and ChimeraX. In Mol*, pLDDT is automatically recognized as one of the Validation attributes and colored according to the pLDDT Confidence color scheme. In ChimeraX, you can color a model using the following command:

color by pLDDT_score atoms palette alphafold

where pLDDT comes from a _ma_qa_metric and _score is added later in the code as a label.

All scripts for pLDDT coloring in PyMOL rely on the _atom_site.B_iso_or_equiv field (or simply B-factors), since that has been a common (though hacky) practice. DeepMind also populates both _ma_qa_metric and _atom_site.B_iso_or_equiv for backwards compatibility. It seems that you also opted for a similar strategy.
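
For scripts that read pLDDT from the B-factor column, the coloring step comes down to binning each value into a confidence band. A minimal sketch, assuming pLDDT is stored on a 0-100 scale and using the four bands and hex colors shown on AlphaFold DB pages. The `plddt_band` helper name and the treatment of exact boundary values (inclusive lower bounds) are assumptions made for illustration, not something defined by PyMOL or by this repository:

```python
# AlphaFold DB style confidence bands as (lower bound, label, hex color).
# Assumption: lower bounds are inclusive; the source legend is ambiguous there.
PLDDT_BANDS = [
    (90.0, "very high", "0x0053D6"),
    (70.0, "confident", "0x65CBF3"),
    (50.0, "low", "0xFFDB13"),
    (0.0, "very low", "0xFF7D45"),
]


def plddt_band(score):
    """Return (label, hex color) for a pLDDT value on the 0-100 scale."""
    if not 0.0 <= score <= 100.0:
        raise ValueError(f"pLDDT out of range: {score}")
    for lower, label, color in PLDDT_BANDS:
        if score >= lower:
            return label, color
    raise AssertionError("unreachable: 0.0 is the lowest band")


if __name__ == "__main__":
    for value in (95.2, 70.0, 12.0):
        print(value, plddt_band(value))
```

If the B-factor column holds values on a 0-1 scale instead, multiply by 100 before calling the helper.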

As for PyMOL, it seems that only the minimal useful information from PDBx/mmCIF and BinaryCIF is extracted. I wonder if @JarrettSJohnson would be willing to chime in. It would be really nice to have at least per-residue (local) scores from _ma_qa_metric supported in all major visualization packages.
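
Until viewers read these scores directly, the per-residue values can be pulled out of the file by hand. A rough sketch, assuming the scores sit in a `_ma_qa_metric_local` loop with the column names from the ModelCIF dictionary. The `read_loop` and `per_residue_scores` helpers and the embedded example data are hypothetical, and the whitespace-based parsing is a deliberate simplification (no quoted or multi-line values); a real CIF parser is the safer choice for production use:

```python
def read_loop(cif_text, category):
    """Return rows (dicts) of the first loop_ whose tags belong to `category`.

    Simplification: values are split on whitespace, so quoted or multi-line
    CIF values are not handled.
    """
    prefix = category + "."
    tags, rows = [], []
    state = "search"  # "header" after loop_, "body" from the first data row
    for raw in cif_text.splitlines():
        line = raw.strip()
        if state == "search":
            if line == "loop_":
                state = "header"
            continue
        if state == "header":
            if line.startswith(prefix):
                tags.append(line[len(prefix):])
                continue
            if not tags:
                # This loop belongs to another category; keep looking.
                state = "header" if line == "loop_" else "search"
                continue
            state = "body"  # fall through: this line is the first data row
        if not line or line.startswith(("#", "_", "loop_", "data_")):
            break  # end of the table
        values = line.split()
        if len(values) != len(tags):
            raise ValueError(f"expected {len(tags)} values, got {len(values)}")
        rows.append(dict(zip(tags, values)))
    return rows


def per_residue_scores(cif_text, metric_id):
    """Map (label_asym_id, label_seq_id) to metric_value for one metric."""
    scores = {}
    for row in read_loop(cif_text, "_ma_qa_metric_local"):
        if row["metric_id"] == metric_id:
            key = (row["label_asym_id"], int(row["label_seq_id"]))
            scores[key] = float(row["metric_value"])
    return scores


# Made-up example data, not taken from a real prediction.
EXAMPLE = """\
data_example
loop_
_ma_qa_metric.id
_ma_qa_metric.mode
_ma_qa_metric.name
_ma_qa_metric.type
1 global pLDDT pLDDT
2 local pLDDT pLDDT
#
loop_
_ma_qa_metric_local.ordinal_id
_ma_qa_metric_local.model_id
_ma_qa_metric_local.label_asym_id
_ma_qa_metric_local.label_seq_id
_ma_qa_metric_local.label_comp_id
_ma_qa_metric_local.metric_id
_ma_qa_metric_local.metric_value
1 1 A 1 MET 2 54.67
2 1 A 2 GLY 2 71.20
3 1 A 3 SER 2 93.05
#
"""

if __name__ == "__main__":
    print(per_residue_scores(EXAMPLE, "2"))
```

The resulting dictionary can then be handed to whatever coloring mechanism the viewer offers, for example by writing the values into the B-factor column.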

aozalevsky commented 3 days ago

As for PAE, it's already supported in ModelCIF as _ma_qa_metric_local_pairwise, but judging from the repos, none of the aforementioned tools support reading it from the CIF file. ChimeraX already has tools in place for visualizing PAE (as downloaded from AFDB). Maybe @tomgoddard can tell if the current _ma_qa_metric support can be extended to reading and visualizing pairwise (PAE) scores.

Low-level support for ModelCIF (including reading/writing metrics) is already implemented in python-modelcif: https://github.com/ihmwg/python-modelcif

tomgoddard commented 3 days ago

Showing _ma_qa_metric_local_pairwise in ChimeraX is not related to the capability added by @e-pettersen in April 2023 to show per-residue attributes from _ma_qa_metric, since a residue-pair score would be handled entirely differently than a score on individual residues. But I have used the AlphaFold PAE plot to show residue-residue distances, as shown here:

https://rbvi.github.io/chimerax-recipes/rrdist/rrdist.html

That code just rewrote the distances in the AlphaFold PAE JSON file format and read the file. I could possibly try some Python code in ChimeraX that reads the _ma_qa_metric_local_pairwise table from a ModelCIF entry and shows it with the AlphaFold PAE plot in ChimeraX. Could you point me to a ModelCIF file that has that table? The dictionary description of the table suggests it has all the information needed to display it like PAE data.
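
The rewriting step described above could look roughly like this outside of ChimeraX. It is only a sketch under several assumptions: the pairwise rows have already been read from the table as `(label_seq_id_1, label_seq_id_2, metric_value)` tuples for a single chain with 1-based residue numbering, and the output uses the list-of-one-object layout with `predicted_aligned_error` and `max_predicted_aligned_error` keys that, to my understanding, current AlphaFold DB PAE downloads use (worth checking against a real file). The `pairwise_to_pae_json` name is hypothetical:

```python
import json


def pairwise_to_pae_json(rows, n_residues):
    """Build an AlphaFold-DB-style PAE JSON string from pairwise rows.

    rows: iterable of (seq_id_1, seq_id_2, value), residue ids 1-based.
    Assumption: pairs missing from `rows` are left at 0.0, which may be
    misleading; a real converter should probably reject incomplete tables.
    """
    if n_residues < 1:
        raise ValueError("n_residues must be positive")
    matrix = [[0.0] * n_residues for _ in range(n_residues)]
    for seq_1, seq_2, value in rows:
        matrix[seq_1 - 1][seq_2 - 1] = float(value)
    max_value = max(max(row) for row in matrix)
    payload = [{
        "predicted_aligned_error": matrix,
        "max_predicted_aligned_error": max_value,
    }]
    return json.dumps(payload)


if __name__ == "__main__":
    # Made-up values for a two-residue toy example.
    toy_rows = [(1, 1, 0.5), (1, 2, 4.0), (2, 1, 3.5), (2, 2, 0.25)]
    print(pairwise_to_pae_json(toy_rows, n_residues=2))
```

Multi-chain entries would need a mapping from (label_asym_id, label_seq_id) to a single running index before filling the matrix.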