chaidiscovery / chai-lab

Chai-1, SOTA model for biomolecular structure prediction
https://www.chaidiscovery.com

covalent modification? #110

Open Ruibin-Liu opened 1 month ago

Ruibin-Liu commented 1 month ago

Hi,

Thanks a lot for the wonderful work. I noticed that Chai-1 can predict structures with covalent modifications. I know there isn't an example yet, but is there a place (or places) in the code where I could add something so that I can use this feature?

wukevin commented 1 month ago

We currently support covalently modified residues by specifying their CCD (Chemical Component Dictionary) code. For example, if you had some sequence ...RKSDE... where the S is actually a covalently modified phosphoserine (https://www.rcsb.org/ligand/SEP), you can specify the input as ...RK(SEP)DE..., where SEP is the CCD code for the modified residue.
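To make the parenthesized syntax concrete, here is a minimal stdlib-only sketch that swaps one residue of a sequence for a CCD code and wraps the result in a FASTA record. The helper names (`tokenize`, `with_modification`, `fasta_record`) are hypothetical, not part of chai-lab, and the `>protein|name=...` header layout is an assumption based on chai-lab's example inputs; check it against the version you have installed. The helper does not verify that the CCD code is a valid modification of the residue it replaces (for example, that SEP replaces an S).

```python
import re

# One token is either a single-letter residue or a parenthesized CCD code like (SEP).
_TOKEN = re.compile(r"\([A-Za-z0-9]+\)|[A-Za-z]")


def tokenize(sequence: str) -> list[str]:
    """Split a sequence into residue tokens, keeping (CCD) codes intact."""
    tokens = _TOKEN.findall(sequence)
    if "".join(tokens) != sequence:
        raise ValueError("sequence contains unrecognized characters")
    return tokens


def with_modification(sequence: str, index: int, ccd_code: str) -> str:
    """Replace the residue at 0-based token `index` with `(ccd_code)`.

    Indexing is per residue token, so earlier modifications such as (SEP)
    count as a single position.
    """
    tokens = tokenize(sequence)
    if not 0 <= index < len(tokens):
        raise IndexError(f"index {index} out of range for {len(tokens)} residues")
    if not re.fullmatch(r"[A-Za-z0-9]{1,5}", ccd_code):
        raise ValueError(f"not a plausible CCD code: {ccd_code!r}")
    tokens[index] = f"({ccd_code})"
    return "".join(tokens)


def fasta_record(name: str, sequence: str) -> str:
    """Format one protein entry; the header layout is an assumption."""
    return f">protein|name={name}\n{sequence}\n"


if __name__ == "__main__":
    seq = with_modification("RKSDE", 2, "SEP")  # phosphoserine at the S
    print(fasta_record("phospho-example", seq), end="")
```

Because positions are counted per residue token, further modifications can be applied one at a time without recomputing character offsets.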

Let us know if you have further questions!

Ruibin-Liu commented 1 month ago

That's great! Are there any plans to add something like covalent docking?

wukevin commented 1 month ago

At least for now, you'll have to build out a ConstraintContext object yourself (see chai_lab/data/dataset/constraints/constraint_context.py) for this kind of functionality.

zzhangzzhang commented 1 month ago

"Another notable limitation is that Chai-1 can be highly sensitive to modified residues. Removing modified residues from a sequence that natively possess them or replacing modified amino acids with their standard amino acid analogs can cause large changes in predicted structures. We hypothesize that this is because Chai-1 has been trained explicitly on structures with modifications and relies on this information to accurately predict structures. More plainly, those same amino acid sequences without modifications might be considered to be entirely different inputs."

Is this paragraph referring to post-translational modifications (PTMs)? Have you tried benchmarking against PDB structures of the same protein with and without the PTM?

In a 2012 paper, "Post-translational modifications induce significant yet not extreme changes to protein structure," the authors showed that for structures solved with phosphorylation and acetylation in the PDB, there seems to be little conformational change: "At a global level, glycosylation and phosphorylation introduce structural changes >2 Å in only 7–13% of cases. These results are similar to those observed for ligand binding where 9% of enzymes showed >2 Å structural changes." So I would have expected the model to be biased toward not predicting PTM-induced conformational changes, given that the PDB is biased that way.

wukevin commented 1 month ago

Yes, this paragraph refers primarily to post-translational modifications. During model development and debugging, we saw that many of the more significant errors stemmed from post-translational modifications not being specified; however, we don't have results from a comprehensive profiling of this phenomenon at this time. It's possible that a more thorough evaluation would show changes on a similar scale to those you are describing.

Hope that helps clear things up!

c00jsw00 commented 4 weeks ago

Dear Sir, could you give me an example of using constraint_context.py to simulate covalent docking?

wukevin commented 4 weeks ago

While we don't have a fully worked out code example for these, generating such constraints should be fairly straightforward with a bit of light digging in the code.

For example, if you wanted a ContactConstraint, which specifies that a pair of tokens in a complex lies within a specific distance threshold, you can provide the following fields in the corresponding constraint, supply that constraint to the ConstraintContext object, and pass it through the example folding code.

https://github.com/chaidiscovery/chai-lab/blob/1ce5045f1a927c748df090ff1abd0ba58699c6b5/chai_lab/data/features/generators/token_dist_restraint.py#L25-L36
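The exact field names for the real constraint live in the linked token_dist_restraint.py at that commit. The sketch below does not reproduce them. It is a hypothetical, self-contained stand-in (`ContactRestraint`, `violations`, and the `coords` mapping are all illustrative names, not chai-lab APIs) that only captures the semantics described above: two tokens, identified by chain and residue index, that should lie within a distance threshold. It also shows how one might check a predicted structure against such restraints after folding.

```python
import math
from dataclasses import dataclass

# (chain_id, residue_index) -> (x, y, z) of one representative atom per token.
Coords = dict[tuple[str, int], tuple[float, float, float]]


@dataclass(frozen=True)
class ContactRestraint:
    """Hypothetical illustration, NOT chai_lab's class: two tokens that
    should lie within `max_distance` angstroms of each other."""

    chain_a: str
    residue_index_a: int
    chain_b: str
    residue_index_b: int
    max_distance: float

    def __post_init__(self) -> None:
        if self.max_distance <= 0:
            raise ValueError("max_distance must be positive")

    def distance(self, coords: Coords) -> float:
        """Euclidean distance between the two tokens' representative atoms."""
        a = coords[(self.chain_a, self.residue_index_a)]
        b = coords[(self.chain_b, self.residue_index_b)]
        return math.dist(a, b)

    def is_satisfied(self, coords: Coords) -> bool:
        return self.distance(coords) <= self.max_distance


def violations(restraints: list[ContactRestraint], coords: Coords) -> list[ContactRestraint]:
    """Return the restraints that a (predicted) structure fails to meet."""
    return [r for r in restraints if not r.is_satisfied(coords)]


if __name__ == "__main__":
    # Toy coordinates: the two tokens are exactly 5 Å apart.
    coords = {("A", 10): (0.0, 0.0, 0.0), ("B", 1): (3.0, 4.0, 0.0)}
    loose = ContactRestraint("A", 10, "B", 1, max_distance=6.0)
    tight = ContactRestraint("A", 10, "B", 1, max_distance=4.0)
    print([r.max_distance for r in violations([loose, tight], coords)])
```

Keeping the restraint as a plain frozen dataclass separates "what should be close" from how a particular model consumes it. Translating these fields into chai-lab's own constraint type is left to the linked source.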