bytedance / Protenix

A trainable PyTorch reproduction of AlphaFold 3.

Questions about model_v1.pt vs models trained using train_demo.sh, and low performance on antibody-antigen complex #25

Open RJWANGbioinfo opened 2 weeks ago

RJWANGbioinfo commented 2 weeks ago

Hi, I have a question about the trained model. What is the difference between the model_v1.pt downloaded from https://af3-dev.tos-cn-beijing.volces.com/release_model/ (I suppose this is also the one used by the Protenix server) and a model trained from scratch using train_demo.sh?

I ask because the prediction performance of Protenix on antibody-antigen complexes is much lower than that of AlphaFold 3 in all of our test cases. For example, most of the pLDDT values we get from Protenix are < 70, while the same predictions from AF3 are usually well above 80. I'm trying to figure out whether there is a way to improve or fine-tune the model. Thank you!

zhangyuxuann commented 2 weeks ago

Hi @RJWANGbioinfo, we implemented it as in the AF3 supplement, so it is an unnormalized version. The current pLDDT is meant for choosing among different samples of the same sequence: the relative values are comparable, and sample ranking works. It is not suitable for comparing different structures, because the values depend on the number of contacts. By the way, we will fix this in the next release.
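To illustrate the distinction between ranking samples of one sequence and comparing across structures, here is a minimal Python sketch. It assumes the per-sample confidence scores are already available as plain floats; the function name, the target names, and the numbers are hypothetical and are not part of Protenix.

```python
from typing import Dict, List


def rank_samples_within_target(scores: Dict[str, List[float]]) -> Dict[str, List[int]]:
    """For each target, return sample indices ordered from highest to lowest score.

    Scores are only compared inside one target. Nothing here compares a
    score from one target with a score from another.
    """
    return {
        target: sorted(range(len(vals)), key=lambda i: vals[i], reverse=True)
        for target, vals in scores.items()
    }


# Made-up scores for illustration only: two targets, three samples each.
example = {
    "target_a": [0.41, 0.55, 0.48],
    "target_b": [0.72, 0.69, 0.80],
}
ranking = rank_samples_within_target(example)
print(ranking)
```

With an unnormalized score, the ordering inside `target_a` or inside `target_b` is meaningful, while the fact that every `target_b` value exceeds every `target_a` value would say nothing about which complex is predicted better.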

RJWANGbioinfo commented 2 weeks ago

@zhangyuxuann thanks for the clarification. I also compared the PAE matrices from Protenix and AF3, and the situation is similar to pLDDT. That is why I'm trying to figure out whether this is due to the model used, or whether I should fine-tune the model to improve predictions for antibody-antigen complexes.