THUNLP-MT / dyMEAN

This repo contains the code for our paper "End-to-End Full-Atom Antibody Design"
https://arxiv.org/abs/2302.00203
MIT License

This tool seems to generate CDR-H3 regions that are likely to contain many Gs (glycine) #16

Open · semal opened this issue 8 months ago

semal commented 8 months ago

[Image not preserved]

What could cause this phenomenon in the CDR-H3 region?

kxz18 commented 8 months ago

In my own trials of the model, when it keeps generating G and Y, especially for long CDR-H3s, I suspect the cause is out-of-distribution (OOD) test samples. A sample can be OOD for many reasons. The epitope definition might not suit the antibody's interaction pattern, or the epitope itself might be very challenging and quite different from the space observed during training. Trying several epitope definitions may help, since it is hard to tell in advance whether a given definition is good.
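One way to spot this symptom across a batch of designs is to measure how much of each generated CDR-H3 consists of G and Y, which makes it easier to compare runs made with different epitope definitions. This is a minimal sketch and is not part of dyMEAN: `gy_fraction`, `flag_suspicious`, the 0.5 threshold, and the example sequences are all hypothetical, chosen only for illustration.

```python
from collections import Counter


def gy_fraction(cdrh3: str) -> float:
    """Fraction of residues in a CDR-H3 sequence that are glycine (G) or tyrosine (Y)."""
    if not cdrh3:
        return 0.0
    counts = Counter(cdrh3.upper())
    return (counts["G"] + counts["Y"]) / len(cdrh3)


def flag_suspicious(designs: dict[str, str], threshold: float = 0.5) -> list[str]:
    """Return IDs of designs whose CDR-H3 G/Y fraction exceeds the threshold.

    The default threshold is an arbitrary, hypothetical cutoff, not a validated value.
    """
    return [name for name, seq in designs.items() if gy_fraction(seq) > threshold]


# Hypothetical toy CDR-H3 sequences, not real model outputs.
designs = {
    "design_1": "ARDGGYYGGGYYGMDV",  # 10 of 16 residues are G or Y
    "design_2": "ARDRSTWYFDV",       # 1 of 11 residues is G or Y
}
print(flag_suspicious(designs))  # prints ['design_1']
```

Because the reply notes the problem is worse for long CDR-H3s, it may also be worth reporting `len(seq)` next to the fraction when reviewing flagged designs.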