gcorso / DiffDock

Implementation of DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking
https://arxiv.org/abs/2210.01776
MIT License

Chain length limited to 1022 #199

Closed (polo9719 closed this 2 months ago)

polo9719 commented 3 months ago

Hi, I realized DiffDock fails to run inference on complexes where the protein contains a chain with more than 1022 residues.

This limit is hard-coded here: https://github.com/gcorso/DiffDock/blob/6f5d4b152b48fc1bf2ab3e3e51cd17f29826e3c4/utils/inference_utils.py#L69

Manually increasing it to 2048 seems to fix my issue, but I was wondering whether this could cause bad predictions. What are your thoughts?

Thank you in advance, Paul

prathithbhargav commented 3 months ago

I've been experiencing a similar problem. However, I believe the truncation length is limited to 1022 because the protein is embedded using ESM-2. According to https://github.com/facebookresearch/esm/issues/628, increasing the truncation length should not cause any issues, except for requiring more memory.

jsilter commented 2 months ago

@prathithbhargav is correct. With the current ESM implementation, we are limited to 1022 residues per chain. @polo9719, if you increase the limit, the sequence will still be truncated downstream, at least for ESM embedding purposes, so I wouldn't trust those predictions.
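
For anyone who wants to check their inputs before running inference, here is a minimal sketch of a pre-check that flags chains longer than the 1022 limit discussed above. It is not part of DiffDock or ESM: `find_overlong_chains`, `truncate_chain`, and the `chains` dictionary (chain ID mapped to a one-letter sequence) are hypothetical names used only for illustration, and the truncation shown simply keeps the first 1022 residues, which may not match exactly what the real pipeline does.

```python
# Hypothetical pre-check, not DiffDock code: flag chains that exceed the
# 1022-residue limit mentioned in this thread before running inference.
ESM_MAX_RESIDUES = 1022  # value taken from the discussion above


def find_overlong_chains(chains, max_len=ESM_MAX_RESIDUES):
    """Return {chain_id: length} for every chain longer than max_len."""
    return {
        chain_id: len(seq)
        for chain_id, seq in chains.items()
        if len(seq) > max_len
    }


def truncate_chain(seq, max_len=ESM_MAX_RESIDUES):
    """Keep only the first max_len residues (assumed truncation behavior)."""
    return seq[:max_len]


if __name__ == "__main__":
    # Toy sequences: chain A is exactly at the limit, chain B is over it.
    chains = {"A": "M" * 1022, "B": "M" * 1500}
    for chain_id, length in find_overlong_chains(chains).items():
        print(f"Chain {chain_id} has {length} residues; "
              f"only the first {ESM_MAX_RESIDUES} would be embedded.")
```

A check like this only reports which chains would lose residues; it does not make predictions on the truncated chains any more trustworthy.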