Question Regarding Data Preprocessing for train.lmdb and valid.lmdb from MOAD in docking_v2 Directory

deepmodeling / Uni-Mol

Official Repository for the Uni-Mol Series Methods

Question Regarding Data Preprocessing for train.lmdb and valid.lmdb from MOAD in docking_v2 Directory #259

Open iceissey opened 3 months ago

iceissey commented 3 months ago

I would like to know how the train.lmdb and valid.lmdb files in the docking_v2/protein_ligand_binding_pose_prediction_v2 directory were processed from the MOAD dataset. I have checked the code and found only the data preprocessing during inference, which generates conformations for ligand molecules. Could you please clarify if the preprocessing during training is the same as during inference?

ZhouGengmo commented 2 months ago

The data preprocessing during training is consistent with that during inference, except that the conformation clustering is fixed during training, with M = 100 and N = 10.
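For readers trying to reproduce this step, here is a minimal sketch of reducing M candidate conformations to N representatives with plain k-means. It rests on assumptions not confirmed in this thread: that M is the number of generated conformers and N the number of clusters kept. The coordinates are random stand-ins rather than real conformers, and `kmeans_representatives` is a hypothetical helper, not a function from the Uni-Mol repository.

```python
import random

# Values quoted in the reply above. Their roles (M generated conformers,
# N clusters kept) are an assumption made for this sketch.
M, N = 100, 10


def sq_dist(a, b):
    """Squared Euclidean distance between two flat coordinate vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_representatives(points, k, iters=20, seed=0):
    """Hypothetical helper: run plain k-means on `points` and return, for each
    non-empty cluster, the index of the point closest to its centroid."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    dim = len(points[0])
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: attach every point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for idx, p in enumerate(points):
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(idx)
        # Update step: move each non-empty centroid to the mean of its members.
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = [
                    sum(points[i][d] for i in members) / len(members)
                    for d in range(dim)
                ]
    reps = []
    for c, members in enumerate(clusters):
        if members:
            reps.append(min(members, key=lambda i: sq_dist(points[i], centroids[c])))
    return sorted(reps)


# Synthetic stand-in data: M "conformers" of a 5-atom molecule, each flattened
# to a vector of 15 coordinates. Real conformers would come from a conformer
# generator and would normally be aligned before comparing coordinates.
data_rng = random.Random(42)
n_atoms = 5
conformers = [[data_rng.gauss(0.0, 1.0) for _ in range(n_atoms * 3)] for _ in range(M)]

selected = kmeans_representatives(conformers, N)
print(f"kept {len(selected)} of {len(conformers)} conformers: {selected}")
```

The sketch can return fewer than N indices if a cluster ends up empty. Whether the actual pipeline handles that case, or uses a different clustering method altogether, is not stated here.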

Additionally, the original training data contains some duplicate entries, and we do not apply any special handling to them. We will release the processed LMDB files later.
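Since the duplicates are left in place, anyone who wants to handle them on their own side could count and drop them before training. The following is a rough sketch only: the record layout and the field names `pdb_id` and `smiles` are hypothetical, chosen for illustration, and do not reflect the actual schema of the LMDB files.

```python
from collections import Counter

# Hypothetical key fields; the real LMDB schema is not described in this thread.
KEY_FIELDS = ("pdb_id", "smiles")


def record_key(record, key_fields=KEY_FIELDS):
    """Build a hashable identity for a record from the chosen fields."""
    return tuple(record[f] for f in key_fields)


def find_duplicates(records, key_fields=KEY_FIELDS):
    """Return {key: count} for every key that occurs more than once."""
    counts = Counter(record_key(r, key_fields) for r in records)
    return {key: n for key, n in counts.items() if n > 1}


def drop_duplicates(records, key_fields=KEY_FIELDS):
    """Keep the first record for each key, preserving the original order."""
    seen = set()
    unique = []
    for r in records:
        key = record_key(r, key_fields)
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique


# Toy records standing in for decoded dataset entries.
records = [
    {"pdb_id": "1abc", "smiles": "CCO"},
    {"pdb_id": "1abc", "smiles": "CCO"},
    {"pdb_id": "2xyz", "smiles": "c1ccccc1"},
]

duplicates = find_duplicates(records)
unique_records = drop_duplicates(records)
print(duplicates)
print(len(unique_records))
```

Which fields define a duplicate is a judgment call. Keying on the complex identifier plus the ligand is one option, but the right choice depends on how the processed files are organized once they are released.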