Open iceissey opened 3 months ago
The data preprocessing during training is the same as during inference, except that the conformation clustering parameters are fixed during training at M = 100 and N = 10.
Additionally, the original training data contains some duplicate entries, and we do not apply any special handling to them. We will release the processed LMDB files later.
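For readers trying to picture the fixed clustering setting mentioned above, here is a minimal sketch, not the repository's actual code. It assumes that M is the number of ligand conformations generated and N is the number of clusters they are reduced to; the reply does not spell this out. The k-means routine is a plain-Python stand-in for whatever clustering the project uses, the function names are hypothetical, and the conformers are random placeholder coordinates rather than real molecular geometries.

```python
import random

# Training-time values quoted in the reply. Their interpretation here
# (M conformers generated, N clusters kept) is an assumption.
M, N = 100, 10


def sq_dist(a, b):
    """Squared Euclidean distance between two flattened coordinate vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_representatives(confs, n_clusters, seed=42, n_iter=20):
    """Cluster conformers with a simple k-means and return the index of the
    conformer closest to each non-empty cluster center (hypothetical helper)."""
    rng = random.Random(seed)
    # Initialise the centers from randomly chosen conformers.
    centers = [list(confs[i]) for i in rng.sample(range(len(confs)), n_clusters)]
    assign = [0] * len(confs)
    for _ in range(n_iter):
        # Assignment step: each conformer goes to its nearest center.
        assign = [
            min(range(n_clusters), key=lambda k: sq_dist(c, centers[k]))
            for c in confs
        ]
        # Update step: move each center to the mean of its members.
        for k in range(n_clusters):
            members = [confs[i] for i, a in enumerate(assign) if a == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    reps = []
    for k in range(n_clusters):
        idx = [i for i, a in enumerate(assign) if a == k]
        if idx:  # a cluster can end up empty, so fewer than N may be returned
            reps.append(min(idx, key=lambda i: sq_dist(confs[i], centers[k])))
    return reps


# Placeholder data: M fake conformers of a 5-atom ligand, flattened to 15 floats.
# Real conformers would come from a conformer generator, not random numbers.
data_rng = random.Random(0)
confs = [[data_rng.uniform(-1.0, 1.0) for _ in range(15)] for _ in range(M)]

reps = kmeans_representatives(confs, N)
print(f"kept {len(reps)} representative conformers out of {len(confs)}")
```

Under this reading, each training ligand would contribute at most N = 10 representative conformations drawn from M = 100 generated ones. Real pipelines usually align conformers or strip hydrogens before clustering, which this sketch skips.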
I would like to know how the train.lmdb and valid.lmdb files in the docking_v2/protein_ligand_binding_pose_prediction_v2 directory were produced from the MOAD dataset. I have checked the code and found only the inference-time preprocessing, which generates conformations for ligand molecules. Could you please clarify whether the preprocessing used during training is the same as the one used during inference?