Open AbhilashMathews opened 1 year ago
Running inference.py appears to work as expected on the provided example, i.e.
python -m inference --protein_ligand_csv data/protein_ligand_example_csv.csv --out_dir results/user_predictions_small --inference_steps 20 --samples_per_complex 40 --batch_size 10 --actual_steps 18 --no_final_step_noise
but when trying to run evaluate_files.py on this sample output, errors arise when reading the molecules and finding directories for the complexes (which are all located in data/PDBBind_processed after being downloaded from Zenodo and unzipped). Would you happen to know why these errors arise on these seemingly standard inputs, and how to fix them? An excerpt from the error output is shown below:
(diffdock) [abhi@gpu-1-dy-g4ad4xlarge-1 DiffDock]$ python evaluate_files.py --results_path results/user_predictions_small --file_to_exclude rank1.sdf --num_predictions 40
Reading paths and names.
0%| | 0/363 [00:00<?, ?it/s]Can't kekulize mol. Unkekulized atoms: 7 8 9 10 11
RDKit was unable to read the molecule.
Using the .sdf file failed. We found a .mol2 file instead and are trying to use that.
Did not find a directory for 6qqw . We are skipping that complex
Did not find a directory for 6d08 . We are skipping that complex
Did not find a directory for 6jap . We are skipping that complex
Did not find a directory for 6np2 . We are skipping that complex
Did not find a directory for 6uvp . We are skipping that complex
Did not find a directory for 6oxq . We are skipping that complex
Did not find a directory for 6jsn . We are skipping that complex
Did not find a directory for 6hzb . We are skipping that complex
Can't kekulize mol. Unkekulized atoms: 7 8 9 10 11
RDKit was unable to read the molecule.
Using the .sdf file failed. We found a .mol2 file instead and are trying to use that.
Did not find a directory for 6qrc . We are skipping that complex
Did not find a directory for 6oio . We are skipping that complex
Did not find a directory for 6jag . We are skipping that complex
Can't kekulize mol. Unkekulized atoms: 0 1 2 3 4 5 14 15 16
RDKit was unable to read the molecule.
Using the .sdf file failed. We found a .mol2 file instead and are trying to use that.
Did not find a directory for 6moa . We are skipping that complex
Did not find a directory for 6hld . We are skipping that complex
Did not find a directory for 6i9a . We are skipping that complex
Did not find a directory for 6e4c . We are skipping that complex
Did not find a directory for 6g24 . We are skipping that complex
Did not find a directory for 6jb4 . We are skipping that complex
Did not find a directory for 6s55 . We are skipping that complex
5%|██▏ | 18/363 [00:00<00:01, 175.38it/s]Did not find a directory for 6seo . We are skipping that complex
Can't kekulize mol. Unkekulized atoms: 12 13 14 15 16 17 18 20 21
RDKit was unable to read the molecule.
Using the .sdf file failed. We found a .mol2 file instead and are trying to use that.
Did not find a directory for 6dyz . We are skipping that complex
Did not find a directory for 5zk5 . We are skipping that complex
Did not find a directory for 6jid . We are skipping that complex
Did not find a directory for 5ze6 . We are skipping that complex
...
This may be related to an earlier error encountered while generating the language model embeddings:
(diffdock) [abhi@gpu-1-dy-g4ad4xlarge-7 diffdock]$ python datasets/pdbbind_lm_embedding_preparation.py
0%| | 10/19120 [00:00<22:45, 14.00it/s]encountered unknown AA: PTR in the complex 3kxz . Replacing it with a dash - .
0%| | 12/19120 [00:00<22:11, 14.35it/s]encountered unknown AA: TPO in the complex 1re8 . Replacing it with a dash - ...
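For context, the message in the log above describes a step that maps three-letter residue names to one-letter codes and substitutes a dash for anything outside the 20 standard amino acids. PTR (phosphotyrosine) and TPO (phosphothreonine) are modified residues, so they fall outside such a table. The following is only a minimal sketch of that kind of fallback, not the code in datasets/pdbbind_lm_embedding_preparation.py; the function name `residues_to_sequence` and the lookup table are assumptions made for illustration.

```python
# Hypothetical sketch; NOT taken from the DiffDock repository.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def residues_to_sequence(residue_names, complex_name="?"):
    """Map three-letter residue names to one-letter codes, using '-' for unknowns."""
    letters = []
    for name in residue_names:
        letter = THREE_TO_ONE.get(name.upper())
        if letter is None:
            # Non-standard residue (e.g. PTR, TPO): warn and use a placeholder.
            print(f"unknown residue {name} in complex {complex_name}; using '-'")
            letter = "-"
        letters.append(letter)
    return "".join(letters)
```

Under this sketch, a chain containing a modified residue still yields a sequence of the same length, with a dash at the modified position.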
I have encountered the same problem. Have you found a solution to this issue yet?
Not yet. I have not explored solutions for this issue any further.
I also have the same errors. Hopefully, this issue could be solved soon.
Same issue
Maybe check the input directory: each folder name should be a PDB ID such as '6q36', not a number or anything else.
This sounds to me like there were issues when running inference, so the results were never placed in the directory given by "--out_dir results/user_predictions_small". They are then missing when listdir lists that directory, and that message is printed.
Would you mind checking whether the issue occurred during inference and the results were never written to "results/user_predictions_small"? If that is the case, it would be useful to see the inference error that causes the complex to be skipped.
If I look in my results dir, I see that the inferred results are written to numbered folders (there are directories called 0, 1, ...). The evaluate_files.py script, however, assumes that these folders are named using a different scheme. See also the issue https://github.com/gcorso/DiffDock/issues/125
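One way to confirm this kind of naming mismatch is to compare the subdirectory names in the results folder with the complex names the evaluation is looking for. The sketch below is a hypothetical helper (`check_results_dirs` is not part of DiffDock), and the list of expected names has to be supplied by you from whatever set of complexes you are evaluating.

```python
import os


def check_results_dirs(results_path, expected_names):
    """Compare subdirectories of results_path with the expected complex names.

    Returns (missing, numeric):
      missing - expected names that have no matching subdirectory
      numeric - subdirectories whose names are purely numeric, e.g. '0', '1'
    """
    found = {
        entry
        for entry in os.listdir(results_path)
        if os.path.isdir(os.path.join(results_path, entry))
    }
    missing = sorted(name for name in expected_names if name not in found)
    numeric = sorted((d for d in found if d.isdecimal()), key=int)
    return missing, numeric
```

For example, `missing, numeric = check_results_dirs("results/user_predictions_small", ["6qqw", "6d08"])`. If `numeric` is non-empty while most expected names are in `missing`, that is consistent with the results having been written under numbered folders rather than under complex names.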