gcorso / DiffDock

Implementation of DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking
https://arxiv.org/abs/2210.01776
MIT License

Problem running the inference using .csv file as input #197

Closed: starwingc closed this issue 3 months ago

starwingc commented 3 months ago

Hi, the example CSV file can no longer be used, and I can't figure out how I should format mine instead. This is what I have:

,complex_name,protein_path,protein_sequence,ligand_description
0,5R7Y,data/5R7Y.pdb,None,data/TC5.sdf
1,5R7Z,data/5R7Z.pdb,None,data/KD7.sdf
2,5R84,data/5R84.pdb,None,data/NA0.sdf
3,5REC,data/5REC.pdb,None,data/ME8.sdf

and the error is:

Traceback (most recent call last):
  File "/home/tur54445/work/anaconda3_2023/envs/diffdock-gpu/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/tur54445/work/anaconda3_2023/envs/diffdock-gpu/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/gpfs/work/tur54445/git/DiffDock/inference.py", line 131, in <module>
    test_dataset = InferenceDataset(out_dir=args.out_dir, complex_names=complex_name_list, protein_files=protein_path_list,
  File "/gpfs/work/tur54445/git/DiffDock/utils/inference_utils.py", line 157, in __init__
    s = protein_sequences[i].split(':')

How should I format my CSV to run inference with a PDB file and an SDF file for each complex?

prathithbhargav commented 3 months ago

You'll probably have to add the sequences manually for it to work. As far as I understand, it needs the sequences in the CSV file to compute the ESM embeddings.
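
If typing the sequences by hand is tedious, something like the following might help. It is only a rough sketch, not DiffDock code: `sequence_from_pdb_text` and `fill_sequences` are hypothetical helper names, and joining chains with `:` is an assumption inferred from the `split(':')` call in the traceback. It reads the sequence from the C-alpha `ATOM` records of each PDB file, so residues missing from the structure will also be missing from the sequence.

```python
import csv

# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "ALA": "A", "ARG": "R", "ASN": "N", "ASP": "D", "CYS": "C",
    "GLN": "Q", "GLU": "E", "GLY": "G", "HIS": "H", "ILE": "I",
    "LEU": "L", "LYS": "K", "MET": "M", "PHE": "F", "PRO": "P",
    "SER": "S", "THR": "T", "TRP": "W", "TYR": "Y", "VAL": "V",
}


def sequence_from_pdb_text(pdb_text):
    """Return chain sequences joined by ':' (assumed separator), from CA atoms."""
    chains = {}   # chain ID -> one-letter codes, in order of first appearance
    seen = set()  # (chain ID, residue number + insertion code) already counted
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # only use the first model of a multi-model file
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        if line[12:16].strip() != "CA":
            continue  # one entry per residue: its C-alpha atom
        chain_id = line[21]
        key = (chain_id, line[22:27])
        if key in seen:
            continue  # skip alternate locations of the same residue
        seen.add(key)
        # Non-standard residue names are written as 'X'.
        code = THREE_TO_ONE.get(line[17:20].strip(), "X")
        chains.setdefault(chain_id, []).append(code)
    return ":".join("".join(residues) for residues in chains.values())


def fill_sequences(csv_in, csv_out):
    """Copy csv_in to csv_out, filling empty or 'None' protein_sequence cells."""
    with open(csv_in, newline="") as handle:
        reader = csv.DictReader(handle)
        fieldnames = reader.fieldnames
        rows = list(reader)
    for row in rows:
        if row["protein_sequence"] in ("", "None"):
            with open(row["protein_path"]) as pdb:
                row["protein_sequence"] = sequence_from_pdb_text(pdb.read())
    with open(csv_out, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

Calling `fill_sequences("input.csv", "input_with_sequences.csv")` (both file names are placeholders) would write a copy of the CSV with the `protein_sequence` column filled in; it is worth checking the result against the known sequences before running inference.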

starwingc commented 3 months ago

Thank you!