arneschneuing / DiffSBDD

A Euclidean diffusion model for structure-based drug design.
MIT License

Data preparation failed in Colab #12

Closed rwbfd closed 1 year ago

rwbfd commented 1 year ago

I have been trying to replicate the training results using the original notebook from the GitHub repository, but the data preparation step fails. When I run the command python ./DiffSBDD/process_crossdock.py /content/crossdocked_pocket10 --no_H y, the following error occurs:

#failed: 100000: 100% 100000/100000 [00:10<00:00, 9586.86it/s]
Traceback (most recent call last):
  File "/content/./DiffSBDD/process_crossdock.py", line 353, in <module>
    lig_coords = np.concatenate(lig_coords, axis=0)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate

Similar but more cryptic errors arise for the other datasets. When the script is run in the Colab cells, the error is:

Traceback (most recent call last):
  File "/content/./DiffSBDD/process_bindingmoad.py", line 450, in <module>
    with open(f'data/moad_{split}.txt', 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'data/moad_test.txt'

Since Colab cells sometimes behave unexpectedly, I switched to the command line. Now the error is the same as before:

#failed: 130: 100%|█| 130/130 [00:00<00:00, 9719.25it/s
Traceback (most recent call last):
  File "/content/DiffSBDD/process_bindingmoad.py", line 571, in <module>
    lig_coords = np.concatenate(lig_coords, axis=0)
  File "<__array_function__ internals>", line 5, in concatenate
ValueError: need at least one array to concatenate
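In both runs the progress bar appears to report every entry as failed (#failed: 100000 of 100000, and #failed: 130 of 130), which would leave the list passed to np.concatenate empty. The following is a minimal sketch, not the repository's code, that reproduces the message and shows one way to fail with a clearer error. The helper name concatenate_or_fail is hypothetical.

```python
import numpy as np


def concatenate_or_fail(arrays, name):
    """Concatenate a list of arrays along axis 0.

    Hypothetical helper (not part of DiffSBDD): raises a descriptive error
    when the list is empty instead of NumPy's generic ValueError.
    """
    if len(arrays) == 0:
        raise RuntimeError(
            f"No {name} were collected: every input entry failed to process. "
            "Check the input directory and the paths to the split files."
        )
    return np.concatenate(arrays, axis=0)


# Reproduce the error from the traceback: concatenating an empty list.
try:
    np.concatenate([], axis=0)
except ValueError as err:
    reproduced_message = str(err)

# With at least one array, the helper behaves like np.concatenate.
coords = concatenate_or_fail([np.zeros((2, 3)), np.ones((1, 3))], "ligand coordinates")
```

A guard like this does not fix the underlying problem, but it points at the real cause (no successfully processed entries) rather than at the concatenation step.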

I have posted the Colab notebook here.

Any help will be greatly appreciated.

pearl-rabbit commented 1 year ago

Have you solved this problem? I have the same problem.

arneschneuing commented 1 year ago

Hi @rwbfd and @xiaoxiannv999, sorry for the slow response. With the given information, it is hard for me to see what goes wrong. However, I can say that the error FileNotFoundError: [Errno 2] No such file or directory: 'data/moad_test.txt' is caused by hard-coded paths to the training, validation and test lists. These lists are downloaded together with the code repository and can be found in the data/ subdirectory. Because the paths are hard-coded, process_bindingmoad.py should be run from the main directory (DiffSBDD/). Alternatively, you could change the paths in the script (here).
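One way to make such a lookup independent of the current working directory is to build the path from an explicit repository root. This is a sketch under assumptions, not the script's actual code: the function names are hypothetical, and the reader assumes the list file is plain text with one entry per line, which the thread does not confirm.

```python
from pathlib import Path


def split_list_path(repo_root, split):
    """Return the path to data/moad_<split>.txt under the given repository root."""
    return Path(repo_root) / "data" / f"moad_{split}.txt"


def read_split_list(repo_root, split):
    """Read a split list, failing with a hint if the file is not where expected.

    Assumption: one entry per line; blank lines are skipped.
    """
    path = split_list_path(repo_root, split)
    if not path.is_file():
        raise FileNotFoundError(
            f"Split list not found at {path}. Run the script from the DiffSBDD/ "
            "directory or pass the correct repository root."
        )
    with open(path, "r") as f:
        return [line.strip() for line in f if line.strip()]
```

Inside a script, the repository root could be derived with Path(__file__).resolve().parent, so the lists are found regardless of where the command is launched from.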

pearl-rabbit commented 1 year ago

Hello @arneschneuing. I raised another question, #18. Can you answer it? I guess it may have the same cause.