coleygroup / molpal

active learning for accelerated high-throughput virtual screening
MIT License

Fingerprints not generating #18

Closed cgseitz closed 2 years ago

cgseitz commented 2 years ago

Hello,

I am trying to generate fingerprints as you show in your documentation. Here is what I have done so far:

1) git clone https://github.com/coleygroup/molpal.git

2) conda env create -f environment.yml
(run from the directory inside molpal that contains environment.yml)

3) conda activate molpal

4) to make rdkit available in a jupyter notebook:
conda create --name activelearning rdkit
conda activate activelearning
python -m ipykernel install --user --name=activelearning
jupyter notebook
then create a notebook that uses the activelearning conda environment

5) start a ray cluster
redis_password=$( uuidgen 2> /dev/null )
export redis_password
ray start --head --redis-password=$redis_password --num-cpus 4 --num-gpus 1
export redis_password
export ip_head=localhost:6379

6) generate fingerprints
python fingerprints.py --library molpal/libraries/Enamine10k.csv.gz --fingerprint pair --length 2048 --radius 2 --name test_library

When I run this last command, it appears to hang:

(molpal) cseitz@arizona:/net/gpfs-amarolab/cseitz/from_jam/projects/activelearning$ python fingerprints.py --library molpal/libraries/Enamine10k.csv.gz --fingerprint pair --length 2048 --radius 2 --name test_library
2022-01-04 19:46:31,252 INFO worker.py:826 -- Connecting to existing Ray cluster at address: 132.239.174.179:8899
Namespace(delimiter=',', fingerprint='pair', length=2048, library='molpal/libraries/Enamine10k.csv.gz', name='test_library', no_title_line=False, path='.', radius=2, smiles_col=0, title_line=True, total_size=None)

No results are generated after ~36 hours, and the command prompt never returns. Looking at the currently running processes, none of them appears to be associated with fingerprint generation. Do you have any ideas on what I may be doing wrong? Thanks!

Best, Christian

davidegraff commented 2 years ago

I'm not sure what the purpose of step (4) is here. Also, in step (5) you only need a specific redis password if you want to run multiple ray clusters on one node (e.g., multiple processes with independent ray clusters, or multiple users using ray). You can just type ray start --head --num-cpus X; there is no need for --num-gpus because fingerprints.py does not use the GPU.

As for the hanging, a few questions. Is this persistent? What happens if you run it multiple times? Can you use ray on this computer at all? Can you see that the ray cluster is responsive? Is the fingerprint HDF5 file created at all?
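The last two questions can be checked with a short script. This is only a minimal sketch and not part of molpal: the helper names `looks_like_hdf5` and `ray_is_responsive` are made up for illustration, and the output filename `test_library.h5` is a guess based on the `--name` and `path` values in the log above, so substitute whatever file actually appears. The file check only looks for the standard 8-byte HDF5 signature at the start of the file, and the Ray check attaches to an already-running cluster and waits for one trivial task.

```python
from pathlib import Path

# First 8 bytes of an HDF5 file that has no user block.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def looks_like_hdf5(path):
    """Return True if `path` is an existing file that starts with the HDF5 signature."""
    p = Path(path)
    if not p.is_file():
        return False
    with p.open("rb") as fh:
        return fh.read(8) == HDF5_SIGNATURE


def ray_is_responsive(timeout_s=10.0):
    """Return True if a trivial task completes on an already-running Ray cluster."""
    import ray  # imported lazily so the file check works without Ray installed

    @ray.remote
    def ping():
        return "pong"

    try:
        # Attach to an existing cluster rather than starting a new local one.
        ray.init(address="auto")
    except ConnectionError:
        return False
    try:
        return ray.get(ping.remote(), timeout=timeout_s) == "pong"
    except ray.exceptions.GetTimeoutError:
        return False
    finally:
        ray.shutdown()


if __name__ == "__main__":
    # "test_library.h5" is an assumed output name, not confirmed by the thread.
    print("HDF5 file present:", looks_like_hdf5("test_library.h5"))
    print("Ray responsive:", ray_is_responsive())
```

If the Ray check times out or fails to connect, the problem is likely with the cluster rather than with fingerprints.py itself.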

cgseitz commented 2 years ago

I worked with the IT people here today. In the end we realized that my download (from a month or two ago) contained scripts that differ from what you can download today, so there must have been some incompatibilities that I couldn't otherwise detect. I deleted and redownloaded everything, and now this works as normal. Thanks for your help though.
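For anyone who hits the same thing, one way to spot a stale copy is to compare it file by file against a fresh clone. The following is only a rough sketch using the Python standard library: `tree_digests` and `diff_trees` are hypothetical helper names, and the two directory paths in the usage lines are placeholders for wherever the old and new copies live.

```python
import hashlib
from pathlib import Path


def tree_digests(root):
    """Map each file's path (relative to `root`) to its SHA-256 hex digest."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root)
        # Skip git metadata, which differs between clones even for identical sources.
        if path.is_file() and ".git" not in rel.parts:
            digests[rel.as_posix()] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_trees(old_root, new_root):
    """Report files that exist on only one side or whose contents differ."""
    old, new = tree_digests(old_root), tree_digests(new_root)
    return {
        "only_in_old": sorted(old.keys() - new.keys()),
        "only_in_new": sorted(new.keys() - old.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }


if __name__ == "__main__":
    # Placeholder paths: the old download and a fresh clone.
    report = diff_trees("molpal_old", "molpal_fresh")
    for kind, files in report.items():
        print(kind, files)
```

Any entries under "changed" or "only_in_new" point to scripts that were updated upstream after the original download.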

davidegraff commented 2 years ago

Glad you were able to sort it out!