coleygroup / molpal

active learning for accelerated high-throughput virtual screening
MIT License

[QUESTION]: #31

Closed: spalgit closed this issue 2 years ago

spalgit commented 2 years ago

Hello,

I am using MolPAL with docking and it works well. I am using a subset of the eMolecules library (~1M compounds) to run a virtual screen (VS) against a target, with --init-size 0.005 and --batch-size 0.005.

The only thing I am not able to control is the number of CPUs the Vina docking runs on. I start Ray with ray start --head, and my VM has 24 CPUs as well as GPUs, but MolPAL only uses 8 of the 24 CPUs.

This problem does not occur when I use pyscreener directly; there it uses all the CPUs. Can you please suggest which parameter I should change to make the docking run on all 24 CPUs?

Thanks, Sandeep
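A quick way to rule out the simplest explanation, that the process can only see 8 of the 24 CPUs, is to compare the machine's logical CPU count with the CPUs the Python process is allowed to run on. The sketch below uses only the standard library; the function name `visible_cpus` is hypothetical. It does not detect cgroup CPU quotas (for example container limits), and it says nothing about how Ray, pyscreener, or Vina schedule work. If Ray is installed, connecting with `ray.init(address="auto")` and calling `ray.cluster_resources()` shows the CPU count Ray itself registered.

```python
import os


def visible_cpus() -> dict:
    """Compare the machine's logical CPU count with the CPUs this process may run on."""
    total = os.cpu_count() or 1  # cpu_count() can return None
    # sched_getaffinity is only available on some platforms (e.g. Linux);
    # elsewhere, fall back to the logical CPU count.
    if hasattr(os, "sched_getaffinity"):
        usable = len(os.sched_getaffinity(0))  # 0 = the current process
    else:
        usable = total
    return {"logical_cpus": total, "usable_by_this_process": usable}


if __name__ == "__main__":
    # On a 24-CPU VM, a usable count of 8 would point at a process-level
    # restriction rather than a MolPAL or pyscreener setting.
    print(visible_cpus())
```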

davidegraff commented 2 years ago

Hmm, odd… The only thing I can imagine affecting that is the ncpu parameter passed in the pyscreener config. What does your config look like?

spalgit commented 2 years ago

Hi David, thanks. My ini and config files are below; they are pretty much the defaults. I am trying to set ncpu = 20, but it still runs on 8 CPUs. Not sure why.

Thanks, Sandeep


screen-type = vina
metadata-template = {"software": "vina"}
receptors = [proteinH.pdb]

center = [20.556, -14.266, 1.286]
size = [17, 10, 22]

ncpu = 20

And the config file is:

[general]
output-dir = molpal_eMols_0.005_rf_ucb
--retrain-from-scratch yes

[pool]
library = libraries/eMols_similarity_unique.smi.gz
# no fps file will force MolPAL to write a new HDF5 file with the fingerprints

[encoder]
# the default encoder is Atom-pair of length 2048, min_path=1, max_path=3

[model]
# by default, we use an RF model

[acquisition]
# by default, we acquire inputs greedily

[objective]
# there are no default objective values
objective = docking
objective-config = examples/objective/docking_brpf1.ini
--minimize

[stopping]
# by default, MolPAL will explore until the fractional difference between the
# current top-k average and the moving average of the 3 recent top-k averages
# is less than 0.01. the default k value is equal to 0.05% of the pool size
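The stopping rule described in the `[stopping]` comments can be sketched as below. This is a minimal illustration under stated assumptions, not MolPAL's code: the function names are hypothetical, scores are assumed to be oriented so that larger is better, and the "moving average of the 3 recent top-k averages" is assumed to cover the three iterations before the current one.

```python
from statistics import mean
from typing import Sequence


def default_k(pool_size: int, fraction: float = 0.0005) -> int:
    """Top-k size as a fraction of the pool (0.05% per the config comment)."""
    return max(1, round(pool_size * fraction))


def top_k_average(scores: Sequence[float], k: int) -> float:
    """Mean of the k best scores, assuming larger is better."""
    return mean(sorted(scores, reverse=True)[:k])


def should_stop(top_k_avgs: Sequence[float], window: int = 3, delta: float = 0.01) -> bool:
    """Return True when the current top-k average differs from the moving
    average of the previous `window` top-k averages by less than `delta`,
    measured as a fraction of that moving average."""
    if len(top_k_avgs) < window + 1:
        return False  # not enough iterations yet to form the window
    current = top_k_avgs[-1]
    moving_avg = mean(top_k_avgs[-window - 1:-1])
    if moving_avg == 0:
        return False  # fractional difference is undefined
    return abs(current - moving_avg) / abs(moving_avg) < delta
```

For the ~1M-compound pool in this issue, `default_k(1_000_000)` gives k = 500.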

davidegraff commented 2 years ago

Hmm, yeah, that looks correct. When you run vina with 20 CPUs, both on its own and inside pyscreener, do you observe full utilization?

spalgit commented 2 years ago

No, I do not observe full utilization. It uses only 8 CPUs, and I cannot understand why.

I like the results though.

davidegraff commented 2 years ago

Just to confirm: you observe underutilization when you run the following command?

vina --receptor RECEPTOR --ligand LIGAND --cpu=20 [...]
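One hypothesis worth testing, which this thread never confirms, is that the ceiling comes from how the work is split rather than from MolPAL itself. AutoDock Vina's default `exhaustiveness` is 8, and, as we understand its threading, a single run spreads its work over that many independent Monte Carlo searches, so `--cpu` threads beyond that number sit idle. The back-of-the-envelope model below makes the arithmetic explicit. The function `busy_cores` is hypothetical, and the assumption that pyscreener reserves `ncpu` CPUs per docking job (and passes the same value to Vina's `--cpu`) is ours, not taken from pyscreener's documentation.

```python
def busy_cores(total_cpus: int, ncpu_per_job: int, exhaustiveness: int = 8) -> int:
    """Rough estimate of cores kept busy by concurrent Vina jobs.

    Hypothetical model, not pyscreener's or Vina's code. Assumes:
      * each docking job reserves `ncpu_per_job` CPUs, so at most
        total_cpus // ncpu_per_job jobs run at once (0 = job never scheduled);
      * one Vina run parallelizes over its `exhaustiveness` Monte Carlo
        searches, so it keeps at most min(ncpu_per_job, exhaustiveness) cores busy.
    """
    concurrent_jobs = total_cpus // ncpu_per_job
    cores_per_job = min(ncpu_per_job, exhaustiveness)
    return min(total_cpus, concurrent_jobs * cores_per_job)


if __name__ == "__main__":
    for ncpu in (1, 8, 20):
        print(f"ncpu={ncpu}: ~{busy_cores(24, ncpu)} of 24 cores busy")
```

Under these assumptions, ncpu = 20 on a 24-CPU node leaves room for a single job that keeps at most 8 cores busy, whereas ncpu = 8 would allow three concurrent jobs. Whether Vina and pyscreener actually behave this way in this setup would need to be checked with the standalone command above.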

davidegraff commented 2 years ago

Closing this due to inactivity.