DrrDom opened 4 months ago
Currently, the `init_db` function takes an `ncpu` argument, which comes from the `ncpu` command-line argument. The problem is that the command-line `ncpu` has a different meaning depending on whether docking runs on a single server or with dask on multiple servers. In dask mode, it is the number of CPUs used for all processing other than docking. When docking on a single server, it additionally sets the number of molecules docked in parallel.
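To make the conflict concrete, here is a minimal sketch of what a single `ncpu` value ends up controlling in each mode. `CpuPlan` and `plan_cpus` are hypothetical names used only for illustration; they are not part of easydock.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CpuPlan:
    """Hypothetical summary of what one --ncpu value controls."""
    processing_cpus: int             # CPUs for non-docking work such as init_db
    parallel_docking: Optional[int]  # molecules docked at once on this host


def plan_cpus(ncpu: int, use_dask: bool) -> CpuPlan:
    """Illustrate how the same ncpu is interpreted in each mode (sketch)."""
    if use_dask:
        # With dask, docking is distributed over the cluster,
        # so ncpu only limits the local non-docking processing.
        return CpuPlan(processing_cpus=ncpu, parallel_docking=None)
    # On a single server the same value also caps how many
    # molecules are docked in parallel.
    return CpuPlan(processing_cpus=ncpu, parallel_docking=ncpu)


print(plan_cpus(8, use_dask=True))
print(plan_cpus(8, use_dask=False))
```

The point of the sketch is that `processing_cpus` is identical in both branches while `parallel_docking` is not, so one argument cannot be tuned for both roles at once.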
The obvious solution is to set `ncpu` to `Pool.cpu_count()` in all functions. A user would then lose control over those parts of the program and keep control only over docking. I am not sure this is the best solution, but I do not see another option at the moment.
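A minimal sketch of that proposal, assuming the CPU count is obtained with `multiprocessing.cpu_count()`: non-docking stages always take every available CPU, and the user's `ncpu` survives only as the docking setting. The function name and the stage keys in the dictionary are hypothetical.

```python
from multiprocessing import cpu_count
from typing import Dict


def cpu_allocation(user_ncpu: int) -> Dict[str, int]:
    """Sketch of the proposed split: only docking stays user-controlled."""
    all_cpus = cpu_count()
    return {
        # Non-docking stages ignore the user value and use every CPU.
        "init_db": all_cpus,
        "add_protonation_postprocessing": all_cpus,
        # The command-line ncpu then only governs docking.
        "docking": user_ncpu,
    }


print(cpu_allocation(4))
```

The trade-off described above is visible here: passing a small `user_ncpu` no longer stops `init_db` from occupying every core of a shared machine.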
Another slowdown comes from the non-parallelized post-processing of molecules after protonation (in `add_protonation`) when molecules are submitted as 3D structures: it includes an additional, time-consuming step of assigning correct bond orders. This can also be addressed as part of this issue. I have a draft implementation that solves this, but I have not tested it yet.
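One way the per-molecule post-processing could be parallelized is sketched below. This is not the draft implementation mentioned above; `assign_bond_orders` is a hypothetical placeholder for the slow bond-order step, and the suggestion that RDKit's `AllChem.AssignBondOrdersFromTemplate` could fill it in is an assumption that only holds if a template with correct bond orders is available.

```python
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, List, Sequence


def assign_bond_orders(mol_block: str) -> str:
    """HYPOTHETICAL placeholder for the slow per-molecule step.

    A real version would restore correct bond orders for a protonated 3D
    structure, e.g. with RDKit's AllChem.AssignBondOrdersFromTemplate when
    a template of the same molecule is available (assumption).
    """
    return mol_block


def postprocess_parallel(
    mol_blocks: Sequence[str],
    ncpu: int,
    worker: Callable[[str], str] = assign_bond_orders,
) -> List[str]:
    """Apply `worker` to every molecule, in parallel when ncpu > 1."""
    if ncpu <= 1:
        # Serial fallback: avoids process start-up cost entirely.
        return [worker(m) for m in mol_blocks]
    # Larger chunks reduce inter-process overhead for many small tasks.
    chunksize = max(1, len(mol_blocks) // (ncpu * 4))
    with ProcessPoolExecutor(max_workers=ncpu) as executor:
        # map() preserves input order, so results line up with mol_blocks.
        return list(executor.map(worker, mol_blocks, chunksize=chunksize))
```

Because `worker` must be picklable for the process pool, it should be a module-level function; when calling `postprocess_parallel` with `ncpu > 1` from a script, guard the call with `if __name__ == "__main__":` so that platforms using the spawn start method do not re-run it in every worker.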
For the `--init` process: yes, I noticed a long time ago that compounds are initialized very slowly, because generating isomers takes a long time for some molecules. To speed up the process, I multiply the `ncpu` value by the CPU count set in `config.yml` for docking (I hardcoded this because I did not want to add another argument to `--init_db` at the time). If I remember correctly, this speeds up the process roughly linearly: initializing ~600k compounds, including isomers, takes around 3 hours with 150 CPUs.

Originally posted by @Feriolet in https://github.com/ci-lab-cz/easydock/issues/35#issuecomment-2122012253
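The figures in that comment allow a rough back-of-the-envelope estimate. This is a minimal sketch that assumes perfectly linear scaling (reported above only approximately) and assumes the docking CPU value lives under a `config.yml` key named `ncpu`, a hypothetical key name here. The function names are illustrative only.

```python
from typing import Mapping


def init_workers(cli_ncpu: int, docking_config: Mapping[str, int]) -> int:
    """Workaround described above: scale ncpu by the docking CPU setting."""
    # "ncpu" as the config.yml key is an assumption for this sketch.
    return cli_ncpu * docking_config["ncpu"]


def cpu_seconds_per_compound(n_compounds: int, wall_hours: float, n_cpus: int) -> float:
    """Total CPU time spent per compound entry during initialization."""
    return wall_hours * 3600 * n_cpus / n_compounds


def estimated_wall_hours(n_compounds: int, cost_per_compound: float, n_cpus: int) -> float:
    """Wall time under the assumption of perfectly linear scaling."""
    return n_compounds * cost_per_compound / (n_cpus * 3600)


cost = cpu_seconds_per_compound(600_000, 3, 150)
print(f"{cost:.1f} CPU-seconds per compound")
print(f"{estimated_wall_hours(600_000, cost, 75):.1f} h with 75 CPUs")
```

With the reported numbers, each compound entry (isomers included) costs about 3 h × 3600 s × 150 CPUs / 600,000 = 2.7 CPU-seconds, so halving the CPUs to 75 would roughly double the wall time to about 6 hours if the scaling really is linear.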