coleygroup / molpal

active learning for accelerated high-throughput virtual screening
MIT License

standalone code? #2

Closed: abazabaaa closed this issue 2 years ago

abazabaaa commented 3 years ago

Hi,

I know you specifically mention that the package is currently set up to run with the docking program you have specified, but I have built a pipeline around a different docking program that works for my current model.

I was wondering if you would be willing to suggest ways to use the existing code base to build a model from a dataset I have in-house (it isn't proprietary, so I can share it).

In essence, I have about 15M docked compounds in this format: [image]

I have tried taking snippets of your code to see if I can get a basic scikit-learn random forest (RF) regression model going, but I am struggling a bit. Essentially, I would like to use your encoder/sklearn modules on parts of my dataset to see if I can reproduce some of the trends you observe in the paper.
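For anyone arriving here with the same question, a minimal standalone baseline might look like the following sketch. It deliberately does not call MolPAL's own encoder or model classes, since their exact APIs aren't shown in this thread. Instead it assumes RDKit and scikit-learn are installed, featurizes each SMILES as a 2048-bit Morgan fingerprint (radius 2), and fits a scikit-learn `RandomForestRegressor` evaluated on a held-out split. The helper names `featurize` and `train_rf_baseline` are hypothetical.

```python
import numpy as np
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split


def featurize(smiles, radius=2, n_bits=2048):
    """Hypothetical helper: Morgan bit vector as an int8 numpy array, or None if parsing fails."""
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        return None
    fp = AllChem.GetMorganFingerprintAsBitVect(mol, radius, nBits=n_bits)
    arr = np.zeros((0,), dtype=np.int8)
    DataStructs.ConvertToNumpyArray(fp, arr)  # resizes arr to n_bits and fills 0/1
    return arr


def train_rf_baseline(smiles_list, scores, test_size=0.2, seed=0):
    """Hypothetical helper: fit an RF on (SMILES, score) pairs; returns (model, held-out R^2)."""
    X, y = [], []
    for smi, score in zip(smiles_list, scores):
        x = featurize(smi)
        if x is not None:  # silently drop unparsable SMILES
            X.append(x)
            y.append(score)
    if not X:
        raise ValueError("no valid SMILES to train on")
    X = np.vstack(X)
    y = np.asarray(y, dtype=float)

    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, n_jobs=-1, random_state=seed)
    model.fit(X_tr, y_tr)
    return model, model.score(X_te, y_te)  # score() is R^2 on the held-out split
```

With ~15M compounds, the full fingerprint matrix would be roughly 30 GB even at one byte per bit (15M × 2048), so training on a random subsample first is a sensible way to check whether the trends show up before scaling.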

The 15M compounds were picked at random from the Enamine REAL collection (in 15 chunks) and docked using 15 iterations of our docking pipeline.

I am happy to share the docking pipeline codebase. It uses AutoDock-GPU and runs end-to-end from SMILES to docking scores (scores are deposited in JSON objects). We can dock about 15M ligands in a few days on 35 GPUs and 1,000 CPU cores with SGE (it is easy to adapt to Slurm or another scheduler). My strength lies more in pipeline engineering than ML; I am quite green with respect to the latter.
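Because the pipeline writes scores to JSON, one hedged way to get the data into a shape most ML tooling accepts is to flatten it into a single two-column `smiles,score` CSV. The sketch assumes, hypothetically, that each `*.json` file holds one object with `smiles` and `score` keys; the actual schema of the pipeline output isn't described in the thread, so the key names and the helper `json_scores_to_csv` are placeholders to adapt.

```python
import csv
import json
from pathlib import Path


def json_scores_to_csv(json_dir, out_csv, smiles_key="smiles", score_key="score"):
    """Hypothetical helper: collect one (SMILES, score) row per JSON file into a CSV.

    ASSUMPTION: each *.json file holds a single object such as
    {"smiles": "...", "score": -7.2}. Records that are not objects, or that
    lack either key, are skipped. Returns the number of rows written.
    """
    n_rows = 0
    with open(out_csv, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow(["smiles", "score"])
        for path in sorted(Path(json_dir).glob("*.json")):
            record = json.loads(path.read_text())
            if not isinstance(record, dict):
                continue
            if smiles_key not in record or record.get(score_key) is None:
                continue
            writer.writerow([record[smiles_key], float(record[score_key])])
            n_rows += 1
    return n_rows
```

Writing row by row keeps memory use flat even when there are millions of JSON files to walk.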

Feel free to close this if you feel it is outside the scope of the issue tracker.

my email is thomas.graham at pennmedicine.upenn.edu

If that is a better way to chat, let me know. Thanks for being the only one to release an excellent code base and superb documentation.

davidegraff commented 3 years ago

hey thomas,

thanks so much for opening this issue! AutoDock-GPU has been one of our targets for docking code integration, but I just haven't had a chance to learn the preparation/simulation/processing pipeline like I have with DOCK/Vina. I don't want to replicate work that you've already done, so it sounds like your code would be a great addition to our [pyscreener](https://github.com/coleygroup/pyscreener) repo. Please reach out to me at [removed] if you'd be interested in working on that!

thanks again for your interest and compliments!

best, david