coleygroup / molpal

active learning for accelerated high-throughput virtual screening
MIT License

unable to reproduce results #14

Closed albertma-evotec closed 2 years ago

albertma-evotec commented 2 years ago

Hi, I am trying to reproduce your Enamine 50k results but have been unsuccessful. I used the provided Enamine50k_online.ini config file in the examples/config folder. No settings were changed except the output-dir name, for which I chose a different name. [image]

These are the three items generated in the directory after the run completed. [image]

In the data folder, there are: [image]

I tried to analyze these explored compounds to calculate how well they recover the top-500 scored compounds. They only achieve 'random' performance. Is there anything I missed? [image]

Thanks

davidegraff commented 2 years ago

Can you share what the output config file contains and how you’re performing the analysis?

albertma-evotec commented 2 years ago

This is the generated config.ini I mentioned above.

[image]

This is what is inside the chkpts folder:

[image] [image]

None of the generated files and folders have the structure described in the README, so I cannot really follow the 'Analyzing data' section.

[image]

So I simply retrieved the top 500 scored compounds from the file below, which is the same file used for the objective lookup, and then checked them against the generated top_????_explorediter?.csv files to see how many of the top 500 compounds are recovered by the explored sets.

[image]
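The check described above can be sketched in a few lines of Python. This is only an illustration of the comparison, not molpal's own analysis code. It assumes each CSV has a header row with SMILES in the first column and the score in the second, and that lower scores are better; the function names and the toy data are made up for the example.

```python
import csv
import gzip
from pathlib import Path


def read_scores(path, smiles_col=0, score_col=1):
    """Read a (possibly gzipped) CSV with a header row into {smiles: score}.

    The column positions are assumptions; adjust them to the actual files.
    """
    path = Path(path)
    opener = gzip.open if path.suffix == ".gz" else open
    scores = {}
    with opener(path, "rt", newline="") as fh:
        reader = csv.reader(fh)
        next(reader, None)  # skip the header row
        for row in reader:
            try:
                scores[row[smiles_col]] = float(row[score_col])
            except (IndexError, ValueError):
                continue  # skip rows with a missing or non-numeric score
    return scores


def top_k_smiles(scores, k, minimize=True):
    """Return the set of the k best-scoring SMILES."""
    ranked = sorted(scores, key=scores.get, reverse=not minimize)
    return set(ranked[:k])


def fraction_recovered(true_scores, explored_smiles, k, minimize=True):
    """Fraction of the true top-k compounds present in the explored set."""
    top = top_k_smiles(true_scores, k, minimize)
    if not top:
        raise ValueError("no scored compounds to rank")
    return len(top & set(explored_smiles)) / len(top)


# Toy data standing in for the lookup file and one explored set.
true_scores = {"C": -9.0, "CC": -8.5, "CCC": -7.0, "CCCC": -6.0}
explored = ["C", "CCCC"]
print(fraction_recovered(true_scores, explored, k=2))  # 0.5
```

For comparison, uniformly random acquisition of n compounds from a pool of N is expected to recover a fraction n/N of the top k, so a result close to that value suggests the run behaved like random selection.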

davidegraff commented 2 years ago

You didn’t specify your acquisition metric, so it used the default random metric. That’s why you’re observing the same results as random acquisition.
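A quick way to catch this before launching a run is to check whether the config file sets the metric at all. The sketch below is a hypothetical helper, not part of molpal. It assumes the option is written as a `metric = ...` line in the config file, and the sample config text is invented for illustration. The fallback of `random` follows the default described above.

```python
import re


def parse_config(text):
    """Collect settings from 'key = value' lines, skipping comments and [sections].

    Bare flags such as '--some-flag' are stored with an empty value.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "[")):
            continue
        parts = re.split(r"\s*[=:]\s*", line, maxsplit=1)
        key = parts[0].lstrip("-").strip()
        settings[key] = parts[1].strip() if len(parts) > 1 else ""
    return settings


def acquisition_metric(text, default="random"):
    """Return the configured metric, or the default if none is set.

    The key name 'metric' is an assumption about the config format.
    """
    return parse_config(text).get("metric", default)


# Invented config snippets, for illustration only.
without_metric = """
[general]
name = my_run
--write-final
"""
with_metric = without_metric + "metric = greedy\n"

print(acquisition_metric(without_metric))  # random
print(acquisition_metric(with_metric))     # greedy
```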

albertma-evotec commented 2 years ago

OK, thanks. I thought it was greedy by default when I looked at minimal_config.ini. [image]

davidegraff commented 2 years ago

Oh, fair point. I’ll fix the documentation for this.