aspuru-guzik-group / Tartarus

A Benchmarking Platform for Realistic And Practical Inverse Molecular Design
https://arxiv.org/abs/2209.12487

Dataset for Surrogate Models #6

ftherrien closed 1 year ago

ftherrien commented 1 year ago

Hi,

Which dataset did you use to train your surrogate models for the "Design of Organic Photovoltaics" tasks? In your SI, section B.1.b, there is a link to "http://github.com/HIPS/neural-fingerprint", but there I can only find the CEP database power conversion efficiency data.

Is it hce.csv? If so, does this hce.csv file contain what you call CEP_SUB in the paper? Or did you use the full CEPDB (with 2.3 million molecules) to train the surrogate models?

Thanks for these benchmarks, they are very useful!

akshat998 commented 1 year ago

Hi @ftherrien

The dataset is located in gdb13.csv; if you can wait for a few days, we are also about to update the code, making the calculations & molecules more feasible.

This should be done over the next few days. Additionally, thank you for pointing out the error -- we will have a look & update the manuscript.

Regards, Akshat

ftherrien commented 1 year ago

gdb13.csv seems to be for organic emitters. Are you sure it is not hce.csv?

akshat998 commented 1 year ago

Oops I misread your question @ftherrien :)

You are correct. The dataset is in hce.csv (gdb13 is for a different task). The link http://github.com/HIPS/neural-fingerprint was used to get all the SMILES strings. hce.csv only contains CEP_SUB (the model was trained only on this & not on the 2.3 million molecules).

Regards, Akshat

ftherrien commented 1 year ago

That makes sense thanks!

Still on the Design of Organic Photovoltaics: in the exploratory task (Table II) you used 1000 randomly generated molecules.

  1. Is that dataset available?
  2. Did you train the surrogate models on them?

akshat998 commented 1 year ago

  1. Yes: datasets/hce_unbiased.csv
  2. No -- we use the same trained model (as the biased task) :)

ftherrien commented 1 year ago

Great, thanks for your quick replies!
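
For anyone landing here later, a minimal sketch for checking what each of the two files discussed above contains. It assumes hce.csv sits in the same datasets/ directory as hce_unbiased.csv (only the latter path is confirmed in this thread), and it makes no assumption about column names: it just reports the header row and the number of data rows. The helper name `summarize_csv` is made up for illustration and is not part of the Tartarus code.

```python
import csv
import os


def summarize_csv(path):
    """Return (column_names, number_of_data_rows) for a CSV file.

    Hypothetical helper for illustration; not part of Tartarus.
    """
    with open(path, newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, [])  # first row is treated as the header
        n_rows = sum(1 for row in reader if row)  # skip blank lines
    return header, n_rows


if __name__ == "__main__":
    # datasets/hce_unbiased.csv is the path given in this thread;
    # datasets/hce.csv is an assumed location for the training set.
    for path in ("datasets/hce.csv", "datasets/hce_unbiased.csv"):
        if os.path.exists(path):
            header, n_rows = summarize_csv(path)
            print(f"{path}: {n_rows} rows, columns = {header}")
        else:
            print(f"{path}: not found (adjust the path to your checkout)")
```

Run from the repository root, this should show whether hce_unbiased.csv holds the roughly 1000 molecules mentioned for the exploratory task and how large the CEP_SUB file is in comparison.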