b-shields / edbo

Experimental Design via Bayesian Optimization
MIT License
122 stars, 41 forks

Where is the dft descriptor extractor? #17

saankim closed this issue 1 year ago

saankim commented 2 years ago

Hi. Thank you for such great code and a great paper. I read it and I'm trying to use edbo in my project.

But I cannot find a way to extract the descriptor CSV without autoqchem. Uploading a molecule's Gaussian log file to autoqchem is indeed useful, but my molecule is classified information, so I could not get permission to upload it to a public site.

I hope someone can tell me how to get descriptors from a Gaussian log file without autoqchem.

Thank you.

b-shields commented 2 years ago

Hi, happy to hear you are interested in using edbo.

One way to go would be to write your own script using auto-qchem methods. For example, you can use autoqchem.gaussian_log_extractor.gaussian_log_extractor to get descriptors from your Gaussian log files locally. Alternatively, if you have access to Mathematica, you could modify one of my example notebooks for your purposes: https://github.com/b-shields/auto-QChem/tree/master/examples/descriptor_matrix_generation/deoxyfluorination.
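
A minimal sketch of such a script, in case it helps. It assumes that gaussian_log_extractor takes the path of a log file and exposes a get_descriptors() method returning a dict with a "descriptors" entry of molecule-level values; check this against your installed auto-qchem version, since the interface may differ. The helper names (extract_descriptors, descriptors_to_csv) are hypothetical. Nothing here uploads anything: the log files are parsed on your own machine.

```python
import csv
import io


def extract_descriptors(log_path):
    """Parse one Gaussian log file locally (needs auto-qchem installed)."""
    # Imported lazily so the CSV helper below also works without auto-qchem.
    from autoqchem.gaussian_log_extractor import gaussian_log_extractor

    extractor = gaussian_log_extractor(log_path)
    # Assumption: get_descriptors() returns a dict whose "descriptors" entry
    # maps molecule-level descriptor names to scalar values.
    return extractor.get_descriptors()["descriptors"]


def descriptors_to_csv(rows):
    """Turn {molecule name: {descriptor name: value}} into CSV text."""
    # Use the union of all descriptor names so every row has the same columns.
    columns = sorted({name for desc in rows.values() for name in desc})
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["molecule"] + columns)
    for molecule, desc in rows.items():
        # Missing descriptors are left as empty cells.
        writer.writerow([molecule] + [desc.get(name, "") for name in columns])
    return buffer.getvalue()
```

To use it, build the rows dict by calling extract_descriptors on each log file (for example, keyed by file name) and write the text returned by descriptors_to_csv to a .csv file.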

In our experiments, we found that one-hot encoding and Mordred descriptors also give good optimization results. I would suggest that you consider using edbo.bro.BO_express to automatically encode your search space with Mordred descriptors. You can find an example here: https://b-shields.github.io/posts/2020/09/EDBO%20pre-release%20III/.
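
To illustrate what one-hot encoding a search space means, here is a small standalone sketch. It is not edbo's implementation (BO_express handles the encoding itself); the function name and the component names in the usage note are made up for the example. Each combination of components becomes one row, with a 1 in the column of every chosen option and 0 elsewhere, so no DFT or structural information is needed.

```python
from itertools import product


def one_hot_search_space(components):
    """components: dict of component name -> list of categorical choices.

    Returns (columns, rows), one row per combination of choices.
    """
    names = list(components)
    # One column per (component, choice) pair, e.g. "base=DBU".
    columns = [f"{name}={choice}" for name in names for choice in components[name]]
    rows = []
    # Enumerate the full grid of combinations.
    for combo in product(*(components[name] for name in names)):
        active = {f"{name}={choice}" for name, choice in zip(names, combo)}
        rows.append([1 if column in active else 0 for column in columns])
    return columns, rows
```

For instance, two bases and three solvents give five columns and six rows, and every row contains exactly two ones (one per component).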

saankim commented 2 years ago

Thank you for the answer.

What do the c_min and c_max descriptors mean? I would guess that c stands for conformer, but I'm not sure.

b-shields commented 2 years ago

No problem.

The c_min and c_max descriptors refer to the atoms with the lowest and highest charge. The idea was to automatically encode information about possible reactive centers.
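
As a rough illustration of that idea (a sketch, not the actual auto-qchem code): given atom-level values for one molecule, pick the atoms with the lowest and highest charge and copy their values into c_min_ and c_max_ columns. The input layout, the function name, and the exact column names are assumptions made for the example.

```python
def charge_extreme_descriptors(atoms):
    """atoms: list of dicts, each with a 'charge' key plus other atom-level values.

    Returns one flat dict with c_min_* and c_max_* entries.
    """
    if not atoms:
        raise ValueError("need at least one atom")
    # Atoms with the lowest and highest partial charge.
    lowest = min(atoms, key=lambda atom: atom["charge"])
    highest = max(atoms, key=lambda atom: atom["charge"])
    row = {}
    for prefix, atom in (("c_min", lowest), ("c_max", highest)):
        # Copy every atom-level value under the matching prefix.
        for name, value in atom.items():
            row[f"{prefix}_{name}"] = value
    return row
```

With this layout, a molecule whose most negative atom is an oxygen and whose most positive atom is a sulfur would get that oxygen's values under c_min_ and that sulfur's values under c_max_, regardless of how the atoms are ordered in the log file.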