lcollia closed this issue 3 years ago.
Hey Lionel,
Thanks for your interest in our work!
You raise a good point, and it's one we considered while developing MolPAL, given the software's similarity to existing Bayesian optimization libraries. However, we decided against it for two reasons:
Previous projects have done something similar to what you're suggesting. See ACS Cent. Sci. 2018, 4, 2, 268–276 for a good example.
I'm happy to talk more about this, and it's certainly a possible future project, but we have no plans to implement something like that in MolPAL.
Best, David
Hi David, thank you for your detailed answer, I appreciate it. A molecular fingerprint is clearly not the best space for optimization, because decoding a fingerprint back into a molecule is not straightforward. But one could imagine that with another descriptor space (such as one learned by an autoencoder), it could be possible. Best, Lionel
To be clear, we're not optimizing in fingerprint space. Rather, we're optimizing directly in structure space by using a fully enumerated, discrete optimization domain: the virtual library/MoleculePool. Treating the problem this way means we have to predict the objective function value of every single point in the domain, which we know to be "inefficient," but it also lets us sidestep the research question of "how do you accurately represent molecules?" That's an ongoing challenge in the field and not something we were interested in addressing in this work. In principle, if you could devise an accurate descriptor that is unique and fully invertible for every molecule, then you could perform molecular optimization using standard Bayesian optimization libraries. Works like the VAE I mentioned above come with their own challenges (synthesizability being a big one), which was another reason we stuck with this problem formulation.
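To make the pool-based formulation concrete, here is a minimal sketch of one selection step. It is not MolPAL's actual API: it assumes a trained surrogate that returns a predicted mean and uncertainty for any molecule, and the names `ucb`, `select_batch`, the `beta` parameter, and the toy predictions are all hypothetical. Because the library is finite and enumerated, maximizing the acquisition function reduces to scoring every unacquired molecule and keeping the top-k.

```python
import heapq
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical surrogate signature: maps a molecule (here a SMILES string)
# to a predicted (mean, standard deviation) of the objective.
Surrogate = Callable[[str], Tuple[float, float]]


def ucb(mean: float, std: float, beta: float = 2.0) -> float:
    """Upper confidence bound acquisition score (for maximization)."""
    return mean + beta * std


def select_batch(
    pool: Iterable[str],
    surrogate: Surrogate,
    acquired: Set[str],
    batch_size: int,
    beta: float = 2.0,
) -> List[str]:
    """Score every unacquired molecule in the enumerated pool and return the
    batch_size molecules with the highest UCB score.

    Since the domain is a finite library, the "argmax" of the acquisition
    function is an exhaustive scan rather than a continuous optimization.
    """
    scored = []
    for smi in pool:
        if smi in acquired:
            continue  # never re-suggest a molecule that was already evaluated
        mean, std = surrogate(smi)
        scored.append((ucb(mean, std, beta), smi))
    # Keep only the top-scoring molecules from the full set of predictions
    return [smi for _, smi in heapq.nlargest(batch_size, scored)]


if __name__ == "__main__":
    # Made-up predictions standing in for a trained model
    predictions = {
        "CCO": (1.0, 0.1),
        "c1ccccc1": (0.5, 1.0),
        "CC(=O)O": (0.8, 0.2),
        "CCN": (0.2, 0.05),
    }
    batch = select_batch(predictions, predictions.__getitem__, {"CCO"}, batch_size=2)
    print(batch)  # ['c1ccccc1', 'CC(=O)O']
```

Note that the highly uncertain benzene entry outranks acetic acid despite its lower predicted mean; that exploration trade-off is controlled by `beta`. Every suggestion is, by construction, a real library member, so no decoding step is needed.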
Hi, great work. Do you have a function that suggests the next molecule to test based on the surrogate model that is built? I don't mean scoring an existing list of candidate molecules (your "library"), but generating the fingerprint of the best next molecule to test according to the acquisition function.
For example, the `suggest_next_locations` function in GPyOpt, a similar Bayesian optimization library.
Thanks, Lionel
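For contrast, a rough sketch of what a continuous-domain, `suggest_next_locations`-style step does conceptually: maximize the acquisition function over a bounded box and return a coordinate vector rather than a library member. This is not GPyOpt's implementation; plain random search is swapped in for its acquisition optimizer, and `suggest_next_point` and the toy surrogate are hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical surrogate: continuous input vector -> (mean, standard deviation)
Surrogate = Callable[[Sequence[float]], Tuple[float, float]]


def suggest_next_point(
    surrogate: Surrogate,
    bounds: Sequence[Tuple[float, float]],
    n_candidates: int = 5000,
    beta: float = 2.0,
    seed: int = 0,
) -> Vector:
    """Approximately maximize UCB over a continuous box by random search.

    The result is an arbitrary point in the domain; it only corresponds to a
    molecule if a decoder can map it back to a valid structure.
    """
    rng = random.Random(seed)
    best_x: Vector = []
    best_score = -math.inf
    for _ in range(n_candidates):
        # Sample a candidate uniformly inside the box
        x = [rng.uniform(lo, hi) for lo, hi in bounds]
        mean, std = surrogate(x)
        score = mean + beta * std  # UCB acquisition
        if score > best_score:
            best_x, best_score = x, score
    return best_x


if __name__ == "__main__":
    # Toy 2-D surrogate whose mean peaks at (0.3, -0.2), with constant std
    def toy_surrogate(x: Sequence[float]) -> Tuple[float, float]:
        mean = -((x[0] - 0.3) ** 2 + (x[1] + 0.2) ** 2)
        return mean, 0.1

    x_next = suggest_next_point(toy_surrogate, bounds=[(-1.0, 1.0), (-1.0, 1.0)])
    print([round(v, 2) for v in x_next])
```

With fingerprints as the input space, a vector like `x_next` would generally not decode to any molecule, which is why this step needs an invertible descriptor or a learned latent space with a decoder.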