facebook / Ax

Adaptive Experimentation Platform
https://ax.dev
MIT License

optimization of chemistry reactions #706

Closed beef-broccoli closed 2 years ago

beef-broccoli commented 2 years ago

Hi there,

I noticed there is some chemistry reaction data in metrics. Just wondering, is there any active development in applying Ax to optimization in chemistry spaces (usually a multi-dimensional discrete space defined by molecules)? As far as I can see, Ax just one-hot encodes all choice parameters, but in the context of molecules it's possible to featurize them with physical and chemical descriptors, which I don't think Ax currently supports (please correct me if I'm wrong).

Thanks

lena-kashtelyan commented 2 years ago

Hi @beef-broccoli, thank you for your interest in Ax! Let's loop in @Balandat for this discussion; he will know more on this. He will likely get back to us next week!

Balandat commented 2 years ago

Hi @beef-broccoli, at least on our end there is currently no active development of methods targeted specifically to chemistry spaces.

We have some work in the pipeline to improve optimization of discrete and mixed spaces (both with ordered discrete and unordered categorical parameters), but no work dedicated (at least on our end) to featurizing parameters with physical and chemical descriptors.

Of course, if it's possible to do the featurization outside of Ax then we should be able to ingest that new search space and just work with that.

beef-broccoli commented 2 years ago

Thanks for getting back to me. I should say that even with just one-hot encoding, it still works very well in a large search space.

Does Ax currently support custom features for categorical parameters? I can't seem to find any information on this. Could you offer some pointers on how I can use external featurizations to make a new search space?

Balandat commented 2 years ago

We don’t have any specific support for custom features at this point. What I mean is the following: Say you have a featurization or some kind of continuous embedding of the categoricals. What you’d do is to put in the embedding dimensions as range parameters in the new search space.
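As a rough illustration of the suggestion above, the sketch below turns a small embedding table into one range parameter per embedding dimension, with bounds taken from the observed minimum and maximum. The embedding values, the `ligand_*` names, and the helper `range_parameters_from_embedding` are all hypothetical. The dictionary layout (`name`, `type`, `bounds`, `value_type`) is assumed to match what the Service API's `AxClient.create_experiment(parameters=...)` accepts, which should be checked against the installed Ax version.

```python
# Hypothetical 2-D embedding for three categorical choices (values are made up).
embedding = {
    "ligand_A": (0.12, 1.80),
    "ligand_B": (0.45, 0.30),
    "ligand_C": (0.90, 1.10),
}


def range_parameters_from_embedding(embedding, prefix="emb"):
    """Build one range-parameter config per embedding dimension.

    Bounds are the min and max observed in that dimension of the table.
    """
    vectors = list(embedding.values())
    n_dims = len(vectors[0])
    params = []
    for d in range(n_dims):
        column = [v[d] for v in vectors]
        params.append(
            {
                "name": f"{prefix}_{d}",
                "type": "range",
                "bounds": [min(column), max(column)],
                "value_type": "float",
            }
        )
    return params


# Assumption: this list would be passed as `parameters=` when creating
# the experiment through the Service API.
parameters = range_parameters_from_embedding(embedding)
for p in parameters:
    print(p)
```

The optimizer then searches the continuous box spanned by the embedding, so it can propose points that do not coincide with any row of the table.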

beef-broccoli commented 2 years ago

In these cases I can't really use a continuous dimension like a range parameter, because ultimately when I evaluate I need to convert the encodings back to a valid molecule. If I do that, it's very likely Ax will suggest a "candidate" from the continuous space that doesn't correspond to any molecule at all, which I cannot evaluate. But I do understand what you are saying. Thanks for the help, I really appreciate it.
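One common workaround for this mismatch, which nobody in this thread proposes and which is not an Ax feature, is to snap each continuous suggestion to the nearest molecule that actually exists in descriptor space and evaluate that one instead. A minimal sketch, assuming a small hypothetical descriptor table and plain Euclidean distance:

```python
import math

# Hypothetical descriptor vectors for the molecules that can actually be run.
valid_molecules = {
    "ligand_A": (0.12, 1.80),
    "ligand_B": (0.45, 0.30),
    "ligand_C": (0.90, 1.10),
}


def nearest_valid_molecule(candidate, valid_molecules):
    """Return the name of the molecule closest to `candidate` (Euclidean)."""
    return min(
        valid_molecules,
        key=lambda name: math.dist(candidate, valid_molecules[name]),
    )


# A point suggested in the continuous space that matches no real molecule.
suggested = (0.50, 0.40)
chosen = nearest_valid_molecule(suggested, valid_molecules)
print(chosen, valid_molecules[chosen])
```

If this route is taken, the result should probably be recorded against the snapped descriptors rather than the originally suggested point, and the descriptors should be scaled to comparable ranges so that no single dimension dominates the distance.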

lena-kashtelyan commented 2 years ago

@beef-broccoli, please feel free to reopen this issue if you have any follow-up questions or comments!

sgbaird commented 2 years ago

@beef-broccoli, while this is specific to chemical formulas, you may be interested in CrabNet and/or mat_discover (disclaimer: the first is produced by my research group, and I'm the main author of the latter). You could also use something more geared towards the backend of molecules such as MEGNet as a replacement for CrabNet (i.e. the objective function).

I can't say these suggestions will overcome your issue of converting encodings back to a valid molecule, but it seemed worth mentioning. I've given a lot of thought to the issue of "converting back" (in my case back to chemical formulas) and would be happy to chat about it outside of the Ax page (maybe a mat_discover Issue or via email) if that's of interest. I have some ideas that might be applicable to your use case and would be curious to hear your thoughts as well.

Stefan2016 commented 2 years ago

Hi there,

I wanted to open this question again, as I am also thinking about an application where I can represent chemicals as either categorical variables (e.g. by names) or using a set of different variables (continuous and discrete, e.g. (molecular weight, polar yes/no, density)). The last approach would probably offer more data but I am not sure how to implement this in Ax.

choice_param = ChoiceParameter(name="y", parameter_type=ParameterType.STRING, values=["chemical A", "chemical B"])

Should somehow become similar to

choice_param = ChoiceParameter(name="y", parameter_type=ParameterType.???, values=["(99.5,1,850.0)", "(87.2,0,990.0)"])

And the parameters should probably not be handled as strings anymore (e.g. no more one hot encoding).

Is there a way to do this in Ax, and does it even make sense?

Thanks and cheers, Stefan

sgbaird commented 2 years ago

@Stefan2016. Do you have a finite list of candidate chemicals (see https://github.com/facebook/Ax/issues/771), or are you looking for something that will search over molecular weight, polar yes/no, and density in a more continuous sense (which then involves the non-trivial problem of inverse design)?

Or are you talking about Bayesian optimization of a formulation (e.g. 20% chemical A, 40% chemical B, 40% chemical C)?

I suggest reading through https://github.com/facebook/Ax/issues/727 and the linked posts. Based on what you're trying to do, I can offer some advice, including how to bake in domain knowledge (molecular weight, etc.) to the model.

Also, the Ax team (I'm a user, non-affiliated) is less likely to see comments on closed posts.

Stefan2016 commented 2 years ago

@sgbaird

> @Stefan2016. Is there a finite list of candidate chemicals that you have #771, or are you looking for something that will look through the molecular weight, polar yes/no, and density in a more continuous sense (which then involves the non-trivial problem of inverse design)?

In my case there would be a finite (known) list of chemicals.

> I suggest reading through #727 and the linked posts. Based on what you're trying to do, I can offer some advice, including how to bake in domain knowledge (molecular weight, etc.) to the model.

That would be great, if you can give me some details on how to include such information using the parameter definitions in Ax.

Also thanks for the reference, I will definitely have a look.

> Also, the Ax team (I'm a user, non-affiliated) is less likely to see comments on closed posts.

Thanks for the hint. I was not sure if this issue would get reopened. Is it best practice to open a new issue then?

Thanks again and regards, Stefan

sgbaird commented 2 years ago

@Stefan2016 See this suggestion https://github.com/facebook/Ax/issues/771#issuecomment-1006829441 for how to use a predefined list. I haven't tested this yet.

Probably better to open a new issue. I think the question about including domain knowledge is sufficiently different from this thread to warrant its own issue. My simple suggestion: set up your search space using molecular_weight, polarQ, density, etc. with appropriate bounds using the Service API; attach your training data using the featurized versions of the chemicals; calculate the expected improvement for each chemical in your list of candidates (see https://github.com/facebook/Ax/issues/771#issuecomment-1006829441), again using the featurized versions; and take the compound with the maximum expected improvement as your next suggested experiment.
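The final ranking step of that suggestion can be sketched without Ax, using the closed-form expected improvement for a Gaussian posterior (maximization). The posterior means and standard deviations below are made-up placeholders. In practice they would come from the model fitted to the featurized training data, and how to query that model for a fixed candidate list is what the linked comment addresses. `expected_improvement` here is a hypothetical helper, not an Ax function.

```python
import math


def expected_improvement(mean, std, best_so_far):
    """Closed-form EI for maximization under a Gaussian posterior."""
    if std <= 0.0:
        # No uncertainty: improvement is deterministic.
        return max(mean - best_so_far, 0.0)
    z = (mean - best_so_far) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best_so_far) * cdf + std * pdf


# Hypothetical posterior (mean, std) for each candidate chemical, as would be
# predicted from its featurized form (molecular_weight, polarQ, density).
posterior = {
    "chemical A": (0.50, 0.10),
    "chemical B": (0.90, 0.10),
    "chemical C": (0.60, 0.05),
}
best_so_far = 0.70  # best objective value observed so far (made up)

scores = {
    name: expected_improvement(mean, std, best_so_far)
    for name, (mean, std) in posterior.items()
}
next_experiment = max(scores, key=scores.get)
print(next_experiment, scores[next_experiment])
```

Because the candidates come from a finite, known list, the chemical with the highest score is always something that can actually be run, which sidesteps the "converting back" problem raised earlier in the thread.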

In the new issue, I think it would help for you to give more context to your problem. Even better if you can come up with a minimal working example (MWE) or toy problem that has the problem statement, some runnable code, and what you've tried so far, similar to https://github.com/facebook/Ax/issues/727#issue-1058207110.

You're welcome! 🙂

sgbaird commented 2 years ago

@beef-broccoli and @Stefan2016, another approach that I'm leaning towards is to optimize over the original parameters while doing contextual learning on the "context" of the domain knowledge parameters. See https://github.com/facebook/Ax/issues/905. Planning to do an MWE for this.