b-shields / edbo

Experimental Design via Bayesian Optimization
MIT License

Passing empty strings as components #14

Closed fnthibaud closed 3 years ago

fnthibaud commented 3 years ago

Hello,

I have been using edbo to try to optimize some sol-gel reactions in the lab, with some promising first results. I have encountered one issue, however: it doesn't seem possible to pass empty strings as reaction components. I'm interested in doing this because I would like to compare the same reaction with and without ligands for stabilization. Thus I would like to pass a list of the form " 'ligands' : [' ', 'ligand_1', 'ligand_2', ... , 'ligand_n'] ". Would there be any way to achieve a similar result, or would it be better to pass the surface functional group (i.e. OH- in the case of hydroxide nanoparticles) as the ligand when no ligand is added?

I hope this question makes sense; if anything was unclear, please feel free to ask for further clarification.

Best, Fabien

b-shields commented 3 years ago

Glad to hear you have been finding EDBO useful. One way to handle your problem would be to one-hot-encode the ligands and include no ligand as an option. This is what that would look like if you used BO_express:

from edbo.bro import BO_express

# (1) Define a dictionary of components
reaction_components={
    'ligand': ['None','ligand1', 'ligand2', 'ligand3'],
    'concentration':[0.1, 0.2, 0.3]}

# (2) Define a dictionary of desired encodings
encoding={'ligand':'ohe',
          'concentration':'numeric'}

# (3) Instantiate BO_express
bo = BO_express(reaction_components=reaction_components,
                encoding=encoding,
                batch_size=10,
                acquisition_function='EI',
                target='yield')
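
To make the suggested encoding concrete, here is a minimal sketch of what one-hot encoding with a 'None' option produces for the components above. It uses plain pandas only and is an illustration of the idea, not edbo's internal implementation; the variable names `space` and `encoded` are made up for this example.

```python
import itertools

import pandas as pd

# Same components as in the reaction_components dictionary above.
ligands = ["None", "ligand1", "ligand2", "ligand3"]
concentrations = [0.1, 0.2, 0.3]

# Enumerate every ligand/concentration combination (4 x 3 = 12 rows).
space = pd.DataFrame(
    list(itertools.product(ligands, concentrations)),
    columns=["ligand", "concentration"],
)

# One-hot encode the categorical column; 'None' gets its own indicator
# column, so "no ligand" is simply one more category.
encoded = pd.get_dummies(space, columns=["ligand"], dtype=int)
print(encoded.head())
```

Each row has exactly one ligand indicator set to 1, and the numeric concentration column is left as it is.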

fnthibaud commented 3 years ago

I see, thanks for the swift reply!

Just a quick follow-up: would it be at all possible to use 'resolve' to do the same thing and set all chemical descriptors for the 'None' ligand to 0? It seems that when I try using 'resolve' while including 'None' (or, of course, any chemical not included in the NIH database), the edbo bot is spawned, and when I choose to one-hot-encode the unknown component, all other components are also one-hot-encoded.

b-shields commented 3 years ago

No problem! Indeed, the resolve method will only work for valid chemical names. In this case I think you may be better off with one-hot-encoding. However, if you would like to use a custom encoding, I would suggest checking out the less automated but more flexible class edbo.bro.BO (https://b-shields.github.io/edbo/bro.html). BO will optimize over any numerical space you pass as the domain. You just have to specify it as a pandas DataFrame (e.g., generate one using a loop or load a CSV with pandas.read_csv).
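
As a rough sketch of the "generate one using a loop" route, the snippet below builds a numerical domain in which the 'None' ligand has all descriptors set to 0, as asked in the follow-up. The descriptor column names and values are hypothetical placeholders, not real computed descriptors, and only pandas is used here.

```python
import itertools

import pandas as pd

# Hypothetical descriptor table: column names and values are placeholders.
# The 'None' row is all zeros to represent "no ligand added".
ligand_descriptors = pd.DataFrame(
    {
        "ligand_desc_1": [0.0, 1.2, 0.7],
        "ligand_desc_2": [0.0, 3.4, 2.9],
    },
    index=["None", "ligand1", "ligand2"],
)
concentrations = [0.1, 0.2, 0.3]

# Loop over every ligand/concentration pair and collect one numeric row each.
rows = []
for ligand, conc in itertools.product(ligand_descriptors.index, concentrations):
    row = ligand_descriptors.loc[ligand].to_dict()
    row["concentration"] = conc
    rows.append(row)

# The resulting all-numeric DataFrame is the candidate search space.
domain = pd.DataFrame(rows)
print(domain)
```

A DataFrame like `domain` could then be supplied as the domain for edbo.bro.BO as described in the linked documentation; the constructor call is left out here because its exact arguments are not given in this thread. Whether an all-zero row is a sensible stand-in for "no ligand" is a modelling choice worth checking against your results.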

fnthibaud commented 3 years ago

Great, I'll check it out! Thanks for the help!