b-shields / edbo

Experimental Design via Bayesian Optimization
MIT License

Importing unindexed external results #11

Open iandoxsee opened 3 years ago

iandoxsee commented 3 years ago

(Posted here at Ben's request from private correspondence. Thanks, Ben!)

I have a question about importing external results: I've looked through all the example notebooks and bro.py but I'm still unable to figure out how to import a .csv file containing existing results, either one with the experimental index numbers or ideally one without them. Specifically, here's what I'm trying to do:

1) Use BO_express module to easily encode some components with Mordred from the SMILES strings (e.g., ligand, base, solvent, while other variables use numeric encoding)
2) Specify an external initialization (init_method='external') so that I can include pre-existing data from earlier screening (e.g., a ligand screen with all other variables held constant at levels which are included in the search space)
3) Populate a .csv file with data from (e.g.) external ligand screen in the same format as the "init" or "round0" files, but ideally not requiring the experiment index numbers since the design is created after the ligand screen was run.
4) Import the existing results .csv file into BO and use this to initialize the first round of screening.

NLente-link commented 3 years ago

Hey iandoxsee,

I put together a little workaround for the workflow you describe. It may not be the most elegant way to do it, but it works:

1) A pandas data frame is created containing the whole reaction space and indices
2) A .csv file named initial.csv is created in /results with already given column names to fill in your results
3) The reaction space and the initial.csv are compared and the original indices are added to your experiments automatically
4) Your results are added via bo.add_results from initial.csv with matching indices
5) bo is initialized with your experiments

I will attach my Jupyter notebook so that you can have a look and maybe use it for your optimization.

Reaction Optimization External.ipynb.zip
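
The attached notebook is not reproduced in this thread. As a rough illustration of steps 1) to 3) only, here is a minimal pandas sketch. It is not the notebook's code and does not call edbo; the component names, levels, and the "yield" column are made up for the example, and the unindexed results would normally be read from initial.csv with pd.read_csv rather than typed in.

```python
import itertools

import pandas as pd

# Hypothetical search space; names and levels are illustrative only.
components = {
    "ligand": ["PPh3", "XPhos", "SPhos"],
    "base": ["K2CO3", "Cs2CO3"],
    "temperature": [60, 80],
}
keys = list(components.keys())

# Step 1: enumerate the full reaction space and number the rows.
space = pd.DataFrame(list(itertools.product(*components.values())), columns=keys)
space["index"] = range(len(space))

# Step 2: results recorded without experiment indices
# (stand-in for pd.read_csv("results/initial.csv")).
initial = pd.DataFrame({
    "ligand": ["XPhos", "PPh3"],
    "base": ["K2CO3", "K2CO3"],
    "temperature": [80, 80],
    "yield": [71.0, 12.5],
})

# Step 3: recover each experiment's index by matching on the component
# columns. A left merge keeps the row order of `initial`.
indexed = initial.merge(space, on=keys, how="left", validate="many_to_one")

# Stop early if any experiment lies outside the search space.
if indexed["index"].isna().any():
    raise ValueError("Some experiments are not part of the reaction space.")

indexed = indexed.astype({"index": int}).set_index("index")
print(indexed)
```

The resulting frame carries the search-space index for each experiment, which is the form that step 4) expects.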

b-shields commented 3 years ago

Thanks for posting, and sorry for taking a while to respond. For now, this function will allow users to import external results into an edbo.bro.BO object.

import pandas as pd
from edbo.objective import objective

# Define a function to load experiments
def add_unindexed_experiments(bo, results_path):
    """
    EDBO is currently designed to be used from end to end. This function will
    load experimental data which is not indexed by the optimizer's search space
    so that we can use the data that has already been collected without having
    to look up the indices.
    """

    # Import points and load the reaction space
    results = pd.read_csv(results_path)
    domain_points = results.iloc[:,:-1]
    index = bo.reaction.base_data[bo.reaction.index_headers].copy()

    # Get corresponding points. Iterate to maintain order.
    union_index = []
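    for i in range(len(domain_points)):
        match = pd.merge(index.reset_index(),
                         domain_points.iloc[[i]],
                         how='inner')['index']
        # Fail with a clear message if a row is not in the search space
        if match.empty:
            raise ValueError(f'Row {i} of {results_path} is not in the search space.')
        union_index.append(match.iloc[0])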

    index_out = index.iloc[union_index]

    # Make sure points are aligned
    assert False not in (index_out.values == domain_points.values).flatten()

    # Get encoded results
    encoded_results = bo.obj.domain.iloc[union_index].copy()
    encoded_results[results.columns.values[-1]] = results.iloc[:,-1].values 

    # Update the objective
    bo.obj = objective(domain=bo.obj.domain, results=encoded_results)

I'll leave the issue open as a reminder to include such a function in the next release.
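
Because the function above looks up every row of the results file in the search space, a row with a setting outside that space (a typo in a ligand name, an unlisted temperature) cannot be matched. One way to find such rows beforehand is sketched below. It assumes the same file layout the function relies on, with the target in the last column. The helper find_unmatched_rows and the toy data are hypothetical and not part of edbo; in practice, `space` would presumably be bo.reaction.base_data[bo.reaction.index_headers].

```python
import pandas as pd


def find_unmatched_rows(space, results):
    """Return the rows of `results` whose settings do not occur in `space`.

    `space` holds one row per candidate experiment. `results` holds the
    same setting columns followed by the measured target in the last column.
    """
    keys = list(results.columns[:-1])
    # Left merge with an indicator column: "left_only" marks result rows
    # that have no counterpart in the search space. Dropping duplicates on
    # the right guarantees exactly one output row per result row.
    flagged = results.merge(space[keys].drop_duplicates(),
                            on=keys, how="left", indicator=True)
    return results[(flagged["_merge"] == "left_only").values]


# Toy data for illustration only.
space = pd.DataFrame({
    "ligand": ["PPh3", "PPh3", "XPhos", "XPhos"],
    "temperature": [60, 80, 60, 80],
})
results = pd.DataFrame({
    "ligand": ["XPhos", "SPhos"],
    "temperature": [80, 80],
    "yield": [71.0, 40.0],
})

bad = find_unmatched_rows(space, results)
print(bad)  # the SPhos row, which is not in the search space
```

An empty result means every row of the file can be matched to the search space.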