OpenBioML / chemnlp

ChemNLP project
MIT License

New Task: Buchwald-Hartwig yield prediction data #80

Open pschwllr opened 1 year ago

pschwllr commented 1 year ago

Data from https://www.science.org/doi/10.1126/science.aar5169 for a reaction yield prediction task.

I will take care of that.

pschwllr commented 1 year ago

The reaction can be represented either through a single column, reaction_SMILES, or through four columns (ligand, ...).

@kjappelbaum how do you want me to include this information?

kjappelbaum commented 1 year ago

Hey! Thanks for looking into this!

I think it would be good to have the rxn SMILES as an identifier so that we can aggregate on it across all rxn datasets. For rxns, however, I would go via prompt templates that you can specify in the yml. That is, by default we would sample a template that uses the rxn SMILES and one of the targets (here, yield) to construct a prompt. You could also suggest more templates that leverage the structure of this dataset.
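To make the template idea concrete, here is a minimal Python sketch of sampling a prompt from a row. It is only an illustration under assumptions: the template wording, the `sample_prompt` helper, and the `yield` column name are hypothetical, the example row is a toy reaction rather than an entry from the dataset, and the actual yml schema used by chemnlp is not shown here.

```python
import random

# Hypothetical templates: the placeholders are assumed to match column names
# in the dataset (reaction_SMILES comes from the thread, "yield" is assumed).
TEMPLATES = [
    "The reaction with the reaction SMILES {reaction_SMILES} has a yield of {yield}.",
    "Question: What is the yield of the reaction {reaction_SMILES}?\nAnswer: {yield}",
]


def sample_prompt(row, templates, rng):
    """Pick one template at random and fill it with the values of one row."""
    template = rng.choice(templates)
    # format_map looks up each placeholder name as a key in the row mapping.
    return template.format_map(row)


# Toy row for illustration only (not taken from the Buchwald-Hartwig data).
row = {
    "reaction_SMILES": "Clc1ccccc1.Nc1ccccc1>>c1ccc(Nc2ccccc2)cc1",
    "yield": 42.0,
}

rng = random.Random(0)  # seeded so that the sampling is reproducible
prompt = sample_prompt(row, TEMPLATES, rng)
print(prompt)
```

Templates that leverage the dataset structure could reference the individual component columns (for example `{ligand}`) in the same way, as long as each placeholder matches a column name.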

I can later (this weekend) commit an example of this to your PR if this helps!

Thanks again 💯