MolecularAI / Chemformer

Issue replicating results #28

Closed: braydenrudisill closed this issue 9 months ago

braydenrudisill commented 9 months ago

Hello, I've been trying to run predictions with your fine-tuned USPTO-50K model, and I'm getting 0% accuracy. The generated molecules are similar to the targets, but not a single one matches exactly. Here are the test SMILES I gave the model as input. Below are the command I ran and an example of the output I'm getting:

python -m molbart.predict \
        --reactants_path {input_file} \
        --products_path {output_file} \
        --model_path {model_file} \
        --batch_size 64 \
        --n_beams 10

source       |  COC(=O)c1ccc2c(c1)n(CCC(C)C)c(=O)n2CCC(C)C
prediction   |  CC(C)CCBr.COC(=O)c1ccc2[nH]c(=O)n(CCC(C)C)c2c1
target       |  CC(C)CCn1c(=O)n(CCC(C)C)c2cc(C(=O)O)ccc21

The sha1sum of the checkpoint I downloaded is c859c68b198ac1b8cfab48196bdde6b35641bf81. However, I had to adjust the checkpoint to fit the Chemformer 2.0 code, so I ran the following to convert it before using it.

import torch

# Rename 'vocab_size' to 'vocabulary_size' so the checkpoint fits the Chemformer 2.0 code.
chemUSPTO50 = torch.load('Chemformer USPTO-50k.ckpt')
chemUSPTO50['hyper_parameters']['vocabulary_size'] = chemUSPTO50['hyper_parameters'].pop('vocab_size')
torch.save(chemUSPTO50, 'Chemformer2 USPTO-50k.ckpt')
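
A slightly more defensive version of the same rename, as a rough sketch only. The helper name `rename_hparam` is made up for illustration and is not part of Chemformer or PyTorch. It works on a plain dict, so it assumes the checkpoint has already been loaded, and it leaves the input untouched so the original can still be compared against the converted copy.

```python
def rename_hparam(checkpoint: dict, old: str, new: str) -> dict:
    """Return a copy of `checkpoint` with hyper_parameters[old] renamed to `new`.

    Hypothetical helper: only the top-level dict and the hyper_parameters
    dict are copied, so large entries such as the state dict are shared
    with the input rather than duplicated.
    """
    hparams = checkpoint.get("hyper_parameters")
    if hparams is None:
        raise KeyError("checkpoint has no 'hyper_parameters' entry")

    if old not in hparams:
        if new in hparams:
            # Already converted: nothing to do.
            return checkpoint
        raise KeyError(f"neither {old!r} nor {new!r} found in hyper_parameters")

    if new in hparams:
        raise ValueError(f"both {old!r} and {new!r} are present; refusing to overwrite")

    converted_hparams = dict(hparams)
    converted_hparams[new] = converted_hparams.pop(old)

    converted = dict(checkpoint)
    converted["hyper_parameters"] = converted_hparams
    return converted
```

In use, the dict returned by `torch.load(...)` would be passed in and the result written out with `torch.save(...)`, exactly as in the three-line snippet above. The explicit errors make it obvious when a checkpoint does not have the expected key instead of failing later inside the model code.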

If you know why this is happening, or if you are using a different model for the results below, please let me know. Thank you.

[Screenshot 2024-01-22 at 12 09 52 PM]

braydenrudisill commented 9 months ago

Sorry, I realized I was running forward reaction prediction, but the model was fine-tuned on retrosynthesis. I'm getting high top-1 accuracy now that I've fed the products into the model.
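
For anyone checking the same thing, here is a minimal sketch of how top-k exact-match accuracy could be computed from beam outputs. It is not the evaluation code from the repository. The function names are invented for illustration, and the comparison assumes the SMILES strings are already canonical. The only normalization applied is sorting dot-separated components so that the order in which reactants are listed does not matter. A proper check would canonicalize both sides with a cheminformatics toolkit first.

```python
from typing import Sequence


def normalize(smiles: str) -> str:
    """Strip whitespace and sort dot-separated components.

    This is NOT chemical canonicalization. It only makes the comparison
    independent of the order in which the molecules are written.
    """
    return ".".join(sorted(smiles.strip().split(".")))


def top_k_accuracy(
    beams: Sequence[Sequence[str]],
    targets: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of targets found among the first k candidates of each beam.

    `beams[i]` is assumed to hold the predictions for example i, ordered
    from most to least likely.
    """
    if len(beams) != len(targets):
        raise ValueError("beams and targets must have the same length")
    if not targets:
        return 0.0

    hits = 0
    for candidates, target in zip(beams, targets):
        wanted = normalize(target)
        if any(normalize(c) == wanted for c in candidates[:k]):
            hits += 1
    return hits / len(targets)
```

With this kind of check, feeding reactants into a retrosynthesis model shows up as an accuracy of exactly zero, because the predictions are compared against targets from the wrong side of the reaction.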