MolecularAI / REINVENT4

AI molecular design tool for de novo design, scaffold hopping, R-group replacement, linker design and molecule optimization.
Apache License 2.0

Chemprop Issue with Custom Model in Reinvent_TLRL.ipynb #148

Closed. kong0706 closed this issue 3 weeks ago.

kong0706 commented 1 month ago

I trained a ChemProp model (version 1.7.1) on my own data, and the model performed well. I embedded it into REINVENT, and Reinvent_TLRL.ipynb ran without errors. However, when I used the ChemProp model directly to predict the generated molecules, the predicted values differed significantly from the ChemProp(raw) values in stage_2.csv. I suspect the problem lies with my ChemProp model, because when I used the model.pt provided with Reinvent_TLRL.ipynb, the predicted values for the generated molecules were consistent with ChemProp(raw) in stage_2.csv. I don't know how to solve this problem. Could you offer some suggestions? Thank you.
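To make the comparison described above concrete, here is a minimal sketch of how the two sets of predictions could be lined up per molecule. It is not REINVENT or ChemProp code and uses only the Python standard library. The helper names `read_scores` and `largest_discrepancies` are hypothetical, and so are the column names and the toy values in the demo. In particular, the exact header of the raw score column in stage_2.csv should be checked against the real file.

```python
import csv
import io


def read_scores(handle, smiles_col, value_col):
    """Return {smiles: float(value)} from an open CSV handle, skipping empty values."""
    scores = {}
    for row in csv.DictReader(handle):
        value = row.get(value_col, "")
        if value not in ("", None):
            scores[row[smiles_col]] = float(value)
    return scores


def largest_discrepancies(reference, other, top=5):
    """Return (smiles, reference, other, abs_diff) tuples, largest difference first."""
    shared = reference.keys() & other.keys()
    diffs = [(s, reference[s], other[s], abs(reference[s] - other[s])) for s in shared]
    return sorted(diffs, key=lambda item: item[3], reverse=True)[:top]


# Toy stand-ins for stage_2.csv and for a CSV of standalone predictions.
# Both the headers and the numbers are made up for illustration.
reinvent_csv = io.StringIO("SMILES,ChemProp(raw)\nCCO,0.50\nc1ccccc1,0.80\n")
standalone_csv = io.StringIO("smiles,score\nCCO,0.52\nc1ccccc1,0.30\n")

from_reinvent = read_scores(reinvent_csv, "SMILES", "ChemProp(raw)")
from_standalone = read_scores(standalone_csv, "smiles", "score")

for smiles, ref, mine, diff in largest_discrepancies(from_reinvent, from_standalone):
    print(f"{smiles}: REINVENT={ref:.3f} standalone={mine:.3f} |diff|={diff:.3f}")
```

With real files, the `io.StringIO` objects would be replaced by `open(...)` handles. Sorting by absolute difference shows whether all molecules are off by a similar amount or only a few are.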

kong0706 commented 1 month ago

The ChemProp scoring component in the TOML file is configured as shown below. Since the scores in my training set all lie between 0 and 1, I did not add a score transform function. I have also uploaded my model.pt file: model.zip

[[stage.scoring.component]]
[stage.scoring.component.ChemProp]
[[stage.scoring.component.ChemProp.endpoint]]
name = "ChemProp"
weight = 0.6
params.checkpoint_dir = "/tmp/R4_notebooks_output/chemprop"
params.rdkit_2d_normalized = true
params.target_column = "score"

halx commented 1 month ago

I can't really add much to this. I had a similar experience with the TNKS2 model you are referring to. That model was trained with ChemProp 1.5.2. Predictions with version 1.6, however, produced significantly different results. I do not know what causes the difference.
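Since the version used for training seems to matter, a quick sanity check is to compare the ChemProp version installed in the REINVENT environment with the one the model was trained with. The sketch below uses only the standard library's `importlib.metadata`. The helper names are hypothetical, and it assumes the package is installed under the distribution name `chemprop`.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def matches_training_version(package, trained_with):
    """True only if the package is installed and its version equals trained_with."""
    found = installed_version(package)
    return found is not None and found == trained_with


# "1.7.1" is the version reported for the custom model in this thread.
found = installed_version("chemprop")
if found is None:
    print("chemprop is not installed in this environment")
elif matches_training_version("chemprop", "1.7.1"):
    print(f"chemprop {found} matches the training version")
else:
    print(f"chemprop {found} differs from the training version 1.7.1")
```

The check should be run in the same Python environment that REINVENT uses for scoring. For the TNKS2 model from the notebook, the expected version would be 1.5.2 instead.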

kong0706 commented 1 month ago

Thank you for your reply. I can only try using ChemProp 1.5.2 and observe the results.