chao1224 / MoleculeSTM

Multi-modal Molecule Structure-text Model for Text-based Editing and Retrieval, Nat Mach Intell 2023 (https://www.nature.com/articles/s42256-023-00759-6)
https://chao1224.github.io/MoleculeSTM

Not getting expected results for zero-shot molecule editing #23

Open gihanpanapitiya opened 3 months ago

gihanpanapitiya commented 3 months ago

For zero-shot molecule editing, I am not getting the expected results shown in the repository's notebook. Do you have any suggestions for improving this result?

This is what I get.

===== for text prompt: This molecule is soluble in water. =====
===== for SMILES OC1C2C1CC2 =====
WARNING: MOLECULE VALIDATION AND SANITIZATION CURRENTLY DISABLED
Use random noise for init
l2 lambda: 1.0
Use random noise for init
100%|██████████████████████████████████████████████████████████████████████████████████| 100/100 [00:06<00:00, 15.92it/s]
clip loss: -0.45563 L2 loss: 0.23571
WARNING: MOLECULE VALIDATION AND SANITIZATION CURRENTLY DISABLED
SMILES_list: ['OC1C2C1CC2', '[73Se][O+][O+][O+][O+][O+][O+][O+][73Se][73Se][73Se][73Se][73Se][F-][32P][73Se][F-][32P][73Se][O+][O+][73Se][F-][73Se][F-][73Se][73Se][73Se][73Se][73Se][73Se][F-][32P][73Se][F-]1[SiH3-][73Se]', '[73Se]S[3H][Rb][85SrH2][SiH3-]LogD_change_(0.9, 1.1][SiH3-][C@H][3H][Rb][85SrH2][SiH3-][I-][123I][73Se][123I][73Se][123I][73Se]<UNUSED_166>[F-]<UNUSED_81>Clint_low->high[F-]<UNUSED_81>[C@H][3H][Rb][85SrH2][SiH3-][85SrH2][SiH3-][85SrH2][SiH3-][85SrH2][SiH3-][I-][123I][73Se]S[3H][Rb][85SrH2][SiH3-][SiH][SiH2]<UNUSED_0>Clint_low->high[F-]<UNUSED_81>Clint_low->high[KH][11CH]LogD_change_(0.9, 1.1]Br<UNUSED_0>Clint_low->highLogD_change_(0.9, 1.1]Br[Se][123I][85SrH2][C@H][3H][Rb][85SrH2][C@H][3H][Rb][85SrH2][C@H][3H][Rb][85SrH2][C@H][C@H][C@H][C@H][3H][Rb][85SrH2][O+][O+][O+][O+][O+][O+][O+][O+][O+][O+][O+][O+][O+][123I][73Se]S[3H]Clint_low->high<UNUSED_64><UNUSED_81>[C@H]<UNUSED_61><UNUSED_81>Clint_low->high[KH]<UNUSED_46><UNUSED_61>[C@H][3H][123I][O+][123I][O+][123I][P@@+]LogD_change_(1.1, 1.3][PH][C@H][3H][Rb][85SrH2][C@H][3H][Rb][C@H][3H][Rb][C@H][3H][Rb][C@H][3H][123I][73Se]S[3H]Clint_low->high[KH]Clint_low->high[KH][123I]/[Al]<UNUSED_182>[18OH][3H][123I]S[3H][123I]S[3H][123I][P@@+][KH]LogD_change_(2.1, 2.3]LogD_change_(2.1, 2.3]LogD_change_(2.1, 2.3]LogD_change_(2.1, 2.3]<UNUSED_161>[123I][73Se][123I][73Se][123I][73Se]S[3H][32P][PH][C@H][3H]Clint_low->high[KH][S@@+]S[3H]<UNUSED_0>S[3H]<UNUSED_0>S[3H]<UNUSED_0>S[3H]Clint_low->high[KH][S@@+]S[3H]<UNUSED_0>S[3H][Rb][C@H][3H][Rb][C@H][3H][Rb][C@H][3H][Rb][C@H][3H][Rb][85SrH2][SiH3-]S[3H][SiH][PH][3H][32P][PH][3H][Rb][C@H][3H][32P][PH][3H][32P][PH][3H][32P][PH][S@@][32P][PH][3H]<UNUSED_0>LogD_change_(2.1, 2.3]<UNUSED_81>[C@H][3H][Rb][C@H][3H]Clint_low->high<UNUSED_64>[C@H][3H][Rb][C@H][3H][Rb][85SrH2][C@H][3H][Rb][85SrH2][C@H][3H][Rb][C@H][3H][Rb][C@H][3H][Rb][C@H][3H][Rb][C@H][3H][Rb][C@H][SiH][123I][P@@+]LogD_change_(0.3, 
0.5][PH]S[3H][Rb][85SrH2][O+][O+][O+][123I][85SrH2][PH][3H][Rb][85SrH2][123I][85SrH2][123I][85SrH2][PH][3H][Rb][C@H][3H][Rb][85SrH2][PH][OH+][NH4+]<UNUSED_193>[PH]<UNUSED_193>[OH+][NH4+]<UNUSED_193>LogD_change_(0.9, 1.1][P@@+][As-][PH][3H][Rb][C@H][3H][Rb][C@H][3H]Clint_low->high[PH][3H][Rb][C@H][3H][Rb][C@H][3H][Rb][C@H][3H][Rb][85SrH2][PH][3H][Rb][C@H][3H][Rb][C@H][3H][Rb][C@H][3H][Rb][C@H][PH][AsH3][C@H][3H][Rb][C@H][3H][Rb][C@H][3H][Rb][C@H][SiH][PH][AsH3][C@H][3H][Rb][85SrH2]<UNUSED_64>[PH][85SrH2][PH][3H][Rb][85SrH2][PH][C@H][3H][Rb][C@H][3H][Rb][85SrH2][O+]LogD_change_(2.1, 2.3][F-]<UNUSED_81>[C@H][SiH][123I][As-][PH][C@H][SiH][PH][AsH3][C@H][SiH][PH][P@@+][As-][PH][C@H][3H][Rb][85SrH2][PH][3H][Rb][85SrH2][PH][3H][Rb][C@H][SiH][123I]<UNUSED_193><UNUSED_68><UNUSED_0>[As-][PH][S@@]<UNUSED_46>[PH]<UNUSED_193>LogD_change_(0.9, 1.1][TeH2][3H][NH4+]<UNUSED_193>[PH][3H][Rb][85SrH2][O+][123I][As-][PH][3H][Rb][85SrH2][PH][P@@+][As-][PH][P@@+][P@@+][P@@+][As-][PH][P@@+][As-][PH][3H][TeH2][3H][Rb][85SrH2][PH][3H][Rb][85SrH2][PH][3H][Rb][85SrH2][PH][P@@+][As-][PH][85SrH2][SiH3-][85SrH2][PH]<UNUSED_0>[TeH2][3H]<UNUSED_0>[TeH2][3H]<UNUSED_0>[TeH2][3H][Rb][85SrH2][PH]LogD_change_(3.5, 3.7][PH][85SrH2][PH][85SrH2][PH][85SrH2][PH][C@H][PH][C@H][PH][P@@+]LogD_change_(0.3, 0.5][PH][P@@+]LogD_change_(0.3, 0.5][PH][P@@+][PH][P@@+][PH][85SrH2][PH][85SrH2][PH]']
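
The decoded strings above contain tokens that cannot appear in a SMILES string, such as `<UNUSED_166>`, `Clint_low->high`, and `LogD_change_(0.9, 1.1]`. As a minimal sketch, a check like the following could flag such outputs automatically. The pattern list and the helper name `find_non_smiles_tokens` are assumptions made for illustration, based only on the tokens visible in this log; they are not part of MoleculeSTM.

```python
import re

# Patterns taken from the output above; none of them is valid SMILES syntax.
# This list is illustrative and not exhaustive.
NON_SMILES_PATTERNS = [
    re.compile(r"<UNUSED_\d+>"),
    re.compile(r"LogD_change_\([^\]]*\]"),
    re.compile(r"Clint_low->high"),
]


def find_non_smiles_tokens(decoded):
    """Return every non-SMILES token found in a decoded string, grouped by pattern."""
    hits = []
    for pattern in NON_SMILES_PATTERNS:
        hits.extend(pattern.findall(decoded))
    return hits


# Hypothetical usage on a short excerpt of the strings printed above.
outputs = ["OC1C2C1CC2", "[73Se]S[3H]<UNUSED_166>[F-]Clint_low->high"]
for smiles in outputs:
    print(smiles, find_non_smiles_tokens(smiles))
```

An empty list for the input molecule and a non-empty list for the generated ones would confirm that the decoder is emitting placeholder tokens rather than molecule tokens.
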
chao1224 commented 3 months ago

Hi @gihanpanapitiya,

I noticed that your losses differ from those in the notebook, which reports:

clip loss: -0.92124 L2 loss: 0.33059

This seems to be caused by corrupted checkpoints. Could you double-check them?

gihanpanapitiya commented 3 months ago

How can I double-check them?

These are the checkpoint-related parameters I use:

########## for foundation ##########
parser.add_argument("--MoleculeSTM_model_dir", type=str, default="../data/demo/demo_checkpoints_SMILES")
parser.add_argument("--MoleculeSTM_molecule_type", type=str, default="SMILES", choices=["SMILES", "Graph"])
parser.add_argument("--vocab_path", type=str, default="../MoleculeSTM/bart_vocab.txt")

########## for generation ##########
parser.add_argument("--MegaMolBART_generation_model_dir", type=str, default="../data/pretrained_MegaMolBART/checkpoints")

########## for foundation and generation projection ##########
parser.add_argument("--language_edit_model_dir", type=str, default="../data/demo/demo_checkpoints_SMILES")
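
One possible way to double-check the checkpoints is to list every file in the directories above with its size and SHA-256 hash, then compare that listing against a fresh download or against the listing from a setup that reproduces the notebook. The sketch below uses only the Python standard library. The helper names `sha256_of` and `inventory` are hypothetical, and whether reference hashes are published for these checkpoints is not stated in this thread.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large checkpoints are not read into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def inventory(directory):
    """Map each file under `directory` to (size in bytes, SHA-256), or None if the directory is missing."""
    root = Path(directory)
    if not root.is_dir():
        return None
    return {
        str(p.relative_to(root)): (p.stat().st_size, sha256_of(p))
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


# Directories taken from the argparse defaults quoted above.
for d in ["../data/demo/demo_checkpoints_SMILES",
          "../data/pretrained_MegaMolBART/checkpoints"]:
    files = inventory(d)
    if files is None:
        print(f"{d}: directory not found")
        continue
    for name, (size, digest) in files.items():
        print(f"{d}/{name}\t{size} bytes\t{digest}")
```

A missing file, a zero-byte file, or a hash that differs from a known-good copy would point to an incomplete or corrupted download.
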