BingSu12 / MoMu

Text-to-graph molecule generation

The PyTorch implementation of MoMu and MoFlow-based zero-shot text-to-graph molecule generation, as described in "Natural Language-informed Understanding of Molecule Graphs".

License & disclaimer

The code may be used for research purposes only. This package is strictly for non-commercial academic use.

Acknowledgments

We adapted the code of the PyTorch implementation of MoFlow, which is publicly available at https://github.com/calvin-zcx/moflow. Please also check the license and usage terms there if you want to make use of this code.

Install

It takes about half an hour to install all the packages.

Prepare pre-trained models

Download the MoFlow model trained on the zinc250k dataset from

https://drive.google.com/drive/folders/1runxQnF3K_VzzJeWQZUH8VRazAGjZFNF 

Put the folder "zinc250k_512t2cnn_256gnn_512-64lin_10flow_19fold_convlu2_38af-1-1mask" in the folder ./MoleculeGeneration/results

Download the pre-trained graph and text encoders of MoMu

Put the pretrained files "littlegin=graphclinit_bert=scibert_epoch=299-step=18300.ckpt" (MoMu-S) and "littlegin=graphclinit_bert=kvplm_epoch=299-step=18300.ckpt" (MoMu-K) in the folder ./MoleculeGeneration. Both can be downloaded from https://pan.baidu.com/s/1jvMP_ysQGTMd_2sTLUD45A (password: 1234).

Pretrained model when BERT is initialized from the KV-PLM checkpoint:

checkpoints/littlegin=graphclinit_bert=kvplm_epoch=299-step=18300.ckpt

Pretrained model when BERT is initialized from the SciBERT checkpoint:

checkpoints/littlegin=graphclinit_bert=scibert_epoch=299-step=18300.ckpt

Download the pre-trained BERT model

Download the folder "bert_pretrained" from https://huggingface.co/allenai/scibert_scivocab_uncased. Put the folder "bert_pretrained" in the folder ./MoleculeGeneration.
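
Before running generation, it can help to confirm that the downloaded files are where the steps above place them. The following is a minimal sketch and not part of the repository: the helper name missing_pretrained_files is hypothetical, and it assumes the two checkpoints sit directly in ./MoleculeGeneration, as the instructions above state.

```python
from pathlib import Path

# Paths taken from the setup steps above, relative to ./MoleculeGeneration.
EXPECTED = [
    "results/zinc250k_512t2cnn_256gnn_512-64lin_10flow_19fold_convlu2_38af-1-1mask",
    "littlegin=graphclinit_bert=scibert_epoch=299-step=18300.ckpt",
    "littlegin=graphclinit_bert=kvplm_epoch=299-step=18300.ckpt",
    "bert_pretrained",
]


def missing_pretrained_files(root="MoleculeGeneration"):
    """Return the expected paths that do not exist under `root` (hypothetical helper)."""
    base = Path(root)
    return [rel for rel in EXPECTED if not (base / rel).exists()]


if __name__ == "__main__":
    missing = missing_pretrained_files()
    if missing:
        print("Missing:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All pre-trained files found.")
```

Run it from the directory that contains MoleculeGeneration; an empty result means every file and folder listed above is in place.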

Testing & Usage

Generating molecules with the query texts used in the paper:

The default model is MoMu-S. To use MoMu-K, uncomment line 683 and comment out line 682 in Graph_generate.py.

cd MoleculeGeneration
python Graph_generate.py --model_dir results/zinc250k_512t2cnn_256gnn_512-64lin_10flow_19fold_convlu2_38af-1-1mask  -snapshot model_snapshot_epoch_200 --gpu 0  --data_name zinc250k --hyperparams-path moflow-params.json   --temperature 0.85  --batch-size 1 --n_experiments 5  --save_fig true --correct_validity true
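
To repeat the run with different settings (for example, another temperature), the same invocation can be assembled in Python. This is a minimal sketch, assuming it is executed from inside the MoleculeGeneration folder; the helpers build_command and run_generation are hypothetical, and only the flags that appear in the command above are used.

```python
import subprocess
import sys

MODEL_DIR = "results/zinc250k_512t2cnn_256gnn_512-64lin_10flow_19fold_convlu2_38af-1-1mask"


def build_command(temperature=0.85, n_experiments=5, gpu=0):
    """Assemble the Graph_generate.py argument list shown above (hypothetical helper)."""
    return [
        sys.executable, "Graph_generate.py",
        "--model_dir", MODEL_DIR,
        "-snapshot", "model_snapshot_epoch_200",
        "--gpu", str(gpu),
        "--data_name", "zinc250k",
        "--hyperparams-path", "moflow-params.json",
        "--temperature", str(temperature),
        "--batch-size", "1",
        "--n_experiments", str(n_experiments),
        "--save_fig", "true",
        "--correct_validity", "true",
    ]


def run_generation(**kwargs):
    """Run Graph_generate.py; must be called from the MoleculeGeneration folder."""
    subprocess.run(build_command(**kwargs), check=True)
```

For example, run_generation(temperature=0.7) would launch the same generation with a lower sampling temperature.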

Generating molecules with custom query texts

Put the custom text descriptions in the list in lines 816-825 of Graph_generate.py.

Results

For the {id}-th text description, the 60 generated molecule graphs are saved in the subfolder "generated/sci/text{id}/" of the folder "MoleculeGeneration" (the number of generated molecules can be set in lines 834-835 of Graph_generate.py). The corresponding SMILES strings and the negative similarities between the text and each molecule graph are also printed. For example, for the 0-th input text description, the output has the following form:

0
['O[IH]CI(O)CC[IH]OI=CF', ... , 'CC(CCCOO)O[IH]OI=[IH](C)CC[IH]O']
[-2.299729347229004, ... , -2.235506772994995]
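
Because the printed scores are negative similarities, the molecule with the lowest value is the one closest to the text. The sketch below pairs each SMILES string with its score and sorts the pairs accordingly. It is an illustration only: the function rank_by_similarity is hypothetical, and just the two entries visible in the example output are used, since the rest is elided.

```python
def rank_by_similarity(smiles, neg_similarities):
    """Pair each SMILES with its score and sort the closest match first.

    The scores are negative similarities, so a lower value means the
    molecule graph is more similar to the text.
    """
    if len(smiles) != len(neg_similarities):
        raise ValueError("smiles and neg_similarities must have equal length")
    return sorted(zip(smiles, neg_similarities), key=lambda pair: pair[1])


# The two entries visible in the example output above (the rest is elided).
smiles = ["O[IH]CI(O)CC[IH]OI=CF", "CC(CCCOO)O[IH]OI=[IH](C)CC[IH]O"]
scores = [-2.299729347229004, -2.235506772994995]

for s, score in rank_by_similarity(smiles, scores):
    print(f"{score:.4f}  {s}")
```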

It takes about half an hour to generate 60 molecule graphs given an input text description.

Citation

Please cite the following paper if you use the code:

@article{su2022natural,
  title={Natural Language-informed Understanding of Molecule Graphs},
  author={Bing Su and Dazhao Du and Zhao Yang and Yujie Zhou and Jiangmeng Li and Anyi Rao and Hao Sun and Zhiwu Lu and Ji-Rong Wen},
  year={2022}
}