GIT-Mol

A Multi-modal Large Language Model for Molecular Science with Graph, Image, and Text

MIT License


Here, we introduce GIT-Mol, a multi-modal large language model that integrates structure Graph, Image, and Text information, where the text includes Simplified Molecular Input Line Entry System (SMILES) strings and molecule captions. To facilitate the integration of multi-modal molecular data, we propose GIT-Former, a novel paradigm capable of mapping all modalities into a unified latent space.
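As a rough, hedged illustration of this idea (not the repository's actual implementation), the sketch below assumes a BLIP-2-style module in which a set of learnable query tokens attends to encoder features from any single modality (graph, image, or SMILES/caption text) through cross-attention, so every modality is projected onto the same fixed-size latent representation. All names and dimensions are hypothetical.

```python
# Minimal, illustrative sketch of a GIT-Former-style cross-modal mapper.
# Assumption: learnable query tokens attend to encoder features from any modality
# (graph, image, or text) via cross-attention, yielding a fixed-size representation
# in a shared latent space. Names and sizes are stand-ins, not the actual code.
import torch
import torch.nn as nn

class CrossModalMapper(nn.Module):
    def __init__(self, hidden_dim=768, num_queries=32, num_heads=12):
        super().__init__()
        # Learnable query tokens shared across modalities.
        self.queries = nn.Parameter(torch.randn(num_queries, hidden_dim) * 0.02)
        self.self_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(hidden_dim, 4 * hidden_dim), nn.GELU(), nn.Linear(4 * hidden_dim, hidden_dim)
        )
        self.norm1 = nn.LayerNorm(hidden_dim)
        self.norm2 = nn.LayerNorm(hidden_dim)
        self.norm3 = nn.LayerNorm(hidden_dim)

    def forward(self, modality_feats):
        # modality_feats: (batch, seq_len, hidden_dim) from a graph, image, or text encoder.
        batch = modality_feats.size(0)
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)
        q = self.norm1(q + self.self_attn(q, q, q)[0])
        q = self.norm2(q + self.cross_attn(q, modality_feats, modality_feats)[0])
        return self.norm3(q + self.ffn(q))  # (batch, num_queries, hidden_dim)

# Example: map 196 image-patch features into the shared latent space.
mapper = CrossModalMapper()
image_feats = torch.randn(2, 196, 768)
print(mapper(image_feats).shape)  # torch.Size([2, 32, 768])
```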

The article has been accepted by Computers in Biology and Medicine.

Figure: An overview of GIT-Mol.

Note: The Data, Model, and Training sections below describe the contents of the corresponding directories. Due to size constraints and permissions, some data and checkpoints are not uploaded.

Data

Pretrain_data

igdata - This folder contains the data for pretraining GIT-Former with the image, graph, and SMILES modalities.

igcdata - This folder contains the data for pretraining GIT-Former with the image, graph, and caption modalities.

image2d - Molecule image data used in the pretraining stage (a sketch of deriving image and graph views from SMILES follows this list).
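For intuition only, the snippet below shows how graph and image views of a molecule can be derived from a SMILES string with RDKit. This is not the repository's preprocessing pipeline; the example molecule, output file name, and 224x224 image size are arbitrary choices.

```python
# Illustrative only: derive graph and image views from a SMILES string with RDKit.
# This is not the repository's preprocessing code; sizes and names are assumptions.
from rdkit import Chem
from rdkit.Chem import Draw

smiles = "CC(=O)Oc1ccccc1C(=O)O"  # aspirin, used here as a stand-in example
mol = Chem.MolFromSmiles(smiles)

# Graph view: atoms as nodes (atomic numbers) and bonds as edges (adjacency matrix).
atom_features = [atom.GetAtomicNum() for atom in mol.GetAtoms()]
adjacency = Chem.GetAdjacencyMatrix(mol)

# Image view: a 2D depiction rendered as a PIL image (the size is an assumed choice).
image = Draw.MolToImage(mol, size=(224, 224))
image.save("aspirin.png")

print(len(atom_features), adjacency.shape, image.size)
```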

Finetune_data

ChEBI-20 - This folder contains the data for finetuning GIT-Mol on molecule generation (caption -> SMILES).

molcap - This folder contains the data for finetuning GIT-Mol on molecule captioning (graph, SMILES -> caption) and molecule image captioning (image -> SMILES).

MoleculeNet - This folder contains the data for finetuning GIT-Mol on molecular property prediction (classification).

Due to file size constraints, the ChEBI-20 and MoleculeNet datasets can be downloaded from the following links:

Data processing

data_processing.ipynb
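The processing steps themselves live in data_processing.ipynb. As a hedged orientation aid, the snippet below shows how ChEBI-20-style caption/SMILES pairs are typically loaded; it assumes the public ChEBI-20 layout (tab-separated train/validation/test files with CID, SMILES, and description columns) and an illustrative path, both of which may differ from this repository.

```python
# Illustrative only: load ChEBI-20-style caption<->SMILES pairs for the
# caption -> SMILES generation task. File names, path, and column names assume
# the public ChEBI-20 release and may differ from this repository's layout.
import pandas as pd

def load_chebi20_split(path):
    df = pd.read_csv(path, sep="\t")
    # Keep the two columns needed for caption -> SMILES generation.
    return df[["SMILES", "description"]].dropna()

train_pairs = load_chebi20_split("data/finetune_data/ChEBI-20/train.txt")  # hypothetical path
print(train_pairs.head())
```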

Model

GIT-MOL

Training

GIT-MOL

Below are the parameter explanations specific to the property_prediction task:

property_prediction -- finetune.py
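The authoritative argument list is in finetune.py itself. As a hedged, conceptual sketch only (not the repository's code), MoleculeNet-style classification fine-tuning can be pictured as a classification head on top of a pretrained molecular encoder, trained with binary cross-entropy; the encoder, dimensions, and names below are stand-ins.

```python
# Conceptual sketch of property-prediction fine-tuning (classification).
# `encoder`, feature sizes, and names are stand-ins, not the repository's finetune.py.
import torch
import torch.nn as nn

class PropertyPredictionHead(nn.Module):
    def __init__(self, encoder, hidden_dim=768, num_tasks=1):
        super().__init__()
        self.encoder = encoder  # pretrained module returning a pooled molecular embedding
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, num_tasks)
        )

    def forward(self, inputs):
        pooled = self.encoder(inputs)   # (batch_size, hidden_dim)
        return self.classifier(pooled)  # (batch_size, num_tasks) logits

def finetune_step(model, optimizer, inputs, labels):
    # Binary cross-entropy over task logits, as is standard for MoleculeNet classification.
    logits = model(inputs)
    loss = nn.functional.binary_cross_entropy_with_logits(logits, labels.float())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Stand-in run with a dummy encoder; a real run would use the pretrained GIT-Mol encoder.
model = PropertyPredictionHead(encoder=nn.Linear(300, 768))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
inputs, labels = torch.randn(8, 300), torch.randint(0, 2, (8, 1))
print(finetune_step(model, optimizer, inputs, labels))
```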


Citation

@article{liu2024git,
  title={Git-mol: A multi-modal large language model for molecular science with graph, image, and text},
  author={Liu, Pengfei and Ren, Yiming and Tao, Jun and Ren, Zhixiang},
  journal={Computers in Biology and Medicine},
  pages={108073},
  year={2024},
  publisher={Elsevier}
}