Source code for ACL 2024 paper: "ProtT3: Protein-to-Text Generation for Text-based Protein Understanding"

ProtT3: Protein-to-Text Generation for Text-based Protein Understanding

Code for our ACL 2024 paper.

Authors: Zhiyuan Liu, An Zhang, Hao Fei, Enzhi Zhang, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

Dependencies

python==3.8

Dataset

Download our pre-processed datasets from link and unzip them under the ./data directory.
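
The unzip step can also be scripted. The following is a minimal sketch, assuming the download is a standard zip archive; `extract_dataset` is a hypothetical helper and is not part of this repository.

```python
import zipfile
from pathlib import Path


def extract_dataset(archive_path, data_dir="./data"):
    """Extract a zip archive into data_dir and return the archive's member names."""
    target = Path(data_dir)
    target.mkdir(parents=True, exist_ok=True)  # create ./data if it is missing
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(target)
        return archive.namelist()
```

Call it with the path of the archive you downloaded, for example `extract_dataset("/path/to/downloaded.zip")`, and check the returned names against the files the training scripts expect.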

Reproduce results by training from scratch

python stage1.py --devices '0,1,2,3' --mode train --filename stage1_ckpt --num_query_token 8 --plm_name "facebook/esm2_t30_150M_UR50D" --save_every_n_epochs 10 --batch_size 32 --precision 'bf16-mixed' --num_workers 8
python convert.py --input /path/to/stage1/ckpt/address --output /path/to/ckpt/saving/address
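
To keep the long flag list consistent across runs, the stage 1 command can be assembled from a dictionary. This is a minimal sketch: the flags and values are copied from the training command above, while `STAGE1_ARGS` and `build_command` are hypothetical names introduced for illustration, not part of this repository.

```python
import shlex

# Flag values copied from the stage 1 training command in this README.
STAGE1_ARGS = {
    "devices": "0,1,2,3",
    "mode": "train",
    "filename": "stage1_ckpt",
    "num_query_token": 8,
    "plm_name": "facebook/esm2_t30_150M_UR50D",
    "save_every_n_epochs": 10,
    "batch_size": 32,
    "precision": "bf16-mixed",
    "num_workers": 8,
}


def build_command(script, args):
    """Return an argv list of the form ['python', script, '--flag', 'value', ...]."""
    argv = ["python", script]
    for flag, value in args.items():
        argv += [f"--{flag}", str(value)]
    return argv


train_cmd = build_command("stage1.py", STAGE1_ARGS)
print(shlex.join(train_cmd))  # shell-quoted form that can be pasted into a terminal
```

Passing the list to `subprocess.run(train_cmd, check=True)` would launch training without going through a shell, which avoids quoting problems with values such as the device list.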

Reproduce results by loading our checkpoints

Download our released checkpoints from link.

python stage1.py --devices '0,1,2,3' --mode eval --filename stage1_ckpt --num_query_token 8 --plm_name "facebook/esm2_t30_150M_UR50D" --save_every_n_epochs 10 --batch_size 32 --precision 'bf16-mixed' --num_workers 8 --init_checkpoint /path/to/stage1.ckpt
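
The evaluation command differs from the training command in only a few flags. The sketch below shows this by parsing both commands and comparing them. The command strings are copied from this README; `parse_flags` is a hypothetical helper that assumes every `--flag` is followed by exactly one value.

```python
import shlex

TRAIN = (
    "python stage1.py --devices '0,1,2,3' --mode train --filename stage1_ckpt "
    "--num_query_token 8 --plm_name \"facebook/esm2_t30_150M_UR50D\" "
    "--save_every_n_epochs 10 --batch_size 32 --precision 'bf16-mixed' --num_workers 8"
)
EVAL = (
    "python stage1.py --devices '0,1,2,3' --mode eval --filename stage1_ckpt "
    "--num_query_token 8 --plm_name \"facebook/esm2_t30_150M_UR50D\" "
    "--save_every_n_epochs 10 --batch_size 32 --precision 'bf16-mixed' --num_workers 8 "
    "--init_checkpoint /path/to/stage1.ckpt"
)


def parse_flags(command):
    """Map each --flag in a shell command string to the token that follows it."""
    tokens = shlex.split(command)
    flags = {}
    i = 0
    while i < len(tokens):
        if tokens[i].startswith("--"):
            flags[tokens[i][2:]] = tokens[i + 1]
            i += 2
        else:
            i += 1
    return flags


train_flags = parse_flags(TRAIN)
eval_flags = parse_flags(EVAL)

# Flags whose value in the eval command differs from (or is absent in) the train command.
changed = {
    name: (train_flags.get(name), value)
    for name, value in eval_flags.items()
    if train_flags.get(name) != value
}
print(changed)
```

The comparison reports two differences: `--mode` changes from `train` to `eval`, and `--init_checkpoint` is added to point at the downloaded checkpoint.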

Citation

@inproceedings{liu2024prott,
    title={ProtT3: Protein-to-Text Generation for Text-based Protein Understanding},
    author={Liu, Zhiyuan and Zhang, An and Fei, Hao and Zhang, Enzhi and Wang, Xiang and Kawaguchi, Kenji and Chua, Tat-Seng},
    booktitle={{ACL}},
    publisher={Association for Computational Linguistics},
    year={2024},
    url={https://openreview.net/forum?id=ZmIjOPil2b}
}