TencentAI4S / tfold

open source code for Tencent tFold

This package provides an implementation of the inference pipeline of tFold-Ab and tFold-Ag.

We also provide:

  1. A pre-trained language model named ESM-PPI, which extracts both intra-chain and inter-chain information from a protein complex to generate features for downstream tasks.
  2. The test set we constructed for our paper.
  3. A human germline antibody framework library to guide antibody generation using tFold-Ag.

Any publication that discloses findings arising from using this source code or the model parameters should cite the tFold paper.

Please also refer to the Supplementary Information for a detailed description of the method.

If you have any questions, please contact the tFold team at fandiwu@tencent.com.

Main models

| Shorthand | Dataset | Description |
|---|---|---|
| ESM-PPI | UniRef50, PDB, PPI, Antibody | General-purpose protein language model, further pre-trained using ESM2 (650M parameters). Can be used to predict multimer structure directly from individual sequences. |
| tFold-Ab | SAbDab (before 31 December 2021) | SOTA antibody structure prediction model. MSA-free prediction with ESM-PPI. |
| tFold-Ag | SAbDab (before 31 December 2021) | SOTA antibody-antigen complex structure prediction model. Can be used for virtual screening of binding antibodies and antibody design. |

Installation

  1. Clone the package

    git clone https://github.com/TencentAI4S/tFold.git
    cd tFold
  2. Prepare the environment

  3. Download pre-trained weights under the params directory

  4. Download sequence databases for MSA searching (only needed for tFold-Ag)

    colab_databases_path=your_path # Set this to a location with ample free disk space
    sh scripts/setup_database.sh $colab_databases_path
    ln -s $colab_databases_path colab_databases
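
The database download needs a large amount of disk space, so it can help to check the target filesystem before running the setup script. The following is a minimal Python sketch and is not part of tFold. `has_free_space` and its `required_gib` threshold are hypothetical; this README does not state the actual database size, so choose a value that suits your setup.

```python
import shutil
from pathlib import Path


def has_free_space(path: str, required_gib: float) -> bool:
    """Return True if the filesystem holding `path` has at least `required_gib` GiB free.

    `required_gib` is a user-chosen threshold (an assumption, not a tFold value).
    """
    target = Path(path).expanduser().resolve()
    # Walk up to the nearest existing ancestor so the check also works
    # before the database directory has been created.
    while not target.exists():
        target = target.parent
    free_gib = shutil.disk_usage(target).free / 2**30
    return free_gib >= required_gib
```

For example, `has_free_space("your_path", 500)` would report whether 500 GiB are available at `your_path`; the figure 500 is only a placeholder.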

Dataset

  1. The test set we constructed for our paper

  2. Human germline antibody framework library to guide antibody generation

Quick Start

tFold-Ab

Example 1: Predicting the structure of an antibody and a nanobody using tFold-Ab

# antibody
python projects/tfold_ab/predict.py --pid_fpath=examples/prot_ids.ab.txt --fas_dpath=examples/fasta.files --pdb_dpath=examples/pdb.files.ab

# nanobody
python projects/tfold_ab/predict.py --pid_fpath=examples/prot_ids.nano.txt --fas_dpath=examples/fasta.files --pdb_dpath=examples/pdb.files.nano
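
When several ID lists need to be processed, the two commands above can be assembled programmatically. The sketch below is an illustration only: `build_tfold_ab_cmd` and the `jobs` list are hypothetical names, and the function uses only the three flags shown in the commands above. It prints the commands as a dry run and does not execute anything.

```python
import shlex
from typing import List


def build_tfold_ab_cmd(pid_fpath: str, fas_dpath: str, pdb_dpath: str) -> List[str]:
    """Assemble the tFold-Ab predict.py call using only the flags shown above."""
    return [
        "python",
        "projects/tfold_ab/predict.py",
        f"--pid_fpath={pid_fpath}",
        f"--fas_dpath={fas_dpath}",
        f"--pdb_dpath={pdb_dpath}",
    ]


# Hypothetical batch: (ID list, output directory) pairs taken from the examples above.
jobs = [
    ("examples/prot_ids.ab.txt", "examples/pdb.files.ab"),
    ("examples/prot_ids.nano.txt", "examples/pdb.files.nano"),
]
commands = [build_tfold_ab_cmd(pid, "examples/fasta.files", out) for pid, out in jobs]

for cmd in commands:
    # Dry run: print the shell-quoted command instead of executing it.
    print(shlex.join(cmd))
```

To execute the commands rather than print them, each list could be passed to `subprocess.run(cmd, check=True)` from the repository root.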

tFold-Ag

Example 1: Predicting the structure of an antibody-antigen complex and a nanobody-antigen complex with a pre-computed MSA

# antibody-antigen complex
python projects/tfold_ag/predict.py --pid_fpath=examples/prot_ids.abag.txt --fas_dpath=examples/fasta.files --msa_fpath=examples/msa.files/8df5_R.a3m --pdb_dpath=examples/pdb.files.abag

# nanobody-antigen complex
python projects/tfold_ag/predict.py --pid_fpath=examples/prot_ids.nanoag.txt --fas_dpath=examples/fasta.files --msa_fpath=examples/msa.files/7sai_A.a3m --pdb_dpath=examples/pdb.files.nano
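
Before passing a pre-computed MSA to `--msa_fpath`, a quick sanity check is to count the sequences it contains. The sketch below assumes the usual A3M layout, in which every sequence record begins with a `>` header line. `count_a3m_sequences` is an illustrative helper and not a tFold function.

```python
def count_a3m_sequences(path: str) -> int:
    """Count sequence records in an A3M file by counting '>' header lines."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

For instance, `count_a3m_sequences("examples/msa.files/8df5_R.a3m")` would return the number of records in the example MSA, with the query sequence included in the count.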

Example 2: Generate MSA for structure predictions using MMseqs2

python projects/tfold_ag/gen_msa.py --fasta_file=examples/fasta.files/PD-1.fasta --output_dir=examples/PD-1
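
To generate MSAs for several sequences, the same command can be built once per FASTA file. This is a minimal sketch under two assumptions: input files end in `.fasta`, and each one gets its own output directory named after the file stem, mirroring the `PD-1.fasta` to `examples/PD-1` pairing above. `build_gen_msa_cmds` is a hypothetical helper; it only constructs the commands and does not run them.

```python
from pathlib import Path
from typing import List


def build_gen_msa_cmds(fasta_dir: str, output_root: str) -> List[List[str]]:
    """Build one gen_msa.py command per *.fasta file found in `fasta_dir`."""
    cmds = []
    for fasta in sorted(Path(fasta_dir).glob("*.fasta")):
        # Assumption: one output directory per input, named after the file stem.
        out_dir = Path(output_root) / fasta.stem
        cmds.append([
            "python",
            "projects/tfold_ag/gen_msa.py",
            f"--fasta_file={fasta.as_posix()}",
            f"--output_dir={out_dir.as_posix()}",
        ])
    return cmds
```

Each returned list could then be run from the repository root with `subprocess.run(cmd, check=True)`.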

Example 3: Predicting the structure of an antibody-antigen complex and a nanobody-antigen complex with inter-chain features

# generate inter-chain feature (ppi)
python projects/tfold_ag/gen_icf_feat.py --pid_fpath=examples/prot_ids.abag.txt --fas_dpath=examples/fasta.files --pdb_dpath=examples/pdb.files.native --icf_dpath=examples/icf.files.ppi --icf_type=ppi

# antibody-antigen complex prediction with inter-chain feature
python projects/tfold_ag/predict.py --pid_fpath=examples/prot_ids.abag.txt --fas_dpath=examples/fasta.files --msa_fpath=examples/msa.files/8df5_R.a3m --pdb_dpath=examples/pdb.files.abag --icf_dpath=examples/icf.files.ppi --model_ver=ppi
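
The two steps above share the same `--icf_dpath`, so building both commands from a single set of arguments helps keep them in sync. The sketch below is illustrative: `build_icf_pipeline` is a hypothetical helper that uses only the flags shown above. It defaults both `--icf_type` and `--model_ver` to `ppi`, the only value this README demonstrates; other values are not confirmed here.

```python
from typing import List, Tuple


def build_icf_pipeline(
    pid_fpath: str,
    fas_dpath: str,
    native_pdb_dpath: str,
    msa_fpath: str,
    out_pdb_dpath: str,
    icf_dpath: str,
    icf_type: str = "ppi",   # only value shown in this README
    model_ver: str = "ppi",  # only value shown in this README
) -> Tuple[List[str], List[str]]:
    """Return (feature-generation command, prediction command) sharing one icf_dpath."""
    gen_cmd = [
        "python", "projects/tfold_ag/gen_icf_feat.py",
        f"--pid_fpath={pid_fpath}",
        f"--fas_dpath={fas_dpath}",
        f"--pdb_dpath={native_pdb_dpath}",  # native structures used to derive features
        f"--icf_dpath={icf_dpath}",
        f"--icf_type={icf_type}",
    ]
    predict_cmd = [
        "python", "projects/tfold_ag/predict.py",
        f"--pid_fpath={pid_fpath}",
        f"--fas_dpath={fas_dpath}",
        f"--msa_fpath={msa_fpath}",
        f"--pdb_dpath={out_pdb_dpath}",     # where predicted structures are written
        f"--icf_dpath={icf_dpath}",
        f"--model_ver={model_ver}",
    ]
    return gen_cmd, predict_cmd
```

The feature-generation command has to finish before the prediction command starts, since the second step reads the files written to `icf_dpath` by the first.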

Example 4: CDR loop design with tFold-Ag using a pre-computed MSA

python projects/tfold_ag/predict.py --pid_fpath=examples/prot_ids.design.txt --fas_dpath=examples/fasta.files --msa_fpath=examples/msa.files/7urf_A.a3m --pdb_dpath=examples/pdb.files.design

Citing tFold

If you use tFold in your research, please cite our paper:

@article{wu2024fast,
  title={Fast and accurate modeling and design of antibody-antigen complex using tFold},
  author={Wu, Fandi and Zhao, Yu and Wu, Jiaxiang and Jiang, Biaobin and He, Bing and Huang, Longkai and Qin, Chenchen and Yang, Fan and Huang, Ningqiao and Xiao, Yang and others},
  journal={bioRxiv},
  pages={2024--02},
  year={2024},
  publisher={Cold Spring Harbor Laboratory}
}

and the earlier version of tFold-Ab:

@article{wu2022tfold,
  title={tFold-ab: fast and accurate antibody structure prediction without sequence homologs},
  author={Wu, Jiaxiang and Wu, Fandi and Jiang, Biaobin and Liu, Wei and Zhao, Peilin},
  journal={bioRxiv},
  pages={2022--11},
  year={2022},
  publisher={Cold Spring Harbor Laboratory}
}