MIDIfactory / AlphaFastPPi

Fast AlphaFold-Multimer based pipeline for Protein-Protein Interaction (PPI) screening
GNU General Public License v3.0

AlphaFastPPi

AlphaFastPPi is a Python package designed to streamline large-scale protein-protein interaction analysis using AlphaFold-Multimer. For each protein combination tested, AlphaFastPPi will return a single model.

Now the same result can be achieved using AlphaPulldown

To obtain a single model in the pulldown version of AlphaPulldown v.1.0.4 use options --num_predictions_per_model=1, --model_names=model_1_multimer_v3, --num_cycle=1, --nopair_ms when running run_multimer_job.py.

Note that slightly different results can be achieved if a different model_name is used.

Requirements

Installation

conda create -n AlphaFastPPi -c omnia -c bioconda -c conda-forge python==3.10 openmm==8.0 pdbfixer==1.9 kalign2 cctbx-base pytest importlib_metadata hhsuite

Usage

AlphaFastPPi supports two different modes, pulldown and all_vs_all, selected with --mode. A run consists of two steps:

  1. Create the MSAs using AlphaPulldown, compute and store the necessary features for each protein:
    conda activate AlphaFastPPi
    create_individual_features.py \
    --fasta_paths=<fasta file containing all the bait and candidate sequences> \
    --data_dir=<path to alphafold databases> \
    --output_dir=<dir to save the output objects> \
    --max_template_date=<any date you want, format like: 2050-01-01> \
    --use_mmseqs2=True

    --fasta_paths: you can use a single FASTA file containing all the sequences to include in the analysis, or several FASTA files separated by commas (e.g. --fasta_paths=protein_A.fasta,protein_B.fasta). N.B.: the FASTA file should not contain any special characters (such as |, :, ;, #) or spaces. To prevent errors, replace these characters with underscores.
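
The character replacement described above can be scripted. The following is a minimal sketch, assuming the warning applies to the sequence header lines of the FASTA file; the function name `sanitize_fasta` is hypothetical and is not part of AlphaFastPPi or AlphaPulldown.

```python
import re

# Characters named in the note above (|, :, ;, #) plus any whitespace.
_UNSAFE = re.compile(r"[|:;#\s]")


def sanitize_fasta(text: str) -> str:
    """Return FASTA text with unsafe characters in header lines replaced by '_'.

    Hypothetical helper: sequence lines are left untouched.
    """
    cleaned = []
    for line in text.splitlines():
        if line.startswith(">"):
            # Keep the leading '>' and replace each unsafe character in the name.
            line = ">" + _UNSAFE.sub("_", line[1:].strip())
        cleaned.append(line)
    return "\n".join(cleaned) + "\n"
```

Note that the sanitized names are the ones that would then have to appear in the protein list files passed to the prediction step.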

    --use_mmseqs2: when set to True, MMseqs2 is executed remotely, which is quick and typically takes a few minutes per protein. Alternatively, set it to False to use HHblits locally, or run MMseqs2 locally and indicate the folder containing its output with the --use_precomputed_msas option.

This will create an output directory (--output_dir) formatted like this:

output_dir
    |-protein_A.a3m
    |-protein_A_env/
    |-protein_A.pkl
    |-protein_B.a3m
    |-protein_B_env/
    |-protein_B.pkl
    ...
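
Before moving on to prediction, it can be useful to confirm that every protein has a feature file. The sketch below assumes the layout shown above (one `<name>.pkl` per protein directly inside the output directory); `missing_features` is a hypothetical helper, not a function shipped with the package.

```python
from pathlib import Path


def missing_features(output_dir: str, names: list[str]) -> list[str]:
    """Return the protein names that have no <name>.pkl file in output_dir.

    Assumes the flat layout shown above; hypothetical helper.
    """
    root = Path(output_dir)
    return [name for name in names if not (root / f"{name}.pkl").is_file()]
```

An empty return value means every listed protein has a pickled feature object.
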
  2. Predict the models:
    
    python3 AlphaFastPPi.py \
    --mode <pulldown|all_vs_all> \
    -l proteins.txt \
    -b baits.txt [only for pulldown mode] \
    -d <path to alphafold databases> \
    -m <path to monomer objects dir> \
    -o <name of the output directory>

`--mode`: can be **pulldown** or **all_vs_all**.

`-l`: the file should contain a list of the sequences to use (one per line). The names should match the names of the sequences in the original FASTA file (and in --monomer_objects_dir). In **pulldown** mode this file should contain **only** the list of the sequences to use as **'candidates'**, while the 'baits' should be listed in another file specified with `-b`. Both files should be formatted as follows:

    protein_A
    protein_B
    ...

`-m`: path to the output_dir created by *create_individual_features.py*.
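
To make the difference between the two modes concrete, here is a rough sketch of how the list files could translate into protein pairs. It is not the package's own code. It assumes that pulldown pairs every bait with every candidate, and that all_vs_all means each unordered pair from the list once, without self-pairs; the source does not state whether self-pairs are included. The names `read_names` and `enumerate_pairs` are hypothetical.

```python
from itertools import combinations, product
from typing import Optional


def read_names(path: str) -> list[str]:
    """Read one sequence name per line, skipping blank lines."""
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def enumerate_pairs(
    mode: str, proteins: list[str], baits: Optional[list[str]] = None
) -> list[tuple[str, str]]:
    """List the protein pairs implied by a mode (illustrative assumption only)."""
    if mode == "pulldown":
        if not baits:
            raise ValueError("pulldown mode needs a list of baits")
        # Every bait against every candidate.
        return list(product(baits, proteins))
    if mode == "all_vs_all":
        # Assumption: unordered pairs, no protein paired with itself.
        return list(combinations(proteins, 2))
    raise ValueError(f"unknown mode: {mode}")
```

Under these assumptions, one bait with N candidates gives N predictions, while all_vs_all over N proteins gives N(N-1)/2.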

## Output
For each protein-protein combination, the output will include a subfolder named proteinA_and_proteinB which contains the following files:
- the model in .pdb format
- the corresponding .pkl file
- timings.json

Additionally, a table named output_name.tsv will be generated, containing the following metrics:
- [pDockQ](https://doi.org/10.1038/s41467-022-28865-w)
- ipTM
- ipTM+pTM
- Average pLDDT
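
The table can be post-processed with a few lines of Python. The sketch below ranks rows by one score and drops those below a cutoff chosen by the user. The column header `pDockQ` is an assumption, since the exact headers of the .tsv file are not listed here, so check them in your own output first; `rank_interactions` is a hypothetical helper.

```python
import csv


def rank_interactions(
    tsv_path: str, score_column: str = "pDockQ", min_score: float = 0.0
) -> list[dict]:
    """Return table rows with score >= min_score, highest score first.

    The default column name is an assumption about the file's header.
    """
    with open(tsv_path, newline="") as handle:
        rows = list(csv.DictReader(handle, delimiter="\t"))
    kept = [row for row in rows if float(row[score_column]) >= min_score]
    return sorted(kept, key=lambda row: float(row[score_column]), reverse=True)
```

Passing a different `score_column` ranks by any other numeric metric in the table.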

## Example - Pull down experiment
Example input files are provided in the "example" folder of this repository.

1. Compute the MSA using MMseqs2

    conda activate AlphaFastPPi
    create_individual_features.py \
    --fasta_paths=bait.fasta,candidates.fasta \
    --data_dir=/mnt/datadisk/AlphaFoldDBs \
    --output_dir=1_fastmsa \
    --max_template_date=2050-01-01 \
    --use_mmseqs2=True

2. Structural predictions

    python3 AlphaFastPPi.py \
    --mode pulldown \
    -l candidates.txt \
    -b bait.txt \
    -d /mnt/datadisk/AlphaFoldDBs \
    -m 1_fastmsa \
    -o 2_predictions



## Citation

If our tool is useful to you, please cite:
- Bellinzona G, Sassera D, Bonvin AMJJ. Accelerating Protein-Protein Interaction screens with reduced AlphaFold-Multimer sampling. bioRxiv 2024.06.07.597882; doi: https://doi.org/10.1101/2024.06.07.597882
- Yu D, Chojnowski G, Rosenthal M, Kosinski J. AlphaPulldown-a python package for protein-protein interaction screens using AlphaFold-Multimer. Bioinformatics. 2023;39(1):btac749. doi:10.1093/bioinformatics/btac749