YoshitakaMo / localcolabfold

ColabFold on your local PC
MIT License

LocalColabFold

ColabFold on your local PC (or macOS). See also ColabFold repository.

What is LocalColabFold?

LocalColabFold is an installer script designed to make ColabFold functionality available on users' local machines. It supports a wide range of operating systems: Windows 10 or later (using Windows Subsystem for Linux 2), macOS, and Linux.

If you only intend to predict a small number of naturally occurring proteins, I recommend using the ColabFold notebook or downloading structures from the AlphaFold Protein Structure Database or UniProt. LocalColabFold is suited to more advanced applications, such as batch processing of structure predictions for natural complexes, non-natural proteins, or predictions with manually specified MSAs/templates.

Advantages of LocalColabFold

Note (May 21, 2024)

Note (Jan 30, 2024)

New Updates

Installation

For Linux

  1. Make sure the curl, git, and wget commands are installed on your PC. If any are missing, install them first. On Ubuntu, type sudo apt -y install curl git wget.

  2. Make sure your CUDA compiler driver is version 11.8 or later (the latest version, 12.4, is preferable). If you don't have a GPU or don't plan to use one, you can skip this step:

    $ nvcc --version
    nvcc: NVIDIA (R) Cuda compiler driver
    Copyright (c) 2005-2022 NVIDIA Corporation
    Built on Wed_Sep_21_10:33:58_PDT_2022
    Cuda compilation tools, release 11.8, V11.8.89
    Build cuda_11.8.r11.8/compiler.31833905_0
    
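    DO NOT use nvidia-smi to check the version: it reports the highest CUDA version the driver supports, not the version of the installed compiler.
    See the NVIDIA CUDA Installation Guide for Linux if you haven't installed CUDA.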

  3. Make sure your GNU compiler version is 9.0 or later because GLIBCXX_3.4.26 is required for openmm:

    $ gcc --version
    gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
    Copyright (C) 2019 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    
    If the version is 8.5.0 or older (e.g. on CentOS 7 or Rocky/AlmaLinux 8), install a newer compiler and add it to your PATH.

  4. Download install_colabbatch_linux.sh from this repository:

    $ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_linux.sh
    and run it in the directory where you want to install:
    $ bash install_colabbatch_linux.sh
    About 5 minutes later, the localcolabfold directory will be created. Do not move this directory after the installation.

    Keep the network connection unblocked during the installation, and check the log output for errors.

    If you find errors in the output log, the easiest fix is to check the network, delete the localcolabfold directory, and re-run the installation script.

  5. Add the installed bin directory to the PATH environment variable:

    # For bash or zsh
    # e.g. export PATH="/home/moriwaki/Desktop/localcolabfold/colabfold-conda/bin:$PATH"
    export PATH="/path/to/your/localcolabfold/colabfold-conda/bin:$PATH"
    It is recommended to add this export command to ~/.bashrc and restart bash (~/.bashrc is executed every time bash starts).

  6. To run the prediction, type

    colabfold_batch input outputdir/
    The result files will be created in outputdir. This command runs the prediction without templates or relaxation (energy minimization). To use templates and relaxation, add the --templates and --amber flags, respectively. For example,

    colabfold_batch --templates --amber input outputdir/

    colabfold_batch automatically detects whether the input is a monomer or a complex. In most cases, you don't have to add --model-type alphafold2_multimer_v3 to turn on multimer prediction. alphafold2_multimer_v1 and alphafold2_multimer_v2 are also available. The default is auto (alphafold2_ptm for monomers and alphafold2_multimer_v3 for complexes).

For more details, see Flags and colabfold_batch --help.
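
The prerequisite checks in steps 1 to 3 can be bundled into a small preflight script. The following is a minimal sketch, not part of LocalColabFold: cuda_release_ok and gcc_version_ok are hypothetical helper names, the thresholds are the ones stated in the steps above, and gcc -dumpversion is used instead of parsing gcc --version because its output is easier to compare.

```shell
# Preflight sketch for the Linux installer prerequisites (steps 1-3).
# The function names are hypothetical helpers, not LocalColabFold commands.

# Succeeds if the given "nvcc --version" text reports release 11.8 or later.
cuda_release_ok() {
    # Extract "11.8" from "Cuda compilation tools, release 11.8, V11.8.89".
    release=$(printf '%s\n' "$1" |
        sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1)
    [ -n "$release" ] || return 1
    cuda_major=${release%%.*}
    cuda_minor=${release#*.}
    [ "$cuda_major" -gt 11 ] || { [ "$cuda_major" -eq 11 ] && [ "$cuda_minor" -ge 8 ]; }
}

# Succeeds if the given "gcc -dumpversion" text (e.g. "9" or "9.3.0")
# has a major version of 9 or later.
gcc_version_ok() {
    gcc_major=${1%%.*}
    case $gcc_major in
        ''|*[!0-9]*) return 1 ;;   # empty or not a number
    esac
    [ "$gcc_major" -ge 9 ]
}

# Step 1: required commands.
for cmd in curl git wget; do
    command -v "$cmd" >/dev/null 2>&1 || echo "missing command: $cmd" >&2
done

# Step 2: CUDA compiler (skipped when nvcc is absent, e.g. CPU-only use).
if command -v nvcc >/dev/null 2>&1; then
    cuda_release_ok "$(nvcc --version)" || echo "CUDA compiler driver is older than 11.8" >&2
fi

# Step 3: GNU compiler.
if command -v gcc >/dev/null 2>&1; then
    gcc_version_ok "$(gcc -dumpversion)" || echo "gcc is older than 9.0" >&2
fi
```

The script only prints warnings to standard error; it does not install anything or stop the installer.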

For WSL2 (in Windows)

Caution: If your installation fails because symbolic links (symlinks) cannot be created, the cause is that the Windows file system is case-insensitive, whereas the Linux file system is case-sensitive. To resolve this, run the following command in Windows PowerShell:

fsutil file SetCaseSensitiveInfo path\to\localcolabfold\installation enable

Replace path\to\localcolabfold\installation with the path to the directory where you are installing LocalColabFold. Also, make sure that you run the command in Windows PowerShell (not WSL). For more details, see Adjust Case Sensitivity (Microsoft).

Before running the prediction:

export TF_FORCE_UNIFIED_MEMORY="1"
export XLA_PYTHON_CLIENT_MEM_FRACTION="4.0"
export XLA_PYTHON_CLIENT_ALLOCATOR="platform"
export TF_FORCE_GPU_ALLOW_GROWTH="true"

It is recommended to add these export commands to ~/.bashrc and restart bash (~/.bashrc will be executed every time bash is started)
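
To avoid duplicate lines when the setup is repeated, each export can be appended only if it is not already present. This is a minimal sketch: append_once and add_wsl2_exports are hypothetical helper names, and the values are the ones listed above.

```shell
# Append a line to a file only if an identical line is not already there.
# append_once is a hypothetical helper, not part of LocalColabFold.
append_once() {
    touch "$1"
    # -F: fixed string, -x: match the whole line, -q: no output
    grep -Fxq -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}

# Add the four WSL2 settings from this section to the given startup file.
add_wsl2_exports() {
    rc_file=$1
    append_once "$rc_file" 'export TF_FORCE_UNIFIED_MEMORY="1"'
    append_once "$rc_file" 'export XLA_PYTHON_CLIENT_MEM_FRACTION="4.0"'
    append_once "$rc_file" 'export XLA_PYTHON_CLIENT_ALLOCATOR="platform"'
    append_once "$rc_file" 'export TF_FORCE_GPU_ALLOW_GROWTH="true"'
}
```

Calling add_wsl2_exports "$HOME/.bashrc" more than once leaves a single copy of each line. Restart bash afterwards so the settings take effect.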

For macOS

Caution: Because macOS lacks an NVIDIA GPU and CUDA driver, structure prediction on macOS is 5-10 times slower than on Linux+GPU. The test sequence (58 a.a.) may take 30 minutes. However, it can be useful for experimenting before you prepare a Linux+GPU environment.

You can check whether your Mac is Intel or Apple Silicon by typing uname -m in Terminal.

$ uname -m
x86_64 # Intel
arm64  # Apple Silicon

Please use the correct installer for your Mac.
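
The choice can also be scripted. The sketch below maps the uname -m output to the installer script names used in the two macOS sections; installer_for_arch is a hypothetical helper name, not part of LocalColabFold.

```shell
# Map the output of "uname -m" to the matching macOS installer script.
# installer_for_arch is a hypothetical helper; the script names are the
# ones used in the macOS instructions.
installer_for_arch() {
    case $1 in
        x86_64) echo "install_colabbatch_intelmac.sh" ;;
        arm64)  echo "install_colabbatch_M1mac.sh" ;;
        *)      echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}

# Report the installer for this machine (only meaningful on macOS).
if [ "$(uname -s)" = "Darwin" ]; then
    installer_for_arch "$(uname -m)"
fi
```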

For Mac with Intel CPU

  1. Install Homebrew if not present:
    $ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  2. Install wget, gnu-sed, HH-suite and kalign using Homebrew:
    $ brew install wget gnu-sed
    $ brew install brewsci/bio/hh-suite brewsci/bio/kalign
  3. Download install_colabbatch_intelmac.sh from this repository:
    $ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_intelmac.sh
    and run it in the directory where you want to install:
    $ bash install_colabbatch_intelmac.sh
    About 5 minutes later, the colabfold_batch directory will be created. Do not move this directory after the installation.
  4. The remaining steps are the same as in "For Linux".

For Mac with Apple Silicon (M1 chip)

Note: This installer is experimental because most of the dependent packages are not fully tested on Apple Silicon Mac.

  1. Install Homebrew if not present:
    $ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  2. Install several commands using Homebrew (Now kalign 3.3.2 is available!):
    $ brew install wget cmake gnu-sed
    $ brew install brewsci/bio/hh-suite
    $ brew install brewsci/bio/kalign
  3. Install miniforge using Homebrew:
    $ brew install --cask miniforge
  4. Download install_colabbatch_M1mac.sh from this repository:
    $ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/install_colabbatch_M1mac.sh
    and run it in the directory where you want to install:
    $ bash install_colabbatch_M1mac.sh
    About 5 minutes later, the colabfold_batch directory will be created. Do not move this directory after the installation. You can ignore the installation errors that appear along the way.
  5. The remaining steps are the same as in "For Linux".

Input Examples

ColabFold accepts several input file formats, or a directory.

positional arguments:
  input                 Can be one of the following: Directory with fasta/a3m
                        files, a csv/tsv file, a fasta file or an a3m file
  results               Directory to write the results to

fasta format

Keep the header line (starting with >) short, because the description becomes the prefix of the output file names. Line breaks within the amino acid sequence are acceptable.

>sp|P61823
MALKSLVLLSLLVLVLLLVRVQPSLGKETAAAKFERQHMDSSTSAASSSNYCNQMMKSRN
LTKDRCKPVNTFVHESLADVQAVCSQKNVACKNGQTNCYQSYSTMSITDCRETGSSKYPN
CAYKTTQANKHIIVACEGNPYVPVHFDASV

For prediction of multimers, insert : between the protein sequences.

>1BJP_homohexamer
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR:
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR:
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR:
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR:
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR:
PIAQIHILEGRSDEQKETLIREVSEAISRSLDAPLTSVRVIITEMAKGHFGIGGELASKVRR
>3KUD_RasRaf_complex
MTEYKLVVVGAGGVGKSALTIQLIQNHFVDEYDPTIEDSYRKQVVIDGETCLLDILDTAGQEEYSAMRDQ
YMRTGEGFLCVFAINNTKSFEDIHQYREQIKRVKDSDDVPMVLVGNKCDLAARTVESRQAQDLARSYGIP
YIETSAKTRQGVEDAFYTLVREIRQH:
PSKTSNTIRVFLPNKQRTVVNVRNGMSLHDCLMKALKVRGLQPECCAVFRLLHEHKGKKARLDWNTDAAS
LIGEELQVDFL

A FASTA file containing multiple > header lines, each followed by its sequence, yields one prediction per entry in the specified output directory.
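
Writing out many identical chains by hand is error-prone, so a homo-oligomer entry can be generated instead. This is a minimal sketch: make_homooligomer is a hypothetical helper, not a ColabFold command, and the peptide and entry name in the example call are placeholders.

```shell
# Print a FASTA entry for a homo-oligomer: the same sequence repeated
# n times and joined with ":" as described above.
# make_homooligomer is a hypothetical helper, not a ColabFold command.
make_homooligomer() {
    name=$1
    seq=$2
    copies=$3
    joined=$seq
    i=1
    while [ "$i" -lt "$copies" ]; do
        joined="$joined:$seq"
        i=$((i + 1))
    done
    printf '>%s\n%s\n' "$name" "$joined"
}

# Example: a trimer of a short placeholder peptide.
make_homooligomer example_trimer YYDPETGTWY 3
```

Redirect the output to a file (for example, make_homooligomer example_trimer YYDPETGTWY 3 > trimer.fasta) and pass that file to colabfold_batch as the input.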

csv format

In csv format, separate id and sequence with a comma (,).

id,sequence
5AWL_1,YYDPETGTWY
3G5O_A_3G5O_B,MRILPISTIKGKLNEFVDAVSSTQDQITITKNGAPAAVLVGADEWESLQETLYWLAQPGIRESIAEADADIASGRTYGEDEIRAEFGVPRRPH:MPYTVRFTTTARRDLHKLPPRILAAVVEFAFGDLSREPLRVGKPLRRELAGTFSARRGTYRLLYRIDDEHTTVVILRVDHRADIYRR
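
An existing FASTA file can be converted to this layout with awk. The sketch below is an illustration under two assumptions: fasta_to_csv is a hypothetical helper, not a ColabFold command, and the FASTA headers contain no commas, since the whole header text is used as the id.

```shell
# Convert FASTA text on stdin to csv (id,sequence) on stdout.
# Wrapped sequence lines are joined; ":" chain separators are kept.
# fasta_to_csv is a hypothetical helper, not a ColabFold command.
fasta_to_csv() {
    awk '
        BEGIN { print "id,sequence" }
        /^>/ {
            if (id != "") print id "," seq   # flush the previous record
            id = substr($0, 2); seq = ""
            next
        }
        { gsub(/[ \t\r]/, ""); seq = seq $0 }  # strip whitespace, append
        END { if (id != "") print id "," seq }
    '
}

# Example: a sequence wrapped over two lines becomes one csv row.
printf '>5AWL_1\nYYDPE\nTGTWY\n' | fasta_to_csv
```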

a3m format

You can supply your own MSA file in a3m format. For multimer predictions, the a3m file must be compatible with the ColabFold format.

Flags

These flags are useful for the predictions.

How to update

Since ColabFold is still a work in progress, you should also update your localcolabfold frequently to use the latest features. An easy-to-use update script is provided for this purpose.

To update your localcolabfold, simply execute the following:

# set your OS. Select one of the following variables {linux,intelmac,M1mac}
$ OS=linux # if Linux
# navigate to the directory where you installed localcolabfold, e.g.
$ cd /home/moriwaki/Desktop/localcolabfold/
# get the latest updater
$ wget https://raw.githubusercontent.com/YoshitakaMo/localcolabfold/main/update_${OS}.sh -O update_${OS}.sh
$ chmod +x update_${OS}.sh
# execute it.
$ ./update_${OS}.sh .

FAQ

Tutorials & Presentations

Acknowledgments

How do I reference this work?