Closed: tamuanand closed this issue 2 months ago
Does this mean that localcolabfold is still sending data to the colabfold MSA server?
Yes. If one specifies a FASTA file as input, localcolabfold will send the sequence to the MSA server, just as ColabFold on Google Colaboratory does. localcolabfold then receives the corresponding MSA file (in .a3m format) and starts structure inference on your local GPU.
If you want to run ColabFold entirely locally, you need extensive preparation. Please use the setup_databases.sh script to download and build the databases (see also ColabFold Downloads). Instructions for running colabfold_search to obtain the MSA and templates locally are written at https://github.com/sokrypton/ColabFold/issues/563.
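To summarize the order of the steps described above, here is a dry-run sketch of the fully local pipeline. All paths, the input filename, and the output directory names are placeholders (assumptions), and the `run` wrapper only prints each command, so the sequence can be illustrated without the real databases installed:

```shell
#!/usr/bin/env bash
# Dry-run sketch of running ColabFold entirely locally.
# Paths and file names below are placeholders, not verified values.
set -euo pipefail
run() { echo "+ $*"; }   # print each command instead of executing it

# 1. Download and build the databases (one-time, very large download).
run ./setup_databases.sh /path/to/databases

# 2. Build the MSA and template hits locally with colabfold_search
#    (flags as discussed later in this thread).
run colabfold_search --use-env 1 --use-templates 1 --db-load-mode 2 \
    --mmseqs /path/to/mmseqs --db2 pdb100_230517 --threads 12 \
    input.fasta /path/to/databases msa_out

# 3. Predict structures from the local MSA; no MSA server is contacted.
run colabfold_batch msa_out predictions
```

Removing the `run` wrapper turns the sketch into actual invocations once the placeholder paths point at a real database build.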
Hi @YoshitakaMo
I have used setup_databases.sh to download the different files. Do I need to do anything different to build them? I assume the databases are built already. I have the databases directory at the same level as the localcolabfold directory. Do I need to pass any special flag to colabfold_batch to tell it to use the databases from my local databases folder?
If you want to run ColabFold entirely locally, you need extensive preparation. Please use setup_databases.sh script to download and build the databases (See also ColabFold Downloads). An instruction to run colabfold_search to obtain the MSA and templates locally is written at https://github.com/sokrypton/ColabFold/issues/563
I am also trying to replicate this, but I end up getting an error:
colabfold_search \
--use-env 1 --use-templates 1 \
--db-load-mode 2 \
<path_to>/localcolabfold/colabfold-conda/bin/mmseqs \
--db2 pdb100_230517 --threads 12 \
ras_raf.fasta <path_to>/databases manual_ras_raf
colabfold_search: error: unrecognized arguments: manual_ras_raf
Any ideas what I could be doing wrong?
Thanks
I guess you forgot to add --mmseqs before <path_to>/localcolabfold/colabfold-conda/bin/mmseqs.
colabfold_search \
--use-env 1 \
--use-templates 1 \
--db-load-mode 2 \
--mmseqs <path_to>/localcolabfold/colabfold-conda/bin/mmseqs \
--db2 pdb100_230517 \
--threads 12 \
ras_raf.fasta \
<path_to>/databases \
manual_ras_raf
Thanks @YoshitakaMo - yes, you are correct. I missed the --mmseqs.
Running it now and will update.
Hi @YoshitakaMo - I was able to use colabfold_search correctly.
I had a question on the path_to_pdb_mmcif files for the next step: running colabfold_batch with the colabfold_search output. I am using instructions from here.
Should I use --local-pdb-path <path_to>/databases/pdb or --local-pdb-path <path_to>/databases/pdb/divided?
colabfold_batch --help has this:
--local-pdb-path LOCAL_PDB_PATH
Directory of a local mirror of the PDB mmCIF database (e.g.
/path/to/pdb/divided). If provided, PDB files from the directory are used
for templates specified by '--pdb-hit-file'. (default: None)
Thanks in advance.
Should I use --local-pdb-path <path_to>/databases/pdb or --local-pdb-path <path_to>/databases/pdb/divided?
In my case, I prepared pdb_mmcif/mmcif_files containing xxxx.cif files using download_pdb_mmcif.sh, which is distributed in DeepMind's AlphaFold2 repository. The colabfold_batch prediction was performed with --local-pdb-path <path_to>/pdb_mmcif/mmcif_files and --pdb-hit-file foo_pdb100_230517.m8.
--local-pdb-path <path_to>/pdb_mmcif/mmcif_files can also automatically detect gzipped mmCIF files such as <path_to>/pdb_mmcif/mmcif_files/divided/xx/yxxz.cif.gz.
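As a concrete illustration of that divided layout: the wwPDB mirror groups entries by the middle two characters of the four-character PDB ID, so the gzipped path for a given ID can be derived like this (a minimal sketch; the ID is made up):

```shell
# The "divided" layout keys each entry on the middle two characters of
# its PDB ID: ID "1abc" lives under divided/ab/1abc.cif.gz.
id="1abc"                           # hypothetical PDB ID
sub="${id:1:2}"                     # characters 2-3 of the ID -> "ab"
path="divided/${sub}/${id}.cif.gz"
echo "$path"                        # -> divided/ab/1abc.cif.gz
```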
Thanks @YoshitakaMo for answering all questions.
In my case, I prepared pdb_mmcif/mmcif_files containing xxxx.cif files using download_pdb_mmcif.sh, which is distributed in DeepMind's AlphaFold2 repository. The colabfold_batch prediction was performed with --local-pdb-path <path_to>/pdb_mmcif/mmcif_files and --pdb-hit-file foo_pdb100_230517.m8.
Based on your note, I prepared pdb_mmcif/mmcif_files as above, and I can run colabfold_batch on the mmCIF files from DeepMind's AlphaFold2 repository.
What difference should be expected between using the mmCIF files from DeepMind's AF2 repository versus the mmCIF files in the divided directory produced by setup_databases.sh? I realize I get a different number of .cif files (225158 + 4538 obsolete = 229696) from today's download from the AF2 repository compared to the number of .cif.gz files (224572 in divided + 4535 in obsolete = 229107) from setup_databases.sh.
Thanks once again.
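Counts like those above can be reproduced with `find | wc -l`; the snippet below illustrates this on a tiny mock mirror so it is self-contained (the real paths and numbers differ):

```shell
# Build a tiny mock of the divided/obsolete layout purely for illustration.
mkdir -p mock_pdb/divided/ab mock_pdb/obsolete
touch mock_pdb/divided/ab/1abc.cif.gz \
      mock_pdb/divided/ab/2abx.cif.gz \
      mock_pdb/obsolete/1old.cif.gz

# Count current vs. obsolete entries, as in the totals quoted above.
# $(( )) normalizes any whitespace padding that wc -l may emit.
n_current=$(( $(find mock_pdb/divided -name '*.cif.gz' | wc -l) ))
n_obsolete=$(( $(find mock_pdb/obsolete -name '*.cif.gz' | wc -l) ))
echo "current: $n_current obsolete: $n_obsolete total: $((n_current + n_obsolete))"
# -> current: 2 obsolete: 1 total: 3
```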
I realize I might have different number of cif files (225158 + 4538 obsolete = 229696) from today's download from AF2 repository when compared to the number of cif.gz files (224572 in divided + 4535 in obsolete = 229107) using setup_databases.sh
The structural data of the Protein Data Bank (PDB) is updated once a week. I suspect that the PDB data was updated between the time you previously used setup_databases.sh to build the database with .cif.gz files and today. The current number of entries is shown at https://www.rcsb.org/.
In any case, the template information that AlphaFold2/ColabFold retrieves from the PDB is minimal in most cases, so it will likely not significantly impact the prediction results. You can obtain nearly the same results regardless of the PDB version.
Thanks @YoshitakaMo
Hi,
I am following these steps to run localcolabfold: https://github.com/YoshitakaMo/localcolabfold?tab=readme-ov-file#for-linux
After successful installation and setup, when I run colabfold_batch, I get this WARNING.
Question: Does this mean that localcolabfold is still sending data to the colabfold MSA server? This link suggests that localcolabfold will run locally.
Please advise. Thanks in advance.