berkeucar opened this issue 1 week ago
Did this not work? https://github.com/sokrypton/ColabFold/issues/563#issuecomment-1914101245
I use
colabfold_batch --pdb-hit-file foobar_pdb100_230517.m8 --local-pdb-path /home/database/pdb_mmcif/mmcif_files foobar.a3m <outputdir>
for the prediction. /home/database/pdb_mmcif/mmcif_files contains more than 220,000 mmCIF files stored flat in one directory, each named by its 4-letter PDB ID.
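Before running, it may help to confirm that every template hit in the .m8 file has a matching file in the local mmCIF directory. The sketch below is a hypothetical helper (missing_template_files is not part of ColabFold) and assumes the .m8 target column looks like <pdb_id>_<chain> and that files are named <pdb_id>.cif in lowercase; adjust it if your layout differs.

```python
from pathlib import Path


def missing_template_files(m8_path, mmcif_dir):
    """Return PDB IDs referenced in an .m8 hit file that have no
    <pdb_id>.cif file in mmcif_dir.

    Assumptions (not taken from ColabFold itself): the .m8 file is
    tab-separated with the target in the second column, formatted as
    <pdb_id>_<chain> (e.g. "1abc_A"), and mmcif_dir is a flat directory
    of uncompressed files named <pdb_id>.cif in lowercase.
    """
    missing = set()
    with open(m8_path) as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            pdb_id = fields[1].split("_")[0].lower()
            if not (Path(mmcif_dir) / f"{pdb_id}.cif").is_file():
                missing.add(pdb_id)
    return sorted(missing)
```

For example, missing_template_files("foobar_pdb100_230517.m8", "/home/database/pdb_mmcif/mmcif_files") should return an empty list if the directory covers all hits.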
So, basically, I appended all my peptide sequences together, using ":" as the separator between them. Let's say that file's name is tmp.fasta. I obtained the files tmp.a3m and tmp_pdb100_230517.m8 from the colabfold_search command. Then I ran the following command:
colabfold_batch \
  --amber \
  --templates \
  --num-recycle 3 \
  --use-gpu-relax \
  --pdb-hit-file tmp_pdb100_230517.m8 \
  --local-pdb-path my_local_pdb/pdb_mmcif/mmcif_files \
  --random-seed 0 \
  --zip \
  tmp.a3m \
  output_folder
and I received the following error:
Could not generate input features tmp: string index out of range
= generate_input_feature(query_seqs_unique, query_seqs_cardinality, unpaired_msa, paired_msa,
File "localacolabfold_env/bin/lib/python3.10/site-packages/colabfold/batch.py", line 1035, in generate_input_feature
features_for_chain[protein.PDB_CHAIN_IDS[chain_cnt]] = feature_dict
IndexError: string index out of range
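The failing line indexes protein.PDB_CHAIN_IDS with a running chain counter. In the AlphaFold code that ColabFold imports, that string holds 62 chain IDs (A-Z, a-z, 0-9), and ColabFold reads ":" inside one query as a chain break. A query built by joining thousands of peptides with ":" is therefore treated as a single complex with far more than 62 chains, which would likely overflow that index. The sketch below uses hypothetical helpers (not ColabFold API) to count chains in such a query and to rewrite it as one FASTA record per peptide, so each peptide becomes its own job.

```python
# Same 62 characters as AlphaFold's protein.PDB_CHAIN_IDS, copied here so the
# sketch runs without AlphaFold installed.
PDB_CHAIN_IDS = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"


def count_chains(query: str) -> int:
    """Number of chains in a ColabFold query, where ':' separates chains."""
    return len([part for part in query.split(":") if part.strip()])


def exceeds_chain_ids(query: str) -> bool:
    """True if the query has more chains than there are PDB chain IDs."""
    return count_chains(query) > len(PDB_CHAIN_IDS)


def to_separate_records(query: str, prefix: str = "pep") -> str:
    """Rewrite one ':'-joined query as FASTA text with one record per peptide.

    The headers (pep_0, pep_1, ...) are hypothetical placeholders; use your
    own peptide names if you have them.
    """
    peptides = [part.strip() for part in query.split(":") if part.strip()]
    return "".join(f">{prefix}_{i}\n{seq}\n" for i, seq in enumerate(peptides))
```

With one record per peptide, colabfold_search should produce a separate MSA per query instead of a single paired MSA for one huge complex.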
Please show me your commit hash. For example, ColabFold on my machine is at 1ccca5a53d20c909f3ccf8a4b81df804e6717cb1, the commit from Jul. 23, 2024.
2024-11-11 00:18:05,900 Running colabfold 1.5.5 (1ccca5a53d20c909f3ccf8a4b81df804e6717cb1)
2024-11-11 00:18:06,190 Running on GPU
2024-11-11 00:18:06,859 Found 5 citations for tools or databases
...
...
...
If your commit is older than that, updating LocalColabFold will fix this issue.
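To check which version and commit an existing run used, the "Running colabfold ..." log line can be parsed. A minimal sketch; read_version is a hypothetical helper, and where the log is saved (for example, a log.txt in the output folder) is an assumption to verify on your install.

```python
import re

# Matches lines such as:
# "2024-11-11 00:18:05,900 Running colabfold 1.5.5 (1ccca5a53d20c909f3ccf8a4b81df804e6717cb1)"
VERSION_LINE = re.compile(r"Running colabfold (\S+) \(([0-9a-f]{7,40})\)")


def read_version(log_text: str):
    """Return (version, commit_hash) from the first matching line, or None."""
    match = VERSION_LINE.search(log_text)
    return (match.group(1), match.group(2)) if match else None
```

Comparing the returned hash with 1ccca5a53d20c909f3ccf8a4b81df804e6717cb1 shows whether you are on the same commit; a hash alone cannot tell you which of two different commits is newer.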
Just in case, I did a fresh install of LocalColabFold with the install_colabfold_batch_linux.sh script. Now I cannot even obtain the MSA files: it gets stuck on the MSA search for the first peptide in the batch:
k-mer similarity threshold: 110
Starting prefiltering scores calculation (step 1 of 1)
Query db start 1 to 238
Target db start 1 to 209335862
[> ] 1.27% 4 eta 0s
I am running this on CPUs and my gcc version is 9.4.0.
Hello,
I have a FASTA file containing thousands of peptide sequences. I want to predict their 3D structures using LocalColabFold 1.5.5, installed on an HPC cluster where I also have access to GPU clusters. I was able to generate the PDB hit and MSA files by following this issue: https://github.com/sokrypton/ColabFold/issues/563.
However, as I mentioned, my FASTA file contains many peptides, and I would like to use my GPU access to generate 3D structures with the colabfold_batch command, using the PDB hit and MSA files I precomputed on the HPC cluster. This was asked in the linked issue but seems to have flown under the radar.
Currently, does LocalColabFold support large-scale prediction of peptides with the --pdb-hit-file flag?
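One possible workaround, sketched under assumptions: keep one .a3m per peptide (as colabfold_search produces for a FASTA with one record per peptide), split the combined .m8 file by its first (query) column, and call colabfold_batch once per peptide with the matching hit file, mirroring the single-query command quoted earlier in this thread. The helper names are hypothetical, and the mapping between .m8 query IDs and .a3m file names is an assumption to check against your own colabfold_search output.

```python
from collections import defaultdict
from pathlib import Path


def split_m8_by_query(m8_path, out_dir):
    """Write one .m8 file per query ID (first tab-separated column).

    Assumption: each query ID matches the stem of its .a3m file from
    colabfold_search (e.g. query "0" -> 0.a3m). Verify on your own output.
    """
    groups = defaultdict(list)
    with open(m8_path) as handle:
        for line in handle:
            if line.strip():
                query = line.split("\t", 1)[0]
                groups[query].append(line.rstrip("\n") + "\n")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = {}
    for query, lines in groups.items():
        path = out_dir / f"{query}.m8"
        path.write_text("".join(lines))
        paths[query] = path
    return paths


def build_command(a3m, m8, mmcif_dir, out_dir):
    """colabfold_batch arguments for one query, using only flags from this thread."""
    return [
        "colabfold_batch",
        "--templates",
        "--num-recycle", "3",
        "--pdb-hit-file", str(m8),
        "--local-pdb-path", str(mmcif_dir),
        "--random-seed", "0",
        str(a3m),
        str(out_dir),
    ]
```

Each command list can be turned into a job-script line with shlex.join(cmd), or run directly on a GPU node with subprocess.run(cmd, check=True).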