Closed by CalvinKlein96 6 months ago
Hi Calvin,
First of all, just yesterday I pushed a commit that fixes bugs when applying CombFold to more than 31 subunits, so make sure to pull it and recompile the C++ code (make clean && make).
Regarding your issue: you can use upper-case letters, lower-case letters, and digits as chain IDs, which gives you at least 62 different chains in the model (26 + 26 + 10).
I would recommend not using the same letter in both upper and lower case within the same subunit, as this can cause issues on some operating systems. We create files called
Let me know if that works for you.
Best, Ben
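For anyone relabelling a large complex by hand, here is a minimal sketch of the naming scheme described above. It is not part of CombFold: the function name `assign_chain_ids` and the input layout (a mapping from subunit name to copy number) are hypothetical. It hands out single-character IDs from the 62-character pool and never gives one subunit the same letter in both cases.

```python
import string

# 26 upper-case + 26 lower-case + 10 digits = 62 single-character chain IDs.
CHAIN_ID_POOL = string.ascii_uppercase + string.ascii_lowercase + string.digits


def assign_chain_ids(copies_per_subunit):
    """Give every chain a unique one-character ID (hypothetical helper).

    copies_per_subunit maps a subunit name to the number of copies it has.
    Within one subunit, a letter is never used in both upper and lower case.
    """
    total = sum(copies_per_subunit.values())
    if total > len(CHAIN_ID_POOL):
        raise ValueError(f"{total} chains requested, only {len(CHAIN_ID_POOL)} IDs available")

    free = list(CHAIN_ID_POOL)
    assignment = {}
    for subunit, copies in copies_per_subunit.items():
        chosen = []
        used_folded = set()  # lower-cased IDs already taken by this subunit
        for chain_id in list(free):  # iterate over a copy, since free is modified
            if len(chosen) == copies:
                break
            if chain_id.lower() in used_folded:
                continue  # would clash on a case-insensitive file system
            chosen.append(chain_id)
            used_folded.add(chain_id.lower())
            free.remove(chain_id)
        if len(chosen) < copies:
            raise ValueError(f"could not find {copies} case-distinct IDs for {subunit}")
        assignment[subunit] = chosen
    return assignment


# Example: 3 subunits with 11 copies each (33 chains in total).
example = assign_chain_ids({"A": 11, "B": 11, "C": 11})
for subunit, ids in example.items():
    print(subunit, "".join(ids))
```

The resulting lists would then be copied into the chain names of each subunit in subunits.json by hand or by a small script of your own.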
Hi Ben,
I pulled the new version of CombFold, recompiled it, and changed some of my chain labels to lowercase letters. I then ran the new CombFold version with the updated JSON file on the previously predicted folds. However, I now get this error:
File "/ibmm_data/kleinc/software/CombFold/scripts/run_on_pdbs.py", line 402, in <module>
run_on_pdbs_folder(os.path.abspath(sys.argv[1]), os.path.abspath(sys.argv[2]), os.path.abspath(sys.argv[3]))
File "/ibmm_data/kleinc/software/CombFold/scripts/run_on_pdbs.py", line 380, in run_on_pdbs_folder
assembled_files = create_complexes(clusters_path, first_result=0, last_result=max_results_number,
File "/ibmm_data/kleinc/software/CombFold/scripts/libs/prepare_complex.py", line 142, in create_complexes
create_transformation_pdb(assembly_path, transforms_strs[i], output_path=output_path, output_cif=output_cif)
File "/ibmm_data/kleinc/software/CombFold/scripts/libs/prepare_complex.py", line 108, in create_transformation_pdb
_merge_models(output_path, tmp_pdb_path, output_path, output_cif=output_cif)
File "/ibmm_data/kleinc/software/CombFold/scripts/libs/prepare_complex.py", line 22, in _merge_models
model_struct1 = read_model_path(model_path1)
File "/ibmm_data/kleinc/software/CombFold/scripts/libs/prepare_complex.py", line 17, in read_model_path
return Bio.PDB.PDBParser(QUIET=True).get_structure("s_pdb", pdb_path)
File "/ibmm_data/kleinc/software/Vader/localcolabfold/colabfold-conda/lib/python3.10/site-packages/Bio/PDB/PDBParser.py", line 100, in get_structure
self._parse(lines)
File "/ibmm_data/kleinc/software/Vader/localcolabfold/colabfold-conda/lib/python3.10/site-packages/Bio/PDB/PDBParser.py", line 123, in _parse
self.trailer = self._parse_coordinates(coords_trailer)
File "/ibmm_data/kleinc/software/Vader/localcolabfold/colabfold-conda/lib/python3.10/site-packages/Bio/PDB/PDBParser.py", line 198, in _parse_coordinates
resseq = int(line[22:26].split()[0]) # sequence identifier
ValueError: invalid literal for int() with base 10: 'E'
It fails after having created the output_clustered_0.pdb already. Does this stem from the previous naming issue?
It is unclear to me if this is related to the naming issue... Could you upload the "subunits.json" file and possibly also a zip of
Sure, no problem, here you go:
In the output_clustered_0.pdb 4 chains are missing.
Thanks! It seems the issue is that your structure is very big, and the PDB format only supports up to 99,999 atoms (the atom serial number field is five digits wide). To handle this, the pipeline can write its output in CIF format. This is possible in the Colab notebook, but it is not yet exposed as a flag locally. So you currently have 2 options to make it work locally:
1. Rerun the pipeline, but change scripts/run_on_pdbs.py:326 so that output_cif=True.
2. Without rerunning the entire pipeline (the assembly itself worked fine, and only the generation of result files was faulty):
change scripts/libs/prepare_complex.py:125 to output_cif=True
and then run:
scripts/libs/prepare_complex.py <output_folder>/_unified_representation/assembly_output/output_clustered.res 1 10
This will create the results in the folder
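To illustrate why an oversized structure can produce the ValueError in the traceback above, here is a small sketch. It is one plausible mechanism, not a confirmed description of how the PDB files were written: it assumes the writer pads the atom serial number to a minimum width of five without truncating it, so a six-digit serial pushes every later column one position to the right. The helper names `format_atom_line` and `parse_res_seq` are made up for this example; the slice in `parse_res_seq` is the same one shown in the traceback.

```python
def format_atom_line(serial, name, res_name, chain_id, res_seq, x, y, z):
    """Format a simplified fixed-width PDB ATOM record (hypothetical helper).

    Assumption: the serial is padded to at least 5 characters but never
    truncated, so serials above 99999 make the record one column longer.
    """
    return (
        f"ATOM  {serial:5d} {name:<4s} {res_name:>3s} {chain_id}{res_seq:4d}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


def parse_res_seq(line):
    """Read the residue number from columns 23-26, as in the traceback."""
    return int(line[22:26].split()[0])


# Serial 99999 still fits: the residue number sits in its expected columns.
ok_line = format_atom_line(99999, " CA ", "ALA", "E", 1, 1.0, 2.0, 3.0)
print(parse_res_seq(ok_line))

# Serial 100000 shifts the chain ID into the residue-number columns.
bad_line = format_atom_line(100000, " CA ", "ALA", "E", 1, 1.0, 2.0, 3.0)
try:
    parse_res_seq(bad_line)
except ValueError as err:
    print("parse failed:", err)
```

Under that assumption, the parser reads the chain ID where it expects the residue number, which would explain the 'E' in the error message. CIF output has no fixed column widths, so it does not run into this limit.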
That worked wonderfully for me, thanks!
Hi Ben,
I've been using CombFold for predictions of up to 21 chains and it worked well. Now I tested it on a prediction with 33 chains, naming the chains A1 to A11, B1 to B11, etc. run_on_pdbs.py then gives an assertion error for badly named chains. I also tested renaming the chains to AA, AB, etc., but I got the same error.
From what I understand, the problem comes from PDBIO, which requires a single-character chain ID. Is there a workaround to increase the number of chains that can be assembled?
Cheers, Calvin