dina-lab3D / CombFold

Apache License 2.0

Atom serial number ('100000') exceeds PDB format limit. #11

Open xvazquezc opened 2 months ago

xvazquezc commented 2 months ago

Hi there,

I'm trying to assemble a homopentamer with subunits of ~1700 aa (above 51000 atoms), which fails because the PDB file format limits atom serial numbers to 99,999. I'm guessing that at some point the subunits from the PDB chains in the pair files are joined into a single structure, resulting in this error.

I ran prepare_fastas.py --stage pairs with a limit of 3,900 aa, as that is close to the largest size I can currently run on my system. I know I can probably avoid this by defining smaller domains, but checking the pair sizes up front might be a good idea, rather than going through the modelling of all the pairs only to crash at the last step.
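The kind of up-front check suggested above could look roughly like the sketch below. It is plain Python and not part of CombFold: the function names (`count_atoms`, `assembly_atom_count`, `fits_in_pdb`) are hypothetical, and it assumes you already have at least one PDB file per subunit so atoms can be counted before the assembly step.

```python
# Hypothetical pre-flight check, not CombFold code.
PDB_MAX_ATOM_SERIAL = 99_999  # largest value the 5-column serial field can hold


def count_atoms(pdb_text: str) -> int:
    """Count ATOM/HETATM records in the first model of PDB-format text."""
    n = 0
    for line in pdb_text.splitlines():
        if line.startswith("ENDMDL"):
            break  # only the first model is counted
        if line.startswith(("ATOM  ", "HETATM")):
            n += 1
    return n


def assembly_atom_count(atoms_per_subunit: dict, copies: dict) -> int:
    """Total atoms in the final complex, given per-subunit counts and copy numbers."""
    return sum(atoms_per_subunit[name] * copies[name] for name in copies)


def fits_in_pdb(total_atoms: int) -> bool:
    """True if every atom can still get a serial number within the PDB limit."""
    return total_atoms <= PDB_MAX_ATOM_SERIAL
```

For example, with an illustrative (made-up) count of 26,000 atoms per subunit, `assembly_atom_count({"A": 26000}, {"A": 5})` gives 130,000, and `fits_in_pdb` returns `False`, so the problem would be flagged before any pairs are modelled.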

At the very least, I thought this was worth documenting. Cheers,

Last lines of log with the error (everything else in the log is fine):

found 10 transformations between 487 and 487
--- Finished building unified representation
--- Running combinatorial assembly algorithm, may take a while
--- Finished combinatorial assembly, writing output models
Traceback (most recent call last):
  File "/g/data1a/u71/xabi/miniconda3/envs/combfold/lib/python3.12/site-packages/Bio/PDB/PDBIO.py", line 379, in save
    s = get_atom_line(
        ^^^^^^^^^^^^^^
  File "/g/data1a/u71/xabi/miniconda3/envs/combfold/lib/python3.12/site-packages/Bio/PDB/PDBIO.py", line 188, in _get_atom_line
    raise ValueError(
ValueError: Atom serial number ('100000') exceeds PDB format limit.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/g/data1a/u71/xabi/CombFold/scripts/run_on_pdbs.py", line 400, in <module>
    run_on_pdbs_folder(os.path.abspath(sys.argv[1]), os.path.abspath(sys.argv[2]), os.path.abspath(sys.argv[3]))
  File "/g/data1a/u71/xabi/CombFold/scripts/run_on_pdbs.py", line 380, in run_on_pdbs_folder
    assembled_files = create_complexes(clusters_path, first_result=0, last_result=max_results_number,
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/g/data/u71/xabi/CombFold/scripts/libs/prepare_complex.py", line 142, in create_complexes
    create_transformation_pdb(assembly_path, transforms_strs[i], output_path=output_path, output_cif=output_cif)
  File "/g/data/u71/xabi/CombFold/scripts/libs/prepare_complex.py", line 108, in create_transformation_pdb
    _merge_models(output_path, tmp_pdb_path, output_path, output_cif=output_cif)
  File "/g/data/u71/xabi/CombFold/scripts/libs/prepare_complex.py", line 51, in _merge_models
    io.save(output_path)
  File "/g/data1a/u71/xabi/miniconda3/envs/combfold/lib/python3.12/site-packages/Bio/PDB/PDBIO.py", line 391, in save
    raise PDBIOException(
Bio.PDB.PDBExceptions.PDBIOException: Error when writing atom ('s_pdb', 0, 'D', (' ', 1552, ' '), ('HE1', ' '))

ben-shor commented 2 months ago

Hey, the PDB format doesn't support structures with over 99,999 atoms, so CombFold lets you output CIF files instead of PDB. You can look here for an explanation of how to enable CIF output: https://github.com/dina-lab3D/CombFold/issues/4#issuecomment-1884906966

Let me know if that works for you.
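For anyone writing large structures with Biopython directly, the same idea can be sketched as follows. This is not how CombFold implements its CIF option (see the linked comment for that). It is a minimal illustration using Biopython's `PDBIO` and `MMCIFIO` writers, and the helper names `pick_extension` and `save_structure` are hypothetical.

```python
# Hypothetical helpers, not CombFold code.
PDB_MAX_ATOM_SERIAL = 99_999  # PDB atom serial numbers are limited to 5 digits


def pick_extension(n_atoms: int) -> str:
    """Return '.pdb' when the atoms fit in PDB format, otherwise '.cif'."""
    return ".pdb" if n_atoms <= PDB_MAX_ATOM_SERIAL else ".cif"


def save_structure(structure, out_stem: str) -> str:
    """Write a Bio.PDB structure as PDB if it fits, else as mmCIF."""
    # Imported lazily so pick_extension can be used without Biopython installed.
    from Bio.PDB import MMCIFIO, PDBIO

    # Counts atoms across all models, which is conservative for
    # multi-model structures (mmCIF is chosen earlier than strictly needed).
    n_atoms = sum(1 for _ in structure.get_atoms())
    ext = pick_extension(n_atoms)
    writer = PDBIO() if ext == ".pdb" else MMCIFIO()
    writer.set_structure(structure)
    out_path = out_stem + ext
    writer.save(out_path)
    return out_path
```

Falling back to mmCIF only when needed keeps small outputs in the more widely supported PDB format, while large assemblies avoid the `ValueError` shown in the log above.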