bjornwallner / DockQ

DockQ is a single continuous quality measure for protein docked models based on the CAPRI evaluation protocol
MIT License

IndexError: list index out of range #15

Closed kaymccoy closed 2 months ago

kaymccoy commented 1 year ago

I’m attempting to get the DockQ score of a model of CAPRI target #50 from the score_set dataset. The model is named Target50_0000.pdb and the correct crystal structure is named Target50_3r2x.pdb. Both are attached here (with the extension .txt added, since GitHub doesn't allow .pdb files to be uploaded):

Target50_0000.pdb.txt Target50_3r2x.pdb.txt

Running

scripts/fix_numbering.pl /path/to/Target50_0000.pdb /path/to/Target50_3r2x.pdb

works fine, but running

python3 DockQ.py /path/to/Target50_0000.pdb.fixed /path/to/Target50_3r2x.pdb -native_chain1 A B -native_chain2 C -model_chain1 A B -model_chain2 C

results in the following error:

Traceback (most recent call last):
  File "/dartfs/rc/lab/G/Grigoryanlab/home/coy/DockQ/DockQ.py", line 732, in <module>
    main()    
  File "/dartfs/rc/lab/G/Grigoryanlab/home/coy/DockQ/DockQ.py", line 510, in main
    model_chains=get_pdb_chains(model)
  File "/dartfs/rc/lab/G/Grigoryanlab/home/coy/DockQ/DockQ.py", line 387, in get_pdb_chains
    pdb_struct = pdb_parser.get_structure("reference", pdb)[0]
  File "/dartfs-hpc/rc/home/4/f002v94/.conda/envs/myenv/lib/python3.9/site-packages/Bio/PDB/PDBParser.py", line 100, in get_structure
    self._parse(lines)
  File "/dartfs-hpc/rc/home/4/f002v94/.conda/envs/myenv/lib/python3.9/site-packages/Bio/PDB/PDBParser.py", line 123, in _parse
    self.trailer = self._parse_coordinates(coords_trailer)
  File "/dartfs-hpc/rc/home/4/f002v94/.conda/envs/myenv/lib/python3.9/site-packages/Bio/PDB/PDBParser.py", line 198, in _parse_coordinates
    resseq = int(line[22:26].split()[0])  # sequence identifier
IndexError: list index out of range

ELMIAR-0642 commented 8 months ago

I got the same error when comparing the native and predicted structures:

python3 DockQ.py ./complex.1.pdb ./5j13_native.pdb -native_chain1 A B -model_chain1 A B -native_chain2 C -model_chain2 C > decoy1.log

Traceback (most recent call last):
  File "/home/randd/Desktop/Desktop_Office/October2023/ThirdWeek/dockq/DockQ/DockQ.py", line 732, in <module>
    main()    
  File "/home/path/to/dockq/DockQ/DockQ.py", line 648, in main
    info=calc_DockQ(model_fixed,native,use_CA_only)
  File "/home/path/to/dockq/DockQ/DockQ.py", line 137, in calc_DockQ
    sample_structure = pdb_parser.get_structure("model", model)
  File "/home/#####/.local/lib/python3.8/site-packages/Bio/PDB/PDBParser.py", line 100, in get_structure
    self._parse(lines)
  File "/home/#####/.local/lib/python3.8/site-packages/Bio/PDB/PDBParser.py", line 123, in _parse
    self.trailer = self._parse_coordinates(coords_trailer)
  File "/home/#####/.local/lib/python3.8/site-packages/Bio/PDB/PDBParser.py", line 198, in _parse_coordinates
    resseq = int(line[22:26].split()[0])  # sequence identifier
IndexError: list index out of range

zmcdargh commented 3 months ago

I'm also experiencing this issue. The problem seems to be in the file written by the renumbering step: Biopython is unable to parse this PDB file. For me the file looks like this:

ATOM   8808  HZ  PHE B9017X     34.056  40.265  41.472  1.00  0.00           H  
ATOM   8809  N   THR B          37.287  41.477  35.884  1.00  0.00           N  

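The second line here has an empty resseq field (columns 23-26, i.e. line[22:26] in the traceback), so line[22:26].split() returns an empty list and indexing it with [0] raises the IndexError in Biopython.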
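To check whether a renumbered file is affected, the records with a blank residue number can be listed before handing the file to Biopython. The following is only a minimal diagnostic sketch: the function name `find_blank_resseq` is hypothetical and not part of DockQ or Biopython, and the column slice simply mirrors the `line[22:26]` expression shown in the traceback. It reports the offending lines but does not repair them.

```python
def find_blank_resseq(lines):
    """Return (line_number, line) pairs for ATOM/HETATM records whose
    residue sequence number field (columns 23-26) is blank.

    Hypothetical helper for illustration; it uses the same slice,
    line[22:26], that appears in the Biopython traceback.
    """
    bad = []
    for number, line in enumerate(lines, start=1):
        is_coordinate_record = line.startswith(("ATOM", "HETATM"))
        if is_coordinate_record and not line[22:26].strip():
            bad.append((number, line.rstrip("\n")))
    return bad


# The two records quoted above; the second has an empty resseq field.
example = [
    "ATOM   8808  HZ  PHE B9017X     34.056  40.265  41.472  1.00  0.00           H  ",
    "ATOM   8809  N   THR B          37.287  41.477  35.884  1.00  0.00           N  ",
]

for number, line in find_blank_resseq(example):
    print(f"line {number}: blank resseq -> {line}")
```

For a file on disk, the open file handle can be passed directly, for example `find_blank_resseq(open("model.pdb.fixed"))`, where the file name is a placeholder.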

clami66 commented 2 months ago

Hi, check the newly released version of DockQ (v2.0). It now works for me with your attached files. Note that I had to use the new --allowed_mismatches flag, since the two structures don't have identical sequences:

DockQ ~/Downloads/Target50_0000.pdb.txt ~/Downloads/Target50_3r2x.pdb.txt --allowed_mismatches 4
****************************************************************
*                       DockQ                                  *
*   Scoring function for protein-protein docking models        *
*   Statistics on CAPRI data:                                  *
*    0.00 <= DockQ <  0.23 - Incorrect                         *
*    0.23 <= DockQ <  0.49 - Acceptable quality                *
*    0.49 <= DockQ <  0.80 - Medium quality                    *
*            DockQ >= 0.80 - High quality                      *
*   Ref: S. Basu and B. Wallner, DockQ: A quality measure for  *
*   protein-protein docking models                             *
*                            doi:10.1371/journal.pone.0161879  *
*   For comments, please email: bjorn.wallner@.liu.se          *
****************************************************************
Model  : /home/claudio/Downloads/Target50_0000.pdb.txt
Native : /home/claudio/Downloads/Target50_3r2x.pdb.txt
Total DockQ over 3 native interfaces: 0.977
Native chains: A, B
    Model chains: A, B
    DockQ_F1: 0.937
    DockQ: 0.950
    irms: 0.520
    Lrms: 0.883
    fnat: 0.969
Native chains: A, C
    Model chains: A, C
    DockQ_F1: 0.014
    DockQ: 0.014
    irms: 14.961
    Lrms: 47.828
    fnat: 0.000
Native chains: B, C
    Model chains: B, C
    DockQ_F1: 0.013
    DockQ: 0.013
    irms: 16.792
    Lrms: 47.373
    fnat: 0.000

kaymccoy commented 2 months ago

Thanks so much! This update looks great; I appreciate the ability to match the sequence differences now!