I am having trouble running DockQ with moderately large homo-dimers. Is it a known issue that the tools here fail when a chain contains many residues?
I ran DockQ successfully for most of the models and references in a given benchmark set but the largest files failed.
The smallest case where I could observe a failure is the comparison of the attached (4u59_2_files.zip) 4u59_2_model.pdb with 4u59_2.pdb (i.e. the simple call ./DockQ.py 4u59_2_model.pdb 4u59_2.pdb).
Here the model covers more residues than the reference, so ./DockQ.py 4u59_2.pdb 4u59_2.pdb works (3076 residues in 4u59_2) while ./DockQ.py 4u59_2_model.pdb 4u59_2_model.pdb fails (3294 residues in 4u59_2_model).
The traceback of the error looks as follows when run with Python 3:
Traceback (most recent call last):
  File ".../DockQ.py", line 730, in <module>
    main()
  File ".../DockQ.py", line 658, in main
    info=calc_DockQ(model,native,use_CA_only=use_CA_only,capri_peptide=capri_peptide) #False):
  File ".../DockQ.py", line 112, in calc_DockQ
    fnat_out = os.popen(cmd_fnat).read()
  File ".../python3.9/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xef in position 852: invalid continuation byte
and as follows with Python 2:
Traceback (most recent call last):
  File "../DockQ.py", line 730, in <module>
    main()
  File "../DockQ.py", line 658, in main
    info=calc_DockQ(model,native,use_CA_only=use_CA_only,capri_peptide=capri_peptide) #False):
  File "../DockQ.py", line 118, in calc_DockQ
    assert fnat!=-1, "Error running cmd: %s\n" % (cmd_fnat)
AssertionError: Error running cmd: .../fnat 4u59_2_model.pdb 4u59_2_model.pdb 5 -all
The latter error points to a problem in the fnat binary, which indeed prints garbage characters before segfaulting. Here are the last few lines of the output of fnat 4u59_2_model.pdb 4u59_2_model.pdb 5:
As an additional note, I observed plenty of compile-time warnings when compiling with GCC 10.3.0. It may be worth checking them, as they could be indicative of overflows or similar problems.
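To look at the fnat failure in isolation, independently of the Python version, something like the sketch below may help. It is not part of DockQ; the helper names run_and_capture and describe_exit are mine. It captures the raw bytes of a command, decodes them leniently so that garbage bytes do not raise UnicodeDecodeError, and reports whether the process was killed by a signal (on POSIX systems a negative return code means the process died from that signal).

```python
import signal
import subprocess
import sys


def run_and_capture(cmd):
    """Run cmd (a list of arguments) and return (returncode, text).

    stdout and stderr are captured together as bytes and decoded with
    errors="replace", so undecodable bytes become U+FFFD instead of
    raising UnicodeDecodeError.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    text = proc.stdout.decode("utf-8", errors="replace")
    return proc.returncode, text


def describe_exit(returncode):
    """Turn a subprocess return code into a short human-readable string."""
    if returncode < 0:
        # On POSIX, -N means the child was terminated by signal N.
        try:
            name = signal.Signals(-returncode).name
        except ValueError:
            name = "unknown signal"
        return "killed by %s" % name
    return "exit code %d" % returncode


# Example call (the path to the fnat binary is a placeholder):
#   rc, text = run_and_capture(["./fnat", "4u59_2_model.pdb", "4u59_2_model.pdb", "5"])
#   print(describe_exit(rc))
#   print(text[-500:])
```

With this, a segfault in the binary should show up as "killed by SIGSEGV" rather than as a decoding error in Python 3 or a failed assertion in Python 2.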
The specific files do not matter and I could reproduce the same failures when downloading moderately large homo-dimers from the PDB (e.g. https://files.rcsb.org/download/6EQO.pdb).
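To check whether other files cross the size at which I see failures, a rough residue count per chain can be obtained with a sketch like the one below. It is my own helper (count_residues is not a DockQ function) and assumes the standard fixed-column PDB layout, counting only ATOM records; the fnat binary may count residues differently.

```python
def count_residues(pdb_lines):
    """Count distinct residues per chain from an iterable of PDB lines.

    Assumes fixed-column PDB format: chain ID in column 22, residue
    number in columns 23-26, insertion code in column 27.
    Only ATOM records are considered (HETATM is ignored).
    """
    seen = set()
    counts = {}
    for line in pdb_lines:
        if not line.startswith("ATOM") or len(line) < 27:
            continue
        chain = line[21]
        # Residue number plus insertion code identifies a residue in a chain.
        key = (chain, line[22:27])
        if key not in seen:
            seen.add(key)
            counts[chain] = counts.get(chain, 0) + 1
    return counts


# Example call:
#   with open("4u59_2_model.pdb") as handle:
#       per_chain = count_residues(handle)
#   print(per_chain, sum(per_chain.values()))
```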
Given that large complexes and multi-domain proteins are interesting and challenging prediction problems, it would be good to fix the issue described here so that DockQ can be applied to benchmarks for such problems.