PDB-REDO / dssp

Application to assign secondary structure to proteins
BSD 2-Clause "Simplified" License

large memory usage in pdb file from ESMFold #66

Closed: jasondbiggs closed this issue 7 months ago

jasondbiggs commented 1 year ago

When I run mkdssp on a PDB file downloaded from ESMFold's API, its memory usage grows without bound.

11ak2en.txt

To reproduce, rename the attached file to 11ak2en.pdb and run

./mkdssp 11ak2en.pdb 11ak2en_out.pdb

I watched the memory climb to 14 GB before killing the process. I see this on my Intel MacBook with a locally compiled version of mkdssp, as well as with the Linux binary from the releases page.

This PDB file is missing the CRYST1 line, the unit cell lines, the MODEL/ENDMDL lines, and the END line. Copying these sections from another file and pasting them into this one fixes the issue.

Requiring properly formatted PDB files is fine, but mkdssp should fail more gracefully, for example with an error message rather than unbounded memory growth.
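For anyone who wants to check a file for the records mentioned above before running mkdssp, here is a minimal Python sketch. It is not part of dssp, and the function name `missing_pdb_records` is made up for illustration. It only assumes that a PDB record name sits in the first six columns of each line.

```python
def missing_pdb_records(pdb_text: str) -> list[str]:
    """Return the expected record names that never start a line.

    Hypothetical helper, not part of dssp. The list of expected
    records is taken from the report above, not from mkdssp itself.
    """
    expected = ("CRYST1", "MODEL", "ENDMDL", "END")
    # PDB record names occupy columns 1-6, padded with spaces.
    seen = {line[:6].strip() for line in pdb_text.splitlines()}
    return [name for name in expected if name not in seen]


if __name__ == "__main__":
    import sys
    from pathlib import Path

    if len(sys.argv) > 1:
        text = Path(sys.argv[1]).read_text()
        print("missing records:", missing_pdb_records(text) or "none")
```

Saved under any file name and run with the PDB path as its only argument, this prints which of the four records are absent. Whether mkdssp actually needs any of them is a separate question.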

mhekkel commented 1 year ago

The last line in the PDB file does not end with a newline character. Apparently my PDB parsing code does not like that. I'll have to fix this.

Thanks for reporting.
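Until a fix lands, appending the missing newline may work as a stopgap. The Python sketch below is not part of dssp, the helper name `ensure_trailing_newline` is hypothetical, and it has not been confirmed here that this alone avoids the memory growth. It simply rewrites the file in place when the last byte is not a newline.

```python
from pathlib import Path


def ensure_trailing_newline(path) -> bool:
    """Append a newline if the file does not already end with one.

    Hypothetical helper, not part of dssp. Returns True when the
    file was modified and False when it was left untouched.
    """
    p = Path(path)
    data = p.read_bytes()
    # Leave empty files and files that already end in "\n" alone.
    if data and not data.endswith(b"\n"):
        p.write_bytes(data + b"\n")
        return True
    return False


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        changed = ensure_trailing_newline(name)
        print(name, "fixed" if changed else "unchanged")
```

The file is read and written as bytes so that existing line endings are not altered.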