PDB-REDO / dssp

Application to assign secondary structure to proteins
BSD 2-Clause "Simplified" License

DSSP fails without the normal header info #10

Closed n-frazee closed 3 years ago

n-frazee commented 3 years ago

Hey DSSPers,

I've been trying to set up a workflow involving dssp so that I can apply it to many frames of a trajectory. Part of that is writing out my trajectory into several PDB files for dssp to use. The issue is that, after going through this process, my PDB files no longer have the usual header records, such as the journal information and the like.

Start of one of the pdbs:

CRYST1 0.000 0.000 0.000 90.00 90.00 90.00 P1 1
ATOM 1 N ALA X 1 -12.935 14.720 12.724 1.00 0.00 PROT
ATOM 2 HT1 ALA X 1 -12.744 14.789 13.744 1.00 0.00 PROT

I saw in issue #1 that the CRYST1 line is required, and I have it, but I still get this error from dssp when trying to run it on one of these PDB files:

Expected record HEADER but found CRYST1
Expected record TITLE but found CRYST1
Expected record COMPND but found CRYST1
Expected record SOURCE but found CRYST1
Expected record KEYWDS but found CRYST1
Expected record EXPDTA but found CRYST1
Expected record AUTHOR but found CRYST1
Could not load the mon_lib_list.cif file from CCP4, please make sure you have installed CCP4 and sourced the environment.
missing mandatory field entity_id for Category struct_asym
missing mandatory field type_symbol for Category atom_site
Resulting mmCIF file is not valid!
missing mandatory field entity_id for Category struct_asym
missing mandatory field type_symbol for Category atom_site
Invalid mmCIF file.
Not a known element:

I have successfully used dssp on files containing these remarks (right off of the PDB) but get this error otherwise.

Any help would be appreciated!

Nick

drlemmus commented 3 years ago

Hi Nick,

Most problems will disappear if you start your PDB file with a HEADER line. TITLE etc. are not strictly needed. Once that is fixed, we can take a closer look and iron out other potential problems.

drlemmus commented 3 years ago

Another problem you might run into is that you have an atom called HT1 in alanine. Not sure how you got that one, but that is not part of a normal alanine (see http://ligand-expo.rcsb.org/pyapps/ldHandler.py?formid=cc-index-search&target=ALA&operation=ccid). There are quite a few programs that use non-standard names for hydrogens. The good news is that DSSP does not need hydrogens so if you run into issues with those, you can just leave them out.
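
As a rough illustration of the two suggestions above (start the file with a HEADER line, leave out the hydrogens), here is a minimal Python sketch. It is not part of DSSP. The function names, the placeholder HEADER text, and the hydrogen test are all assumptions made for this example. It also assumes the frames follow the standard fixed-column PDB layout (atom name in columns 13-16, element symbol in columns 77-78), which may not hold for every trajectory writer. Whether DSSP accepts the resulting file has not been checked here.

```python
def is_hydrogen(line: str) -> bool:
    """Heuristic hydrogen test for a fixed-column PDB ATOM record."""
    # Prefer the element symbol (columns 77-78) when the writer filled it in.
    element = line[76:78].strip().upper()
    if element:
        return element in ("H", "D")
    # Otherwise fall back on the atom name (columns 13-16). Names such as
    # "HT1" or "1HB" are treated as hydrogens; this is only a heuristic.
    name = line[12:16].strip().lstrip("0123456789")
    return name.upper().startswith("H")


def prepare_for_dssp(pdb_text: str,
                     header: str = "HEADER    MD TRAJECTORY FRAME") -> str:
    """Return PDB text that starts with a HEADER line and has no hydrogens.

    The default header text is a placeholder chosen for this sketch.
    """
    lines = pdb_text.splitlines()
    # Only ATOM records are filtered; HETATM records are left untouched
    # because names like "HG" there may be heavy atoms (mercury).
    kept = [ln for ln in lines
            if not (ln.startswith("ATOM") and is_hydrogen(ln))]
    # Add a HEADER line only if the file does not already start with one.
    if not kept or not kept[0].startswith("HEADER"):
        kept.insert(0, header)
    return "\n".join(kept) + "\n"
```

To use it, read a frame with `open(path).read()`, pass the text to `prepare_for_dssp`, and write the result to a new file before running dssp on it. The sketch does not fill in missing element symbols, so if the element column is empty in the input it stays empty in the output.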

mhekkel commented 3 years ago

When writing computer software that takes input, one has to make a choice: either you accept garbage as input and spit out garbage at the end, or you enforce the entry of valid data so you can at least be confident that you've done everything to ensure the output is meaningful. Most software in the old days belonged to the first category. That was perhaps a sensible decision at the time, since computing power was limited. Nowadays, the world is gearing towards stricter validation of input (JSON, XML, mmCIF). The original PDB specification already stated that certain fields are mandatory, but many tools simply ploughed on even if they were absent.

DSSP is now based on libcifpp. It starts by converting the input file into mmCIF format and yes, that means certain information is required. I still had to accept that PDB files might come without a full header: the software will complain but will try to continue. But if you then try to feed it an unknown element, it simply refuses to continue. That element simply won't fit. For DSSP this might be a non-issue in your eyes, but I consider it to be a serious flaw in the input and think it is OK to simply refuse to continue working with this file. Sorry.

n-frazee commented 3 years ago

Thanks @drlemmus for the helpful comments!

For some reason, I fully forgot to mention the trajectory is from a run of constant pH simulations so I definitely have some strange atom names.

But I suppose I'll look elsewhere and maybe come back if I ever have some non-garbage files ;-)