CRYST1 record as strict requirement in pdb input

schdaude commented 3 years ago

Hi

I tried to run dssp 4.0 on a PDB file without any remarks, i.e. only lines starting with ATOM. I get the following error:

Error trying to load file Expected record CRYST1 but found ATOM

My guess is that this comes from https://github.com/PDB-REDO/libcifpp/blob/34bbf067939be5706a7fda3e0f3b242f83ab98ec/src/PDB2Cif.cpp#L5275

Is a CRYST1 record a strict requirement for pdb input?

kind regards

Gabriel

drlemmus commented 3 years ago

Dear Gabriel,

Yes, a CRYST1 record really is a very minimal requirement. A fully valid PDB file actually has more required records, so the requirement is quite forgiving. You can use a default CRYST1 record, as is common for NMR and theoretical models: CRYST1 1.000 1.000 1.000 90.00 90.00 90.00 P 1 1

drlemmus commented 3 years ago

Mind you that the spacing above is not correct. You can use the CRYST1 record from PDB entry 1d3z.

schdaude commented 3 years ago

Thanks a lot! Injecting the CRYST1 line in front does the trick and I use it as a workaround to enable dssp 4.0 in our toolchain.
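For anyone landing here with the same problem, the workaround can be sketched as below. This is a minimal sketch, not part of dssp or libcifpp: the function names `default_cryst1` and `ensure_cryst1` are made up for illustration, and the column widths follow the fixed-width CRYST1 layout of the PDB format (cell lengths in 9.3, angles in 7.2, space group left-justified in columns 56-66, Z in columns 67-70). The record is built with a format string so the spacing does not depend on copy-pasting whitespace.

```python
def default_cryst1() -> str:
    """Return a dummy CRYST1 record (unit cell 1 x 1 x 1, P 1), as used for NMR models."""
    return "CRYST1{:9.3f}{:9.3f}{:9.3f}{:7.2f}{:7.2f}{:7.2f} {:<11s}{:4d}".format(
        1.0, 1.0, 1.0,      # a, b, c in Angstrom (columns 7-33)
        90.0, 90.0, 90.0,   # alpha, beta, gamma in degrees (columns 34-54)
        "P 1",              # space group (columns 56-66)
        1,                  # Z value (columns 67-70)
    )


def ensure_cryst1(pdb_text: str) -> str:
    """Prepend a dummy CRYST1 record if the PDB text does not contain one."""
    if any(line.startswith("CRYST1") for line in pdb_text.splitlines()):
        return pdb_text  # already present, leave the file untouched
    return default_cryst1() + "\n" + pdb_text
```

Reading a file, passing its text through `ensure_cryst1`, and writing the result to a new file gives an input with the record injected in front, which is what worked for me with dssp 4.0. Files that need other mandatory records may still be rejected.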

I set up dssp 4.0 in a vanilla Ubuntu 20.10 VM and needed two workarounds to get there. Even though they're related to libcifpp, I'm dumping them here because of the context:

Change http to https in https://github.com/PDB-REDO/libcifpp/blob/34bbf067939be5706a7fda3e0f3b242f83ab98ec/tools/update-dictionary-script#L18
Point the CLIBD_MON env var to a checkout of https://github.com/rlabduke/mon_lib (does this come with a CCP4 installation? The git repo was just a random guess, as it contained the required mon_lib_list.cif file. Maybe this needs some documentation.)

Out of curiosity I built a random model with MODELLER and it doesn't come with a CRYST1 record. Given that such models are common and previous dssp versions processed them without issues, a workaround or at least some documentation would be appreciated by some people ;)

kind regards

Gabriel

drlemmus commented 3 years ago

Thanks for the feedback. We changed the URL to https. The CLIBD_MON variable is indeed set by CCP4. We recommend using the CCP4 monomer library rather than the Phenix clone because we can make sure bugs are fixed in that one directly. We'll have a look at the documentation.

With respect to MODELLER, I'd rather have the problem solved at the source. It should really try to write a valid PDB formatted file (or even better, mmCIF formatted). Would you mind filing a bug report with them? I'll have a look at our documentation.

drlemmus commented 3 years ago

Updated the DSSP man page to explicitly mention the CRYST1 thing.

schdaude commented 3 years ago

Hey, your swift action is appreciated! Regarding bug report to MODELLER: MODELLER was just an example and fixing it there would be a drop in the ocean.

Even though CRYST1 is a mandatory record in the PDB format, there simply are lots of files floating around without it. So in the end it's up to the method developers how permissive they want to be with the input.

As the current behavior breaks compatibility with older dssp versions, I'd appreciate documentation (thanks for your previous commit!) and/or a workaround that just applies P1 symmetry or something.

kind regards

Gabriel

PS: feel free to close this issue if you think it's sufficiently addressed by the change in documentation.

a-r-j commented 11 months ago

Hi @drlemmus, is there any scope to revisit this strict requirement? There are a lot more PDB files floating around now without this record (e.g. from structure prediction models). It'd be nice to have a modern version of DSSP that can run on these files out of the box. Something as simple as DSSP injecting a default CRYST1 record in the PDB parser if it can't find one (with an appropriate log message) seems like a sensible thing to do, unless I'm missing something.

drlemmus commented 11 months ago

Not really. In modern structural biology/bioinformatics, FAIR-ness is a big issue, and sticking to file standards (Interoperability and Reusability) is key. In the field we have a legacy format, PDB, that for good reasons (size limitation, compound name space running out, no good place for new meta-data) is being phased out, and another format, CIF, that has none of these issues. So the focus is on CIF support, and most modern software already supports mmCIF or modelCIF (see the mmCIF and modelCIF paper), as does DSSP. In terms of PDB support we are now quite strict, to work towards the adoption of FAIR practices. This is likely to (very) gradually become more strict, and ultimately PDB support will be dropped. Not anytime soon, but it will happen. Please be vocal to developers of software that writes non-standard PDB files: please stick to a standard, preferably a future-proof one.
