PDB-REDO / dssp

Application to assign secondary structure to proteins
BSD 2-Clause "Simplified" License

DSSP fails with generated cif file #68

Closed: shuuul closed this issue 1 year ago

shuuul commented 1 year ago

I am running DSSP on a cif file generated by BioStructures.jl. The cif file looks like this:

data_6GC4_ba1.cif
#
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
_atom_site.label_atom_id
_atom_site.label_alt_id
_atom_site.label_comp_id
_atom_site.label_asym_id
_atom_site.label_entity_id
_atom_site.label_seq_id
_atom_site.pdbx_PDB_ins_code
_atom_site.Cartn_x
_atom_site.Cartn_y
_atom_site.Cartn_z
_atom_site.occupancy
_atom_site.B_iso_or_equiv
_atom_site.pdbx_formal_charge
_atom_site.auth_seq_id
_atom_site.auth_comp_id
_atom_site.auth_asym_id
_atom_site.auth_atom_id
_atom_site.pdbx_PDB_model_num
ATOM 73085 N N   . ALA ? ? ? ? 185.887 176.406 181.621 1.00 0.01   ? 1   ALA 0 N   1 
ATOM 73086 C CA  . ALA ? ? ? ? 186.542 177.528 182.277 1.00 0.01   ? 1   ALA 0 CA  1 
ATOM 73087 C C   . ALA ? ? ? ? 186.071 178.842 181.677 1.00 0.01   ? 1   ALA 0 C   1 
...

The output looks like this:

DSSP: ==== Secondary Structure Definition by the program DSSP, NKI version 4.4.0                         ==== DATE=2023-09-19        .
│ REFERENCE W. KABSCH AND C.SANDER, BIOPOLYMERS 22 (1983) 2577-2637                                                              .
│ HEADER    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx            6GC4                                                             .
│ COMPND                                                                                                                         .
│ SOURCE                                                                                                                         .
│ AUTHOR                                                                                                                         .
│     0  1  0  0  0 TOTAL NUMBER OF RESIDUES, NUMBER OF CHAINS, NUMBER OF SS-BRIDGES(TOTAL,INTRACHAIN,INTERCHAIN)                .
│      0.0   ACCESSIBLE SURFACE OF PROTEIN (ANGSTROM**2)                                                                         .
│     0  nan   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(J)  , SAME NUMBER PER 100 RESIDUES                              .
│     0  nan   TOTAL NUMBER OF HYDROGEN BONDS IN     PARALLEL BRIDGES, SAME NUMBER PER 100 RESIDUES                              .
│     0  nan   TOTAL NUMBER OF HYDROGEN BONDS IN ANTIPARALLEL BRIDGES, SAME NUMBER PER 100 RESIDUES                              .
│     0  nan   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-5), SAME NUMBER PER 100 RESIDUES                              .
│     0  nan   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-4), SAME NUMBER PER 100 RESIDUES                              .
│     0  nan   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-3), SAME NUMBER PER 100 RESIDUES                              .
│     0  nan   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-2), SAME NUMBER PER 100 RESIDUES                              .
│     0  nan   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I-1), SAME NUMBER PER 100 RESIDUES                              .
│     0  nan   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+0), SAME NUMBER PER 100 RESIDUES                              .
│     0  nan   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+1), SAME NUMBER PER 100 RESIDUES                              .
│     0  nan   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+2), SAME NUMBER PER 100 RESIDUES                              .
│     0  nan   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+3), SAME NUMBER PER 100 RESIDUES                              .
│     0  nan   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+4), SAME NUMBER PER 100 RESIDUES                              .
│     0  nan   TOTAL NUMBER OF HYDROGEN BONDS OF TYPE O(I)-->H-N(I+5), SAME NUMBER PER 100 RESIDUES                              .
│   1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30     *** HISTOGRAMS OF ***           .
│   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    RESIDUES PER ALPHA HELIX         .
│   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    PARALLEL BRIDGES PER LADDER      .
│   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    ANTIPARALLEL BRIDGES PER LADDER  .
│   0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0  0    LADDERS PER SHEET                .
└   #  RESIDUE AA STRUCTURE BP1 BP2  ACC     N-H-->O    O-->H-N    N-H-->O    O-->H-N    TCO  KAPPA ALPHA  PHI   PSI    X-CA   Y-CA   Z-CA

May I know what header information the cif file must have? For a PDB file, I just add one line like HEADER XXXXXXXXXXXXXXXXXXXXXX 25-DEC-23 XXXX and it works.

drlemmus commented 1 year ago

Could you send us the full cif-file so we can check what is missing?

shuuul commented 1 year ago

Sure, here is the link. The file I want to process is 6GC4_ba1_m1.cif. Thanks for your help!

mhekkel commented 1 year ago

The file you are trying to process is rather stripped down: it contains only the atom_site category. A valid file would contain much more.

For comparison, the other file in your Google Drive contains much more data.
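To see at a glance how stripped down a file is, one can list the categories it contains. The following is a minimal Python sketch, not part of DSSP or cif-tools; the function name cif_categories is hypothetical, and the line-based scan is a heuristic that ignores multi-line text fields, so a real mmCIF parser is preferable for anything beyond a quick check.

```python
def cif_categories(text):
    """Return the sorted category names used by data items in mmCIF text.

    Heuristic: a data item name is the first token of a line and has the
    form "_category.item". Multi-line text values are not handled.
    """
    categories = set()
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        tag = tokens[0]
        # Keep only tags such as "_atom_site.id"; skip values, "loop_", "#".
        if tag.startswith("_") and "." in tag:
            categories.add(tag[1:].split(".", 1)[0])
    return sorted(categories)


# A few lines from the start of the file shown in this issue.
sample = """data_6GC4_ba1.cif
#
loop_
_atom_site.group_PDB
_atom_site.id
_atom_site.type_symbol
"""

print(cif_categories(sample))
```

For the file in this issue the result is a single category, atom_site, whereas a file downloaded from the PDB would list many more.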

It is possible to reconstruct a reasonably valid cif file from your input file. If you install cif-tools, you can convert the file to PDB format with cif2pdb and then convert that PDB file back to cif with pdb2cif. Both the PDB file and the reconverted cif file are accepted by mkdssp.
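The round trip described above could be scripted roughly as follows. This is only a sketch under assumptions: it assumes that cif2pdb, pdb2cif and mkdssp are on the PATH and that each accepts an input file followed by an output file as positional arguments. The helper names and the output file names (for example the _rebuilt suffix and the .dssp extension) are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def roundtrip_commands(cif_in, workdir="."):
    """Build the three command lines: cif -> pdb -> cif -> DSSP output."""
    src = Path(cif_in)
    out = Path(workdir)
    pdb = out / (src.stem + ".pdb")            # intermediate PDB file
    cif = out / (src.stem + "_rebuilt.cif")    # reconstructed cif file
    dssp = out / (src.stem + ".dssp")          # hypothetical output name
    return [
        ["cif2pdb", str(src), str(pdb)],
        ["pdb2cif", str(pdb), str(cif)],
        ["mkdssp", str(cif), str(dssp)],
    ]


def run_roundtrip(cif_in, workdir="."):
    """Run the commands in order, stopping at the first failure."""
    commands = roundtrip_commands(cif_in, workdir)
    missing = [cmd[0] for cmd in commands if shutil.which(cmd[0]) is None]
    if missing:
        raise FileNotFoundError("not found on PATH: " + ", ".join(missing))
    for cmd in commands:
        subprocess.run(cmd, check=True)
    return commands[-1][-1]  # path of the DSSP output file


# Show what would be executed without running anything.
for cmd in roundtrip_commands("6GC4_ba1_m1.cif"):
    print(" ".join(cmd))
```

Calling run_roundtrip("6GC4_ba1_m1.cif") would then execute the three steps; since the intermediate PDB file is also accepted by mkdssp, the pdb2cif step can be skipped if cif output is not needed.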

We will eventually change our code to accept these stripped-down files, but it is probably best to report this to the authors of BioStructures.jl, since it writes out invalid data.