A simplified implementation of the DSSP algorithm for PyTorch and NumPy
DSSP (Dictionary of Secondary Structure of Proteins) is a widely used algorithm for assigning secondary structure from protein backbone coordinates [Kabsch and Sander, 1983]. This repository is a Python implementation of the DSSP algorithm that simplifies some parts of the original.
pip install pydssp
To install the latest version:
pip install git+https://github.com/ShintaroMinami/PyDSSP.git
git clone https://github.com/ShintaroMinami/PyDSSP.git
cd PyDSSP
python setup.py install
Once pydssp is installed, the pydssp command should be available.
pydssp input_01.pdb input_02.pdb ... input_N.pdb -o output.result
The output.result file is plain text and looks like the following:
-EEEEE-E--EEEEEE---EEEE-HHHH--EEEE--------- input_01.pdb
-HHHHHHHHHHHHHH----HHHHHHHHHHHHHHHHHHH--- input_02.pdb
-EEEE-----EEEE----EEEE--E---EEE-----EEE-EEE-- input_03.pdb
...
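The result file shown above can be read back with a few lines of Python. The following is a minimal sketch, assuming each line holds one secondary-structure string followed by whitespace and the input file name, and that file names contain no spaces. The function name parse_pydssp_result is hypothetical and not part of pydssp.

```python
def parse_pydssp_result(lines):
    """Map each input file name to its secondary-structure string."""
    assignments = {}
    for line in lines:
        # Assumption: "<ss string> <file name>", separated by whitespace
        parts = line.strip().rsplit(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        ss_string, filename = parts
        assignments[filename] = ss_string
    return assignments


# Two lines copied from the example output above
example = [
    "-EEEEE-E--EEEEEE---EEEE-HHHH--EEEE--------- input_01.pdb",
    "-HHHHHHHHHHHHHH----HHHHHHHHHHHHHHHHHHH--- input_02.pdb",
]
parsed = parse_pydssp_result(example)
for name, ss in parsed.items():
    helix_fraction = ss.count("H") / len(ss)
    print(name, len(ss), f"{helix_fraction:.2f}")
```

For a file on disk, passing the open file object (for example, open("output.result")) in place of the list should work the same way, since the function only iterates over lines.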
# Import
import torch
import pydssp
# Sample coordinates
batch, length, atoms, xyz = 10, 100, 4, 3
## atoms should be 4 (N, CA, C, O) or 5 (N, CA, C, O, H)
coord = torch.randn([batch, length, atoms, xyz]) # batch-dim is optional
pydssp.get_hbond_map()
hbond_matrix = pydssp.get_hbond_map(coord)
print(hbond_matrix.shape) # should be (batch, length, length)
$\mathrm{HbondMat}(i,j) = \frac{1}{2}\left(1+\sin\left(\frac{-0.5-E(i,j)-\mathrm{margin}}{\mathrm{margin}}\cdot\frac{\pi}{2}\right)\right)$
Here $E$ is the electrostatic energy defined by Kabsch and Sander (1983), and $\mathrm{margin}$ $(=1.0)$ is introduced to control smoothness.
To obtain the same hydrogen-bond assignment as DSSP, threshold the map at 0.5.
dssp_hbond_matrix = pydssp.get_hbond_map(coord) > 0.5
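To make the smoothing formula concrete, here is a minimal NumPy sketch that evaluates it on a few energy values. It is not the pydssp implementation: the function name smooth_hbond_map and the parameter name cutoff are hypothetical, and clipping the sine argument so that the score saturates at 0 and 1 is an assumption added here to keep the output within [0, 1].

```python
import numpy as np


def smooth_hbond_map(energy, cutoff=-0.5, margin=1.0):
    """Continuous H-bond score in [0, 1] from electrostatic energies (kcal/mol)."""
    # Argument of the sine in the formula, before the pi/2 factor
    x = (cutoff - energy - margin) / margin
    # Assumption (not stated in the formula): clip so the score saturates
    x = np.clip(x, -1.0, 1.0)
    return (1.0 + np.sin(x * np.pi / 2.0)) / 2.0


energies = np.array([0.0, -0.5, -1.5, -2.5, -4.0])
scores = smooth_hbond_map(energies)
print(scores)

# Binarize the same way as the thresholding shown above
hbond = scores > 0.5
```

With the formula as written and margin = 1.0, the score is 0 at E = -0.5, 0.5 at E = -0.5 - margin, and 1 at E = -0.5 - 2*margin; a larger margin spreads the transition over a wider energy range.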
pydssp.assign()
dssp = pydssp.assign(coord, out_type='c3')
## output is batched np.ndarray of C3 annotation, like ['-', 'H', 'H', ..., 'E', '-']
# To get secondary str. as index
dssp = pydssp.assign(coord, out_type='index')
## 0: loop, 1: alpha-helix, 2: beta-strand
# To get secondary str. as onehot representation
dssp = pydssp.assign(coord, out_type='onehot')
## dim-0: loop, dim-1: alpha-helix, dim-2: beta-strand
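The three output types describe the same assignment in different encodings. The sketch below shows how they relate, using plain NumPy and a hand-written index array rather than a call to pydssp; the array contents and the name C3_ALPHABET are illustrative assumptions, while the index-to-state mapping follows the comments above.

```python
import numpy as np

# 0: loop ('-'), 1: alpha-helix ('H'), 2: beta-strand ('E')
C3_ALPHABET = np.array(["-", "H", "E"])

# Hypothetical per-residue indices for an 8-residue chain
index = np.array([0, 1, 1, 1, 0, 2, 2, 0])

# Index -> C3 characters via fancy indexing
c3 = C3_ALPHABET[index]

# Index -> one-hot by selecting rows of the 3x3 identity matrix
onehot = np.eye(3, dtype=int)[index]

# One-hot -> index by taking the argmax over the last axis
recovered = onehot.argmax(axis=-1)

print("".join(c3))
print(onehot.shape)
```

The same indexing works unchanged on a batched array of shape (batch, length), which yields a one-hot array of shape (batch, length, 3).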
This implementation was simplified from the original DSSP algorithm. The differences from the original DSSP are as follows
Despite the above simplifications, the C3 annotation still matches the original DSSP for more than 97% of residues on average.
@article{kabsch1983dictionary,
title={Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features},
author={Kabsch, Wolfgang and Sander, Christian},
journal={Biopolymers: Original Research on Biomolecules},
volume={22},
number={12},
pages={2577--2637},
year={1983},
publisher={Wiley Online Library}
}