a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License
994 stars 125 forks source link

dssp version argument #353

Closed biochunan closed 8 months ago

biochunan commented 8 months ago

Hello!

https://github.com/a-r-j/graphein/blob/3d7af1fd07dc16707068b2d63a2f3668fc02c632/graphein/protein/features/nodes/dssp.py#L110C71-L110C71 Here, the dssp version is not specified, which leads to Bio.PDB.DSSP uses the default version 3.9.9 for dssp. For people using dssp version >=4.0.0 this leads to the output format by default as mmcif.

I included Bio.PDB.DSSP line 135 for reference

def dssp_dict_from_pdb_file(in_file, DSSP="dssp", dssp_version="3.9.9"):
... ...
    if version(dssp_version) < version("4.0.0"):
                DSSP_cmd = [DSSP, in_file]
            else:
                DSSP_cmd = [DSSP, "--output-format=dssp", in_file]

Because I'm using dssp version 4.0.4 and without specifying the version number, it leads to an empty dssp_df, and adding an arg of dssp_version works.

dssp_dict = dssp_dict_from_pdb_file(pdb_file, DSSP=executable, dssp_version="4.0.4")

Maybe add a version attribute in DSSPConfig? or automatically detect it from running subprocess dssp --version which returns mkdssp version 4.0.4 for me.

a-r-j commented 8 months ago

Hi @biochunan, appreciate you raising this issue. I'm currently loathe to support DSSP 4 as it requires CRYST1 records for PDB files, which while sensible, will break compatibility for a lot of people who use Graphein for predicted structures. As a result, I'd suggest the conda distribution of DSSP provided by the Sali lab. I realise this is not an ideal strategy in the longterm but am not able to prioritize this at the moment, if you'd like to make a PR, I'm happy to assist - as you might be able to tell from this PR I've already tried to setup a more universal solution. Automatically detecting the version seems like the best approach.

biochunan commented 8 months ago

Thanks for getting back to me! I've opened a PR #355 .