a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License

Structure with 1 (or odd number) of Cysteine breaks add_salt_bridges #241

Closed universvm closed 1 year ago

universvm commented 1 year ago

Describe the bug: Converting a structure that contains one cysteine (or an odd number of cysteines) causes graphein to raise a ValueError.

To Reproduce: Pass a PDB file containing a single cysteine through construct_graph with add_salt_bridges among the edge construction functions.

  File "{REDACTED}.conda/envs/3dtcr/lib/python3.8/site-packages/graphein/protein/graphs.py", line 587, in compute_edges
    func(G)
  File "{REDACTED}.conda/envs/3dtcr/lib/python3.8/site-packages/graphein/protein/edges/distance.py", line 753, in add_salt_bridges
    distmat = compute_distmat(salt_bridge_df)
  File "{REDACTED}.conda/envs/3dtcr/lib/python3.8/site-packages/graphein/protein/edges/distance.py", line 74, in compute_distmat
    eucl_dists.index = pdb_df.index
  File "{REDACTED}.conda/envs/3dtcr/lib/python3.8/site-packages/pandas/core/generic.py", line 5915, in __setattr__
    return object.__setattr__(self, name, value)
  File "pandas/_libs/properties.pyx", line 69, in pandas._libs.properties.AxisProperty.__set__
  File "{REDACTED}.conda/envs/3dtcr/lib/python3.8/site-packages/pandas/core/generic.py", line 823, in _set_axis
    self._mgr.set_axis(axis, labels)
  File "{REDACTED}.conda/envs/3dtcr/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 230, in set_axis
    self._validate_set_axis(axis, new_labels)
  File "{REDACTED}.conda/envs/3dtcr/lib/python3.8/site-packages/pandas/core/internals/base.py", line 70, in _validate_set_axis
    raise ValueError(
ValueError: Length mismatch: Expected axis has 1 elements, new values have 0 elements

Expected behavior: It would be clearer if there were a check on the number of cysteines available, with a specific error message, rather than a generic ValueError.
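A minimal sketch of the kind of check requested above, written in plain Python rather than against graphein's internals. The function name `pairwise_distances` and its `interaction` parameter are hypothetical and are not part of graphein's API; the sketch only illustrates failing early with a descriptive message when there are too few candidate residues to form a pair.

```python
from itertools import combinations
from math import dist


def pairwise_distances(coords, interaction="salt bridge"):
    """Return {(i, j): distance} for every pair of 3D points.

    Hypothetical helper (not graphein API): raises a descriptive
    ValueError when fewer than two candidate points are supplied.
    """
    if len(coords) < 2:
        raise ValueError(
            f"Cannot compute {interaction} distances: found {len(coords)} "
            "candidate residue(s), need at least 2."
        )
    # Enumerate so each distance is keyed by the indices of its two points.
    return {
        (i, j): dist(a, b)
        for (i, a), (j, b) in combinations(enumerate(coords), 2)
    }


if __name__ == "__main__":
    print(pairwise_distances([(0.0, 0.0, 0.0), (3.0, 4.0, 0.0)]))
    try:
        pairwise_distances([(1.0, 2.0, 3.0)])
    except ValueError as err:
        print(err)
```

Whether a library should raise here or simply add no edges is a design choice; skipping silently may be preferable when a structure legitimately lacks the relevant residues.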

universvm commented 1 year ago

The same thing happens with add_aromatic_interactions.

a-r-j commented 1 year ago

Hi @universvm do you have an example PDB code / code to reproduce? I'm not fully convinced it's due to the Cysteines.

universvm commented 1 year ago

Hi @a-r-j

I think this happens because there is no pairwise distance to compute when there is only one cysteine, hence the "new values have 0 elements" in the error message.

Have a look at these two files generated with ESMFold

Archive.zip

a-r-j commented 1 year ago

Hmm, I'm struggling to reproduce this error on graphein 1.5.2

import graphein
import graphein.protein as gp

print(graphein.__version__)

config = gp.ProteinGraphConfig(
    edge_construction_functions=[gp.add_salt_bridges, gp.add_aromatic_interactions]
)

g = gp.construct_graph(pdb_path="1CYS.pdb", config=config)
print(g.nodes)

for u,v,d in g.edges(data=True):
    print(u,v,d)

gp.add_salt_bridges(g)

for u, v, d in g.edges(data=True):
    print(u, v, d)

gp.add_aromatic_interactions(g)

for u, v, d in g.edges(data=True):
    print(u, v, d)

universvm commented 1 year ago

Can you try with this sequence instead? CIVRAPGRADMRF.pdb.zip

a-r-j commented 1 year ago

Runs fine for me :confused:

universvm commented 1 year ago

This is the portion of my code I'm allowed to share:

graphein_params = {
    "edge_construction_functions": [
        add_peptide_bonds,
        add_hydrogen_bond_interactions,
        add_disulfide_interactions,
        add_ionic_interactions,
        add_vdw_interactions,
        add_salt_bridges,
    ],
    "edge_labels": edge_labels,
    "node_metadata_functions": [
        meiler_embedding,
        amino_acid_one_hot,
        hydrogen_bond_donor,
        hydrogen_bond_acceptor,
    ],
    "dssp_config": gp.DSSPConfig(),
}

edge_labels = [{"peptide_bonds","hydrogen_bond","disulfide","ionic","vdw","salt_bridges"}]
g = construct_graph(config=config, pdb_path=str(path_to_pdb))
g = g.to_undirected()
# Add DSSP features - in the future this will be done in configs https://github.com/a-r-j/graphein/issues/239
g = add_dssp_feature(g, feature="phi")

The code breaks before it reaches the g = g.to_undirected() line.

a-r-j commented 1 year ago

I still can't reproduce this in a clean environment with any of the provided files:

# !pip install graphein
# !conda install -c salilab dssp
import graphein.protein as gp

#path_to_pdb = "CIVRAPGRADMRF.pdb"
#path_to_pdb = "2CYS.pdb"
path_to_pdb = "1CYS.pdb"

edge_labels = [{"peptide_bonds","hydrogen_bond","disulfide","ionic","vdw","salt_bridges"}]
graphein_params = {
    "edge_construction_functions": [
        gp.add_peptide_bonds,
        gp.add_hydrogen_bond_interactions,
        gp.add_disulfide_interactions,
        gp.add_ionic_interactions,
        gp.add_vdw_interactions,
        gp.add_salt_bridges,
    ],
    "edge_labels": edge_labels,
    "node_metadata_functions": [
        gp.meiler_embedding,
        gp.amino_acid_one_hot,
        gp.hydrogen_bond_donor,
        gp.hydrogen_bond_acceptor,
    ],
    "dssp_config": gp.DSSPConfig(),
}

config = gp.ProteinGraphConfig(**graphein_params)

g = gp.construct_graph(config=config, pdb_path=str(path_to_pdb))
g = g.to_undirected()
# Add DSSP features - in the future this will be done in configs https://github.com/a-r-j/graphein/issues/239
g = gp.add_dssp_feature(g, feature="phi")

@universvm Could you confirm your Python and graphein versions? I tested the above code with graphein 1.5.2 (and with an install from main) on Python 3.9.

universvm commented 1 year ago

I'm using: graphein 1.5.2 and Python 3.8.13

Apologies, I'm currently away and don't have access to a good internet connection. I'll try it out again when I'm back. The issue seemed to happen only when I used add_salt_bridges or add_aromatic_interactions on sequences that had an odd number of the relevant residues.

Will try again in a week or so!

universvm commented 1 year ago

Hey @a-r-j , I am able to reproduce the bug in the original post with this structure and the code you wrote: CAVSGSGQFYF.pdb.zip
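As a quick diagnostic, the residue composition of the test sequences can be compared. This is a rough sketch under two assumptions: that the file names in this thread are the peptide sequences, and that salt-bridge candidates are the charged residues K, R, H, D and E (graphein's actual filter may differ). The helper `count_candidates` is hypothetical.

```python
from collections import Counter

# Assumption: one-letter codes of charged residues treated as salt-bridge
# candidates. This set is illustrative and may not match graphein's filter.
CHARGED = set("KRHDE")


def count_candidates(sequence, residues):
    """Count how many residues in `sequence` belong to the set `residues`."""
    counts = Counter(sequence.upper())
    return sum(counts[r] for r in residues)


if __name__ == "__main__":
    for seq in ("CIVRAPGRADMRF", "CAVSGSGQFYF"):
        print(
            seq,
            "charged:", count_candidates(seq, CHARGED),
            "cysteines:", count_candidates(seq, {"C"}),
        )
```

Under these assumptions both sequences contain exactly one cysteine, but only CAVSGSGQFYF contains no charged residues. That may point to an empty candidate set rather than the cysteine count, though this heuristic does not establish the cause.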

a-r-j commented 1 year ago

I've managed to reproduce it too! The problem is that this was already fixed in #220, but the fix hasn't been released to PyPI yet.

I'll make a release this week with some new updates. In the meantime, installing from master should do the trick :)

a-r-j commented 1 year ago

Now resolved in 1.6.0

pip install graphein==1.6.0
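To confirm that an environment already has the release containing the fix, the installed version can be compared against 1.6.0. This is a small sketch using only the standard library; `parse_version` and `has_fix` are hypothetical helper names, and the parsing is deliberately simplistic (pre-release suffixes such as "rc1" are ignored).

```python
from importlib.metadata import PackageNotFoundError, version
from itertools import takewhile


def parse_version(text):
    """Turn '1.6.0' into (1, 6, 0), keeping only the leading digits of each part."""
    parts = []
    for piece in text.split("."):
        digits = "".join(takewhile(str.isdigit, piece))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def has_fix(installed, required="1.6.0"):
    """Return True if `installed` is at least `required` (simplified comparison)."""
    return parse_version(installed) >= parse_version(required)


if __name__ == "__main__":
    try:
        installed = version("graphein")
    except PackageNotFoundError:
        print("graphein is not installed")
    else:
        status = "includes the fix" if has_fix(installed) else "predates 1.6.0"
        print(f"graphein {installed} {status}")
```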