a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License

Problem in add_atomic_edges #74

Closed shrimonmuke0202 closed 2 years ago

shrimonmuke0202 commented 3 years ago

from graphein.protein.config import ProteinGraphConfig
from graphein.protein.edges.atomic import add_atomic_edges
params_to_change = {"granularity": "atom", "edge_construction_functions": [add_atomic_edges]}

config = ProteinGraphConfig(**params_to_change)
config.dict()

from graphein.protein.graphs import construct_graph

g = construct_graph(config=config, pdb_code="3eiy")

When I execute the above code, the error below occurs. How can I get rid of this type of error?

1 from graphein.protein.graphs import construct_graph
      2 
----> 3 g = construct_graph(config=config, pdb_code="3eiy")
      4 # To use a local file, you can do:
      5 # g = construct_graph(config=config, pdb_path="../examples/pdbs/3eiy.pdb")

/srv/conda/envs/notebook/lib/python3.8/site-packages/graphein/protein/graphs.py in construct_graph(config, pdb_path, pdb_code, chain_selection, df_processing_funcs, edge_construction_funcs, edge_annotation_funcs, node_annotation_funcs, graph_annotation_funcs)
    611 
    612     # Compute graph edges
--> 613     g = compute_edges(
    614         g,
    615         funcs=config.edge_construction_functions,

/srv/conda/envs/notebook/lib/python3.8/site-packages/graphein/protein/graphs.py in compute_edges(G, funcs, get_contacts_config)
    505 
    506     for func in funcs:
--> 507         func(G)
    508 
    509     return G

/srv/conda/envs/notebook/lib/python3.8/site-packages/graphein/protein/edges/atomic.py in add_atomic_edges(G)
     86     """
     87     TOLERANCE = 0.56  # 0.4 0.45, 0.56 This is the distance tolerance
---> 88     dist_mat = compute_distmat(G.graph["pdb_df"])
     89 
     90     # We assign bond states to the dataframe, and then map these to covalent radii

/srv/conda/envs/notebook/lib/python3.8/site-packages/graphein/protein/edges/distance.py in compute_distmat(pdb_df)
     55     )
     56     eucl_dists = pd.DataFrame(squareform(eucl_dists))
---> 57     eucl_dists.index = pdb_df.index
     58     eucl_dists.columns = pdb_df.index
     59 

/srv/conda/envs/notebook/lib/python3.8/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
   5498         try:
   5499             object.__getattribute__(self, name)
-> 5500             return object.__setattr__(self, name, value)
   5501         except AttributeError:
   5502             pass

/srv/conda/envs/notebook/lib/python3.8/site-packages/pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()

/srv/conda/envs/notebook/lib/python3.8/site-packages/pandas/core/generic.py in _set_axis(self, axis, labels)
    764     def _set_axis(self, axis: int, labels: Index) -> None:
    765         labels = ensure_index(labels)
--> 766         self._mgr.set_axis(axis, labels)
    767         self._clear_item_cache()
    768 

/srv/conda/envs/notebook/lib/python3.8/site-packages/pandas/core/internals/managers.py in set_axis(self, axis, new_labels)
    214     def set_axis(self, axis: int, new_labels: Index) -> None:
    215         # Caller is responsible for ensuring we have an Index object.
--> 216         self._validate_set_axis(axis, new_labels)
    217         self.axes[axis] = new_labels
    218 

/srv/conda/envs/notebook/lib/python3.8/site-packages/pandas/core/internals/base.py in _validate_set_axis(self, axis, new_labels)
     55 
     56         elif new_len != old_len:
---> 57             raise ValueError(
     58                 f"Length mismatch: Expected axis has {old_len} elements, new "
     59                 f"values have {new_len} elements"

ValueError: Length mismatch: Expected axis has 1 elements, new values have 0 elements

a-r-j commented 3 years ago

I believe this is related to #41. Will have a fix this week.

a-r-j commented 2 years ago

@shrimonmuke0202 Hiya, sorry for the delay. This should be fixed now. Ping me if you run into any problems!

shrimonmuke0202 commented 2 years ago

After reinstalling graphein, the following lines are present in the source code of graphein.protein.graphs: `if granularity == "centroids": atoms = convert_structure_to_centroids(atoms)`, `elif granularity == "atom":`, `# atoms = atoms`.

shrimonmuke0202 commented 2 years ago

ValueError                                Traceback (most recent call last)
/tmp/ipykernel_1933/1307404750.py in <module>
      2 
      3 #g = construct_graph(config=config,pdb_path='pocket1_atm.pdb')
----> 4 g1 = construct_graph(config=config1,pdb_path='/mnt/d/Allostric_site_pred/New_data/pockets/1AO0_out/pockets/pocket1_atm.pdb')
      5 #g2 = construct_graph(config=config,pdb_path='/mnt/d/datastructer_python/3eiy.pdb')

~/.local/lib/python3.8/site-packages/graphein/protein/graphs.py in construct_graph(config, pdb_path, pdb_code, chain_selection, df_processing_funcs, edge_construction_funcs, edge_annotation_funcs, node_annotation_funcs, graph_annotation_funcs)
    611 
    612     # Compute graph edges
--> 613     g = compute_edges(
    614         g,
    615         funcs=config.edge_construction_functions,

~/.local/lib/python3.8/site-packages/graphein/protein/graphs.py in compute_edges(G, funcs, get_contacts_config)
    505 
    506     for func in funcs:
--> 507         func(G)
    508 
    509     return G

~/.local/lib/python3.8/site-packages/graphein/protein/edges/atomic.py in add_atomic_edges(G)
     86     """
     87     TOLERANCE = 0.56  # 0.4 0.45, 0.56 This is the distance tolerance
---> 88     dist_mat = compute_distmat(G.graph["pdb_df"])
     89 
     90     # We assign bond states to the dataframe, and then map these to covalent radii

~/.local/lib/python3.8/site-packages/graphein/protein/edges/distance.py in compute_distmat(pdb_df)
     55     )
     56     eucl_dists = pd.DataFrame(squareform(eucl_dists))
---> 57     eucl_dists.index = pdb_df.index
     58     eucl_dists.columns = pdb_df.index
     59 

~/.local/lib/python3.8/site-packages/pandas/core/generic.py in __setattr__(self, name, value)
   5498         try:
   5499             object.__getattribute__(self, name)
-> 5500             return object.__setattr__(self, name, value)
   5501         except AttributeError:
   5502             pass

~/.local/lib/python3.8/site-packages/pandas/_libs/properties.pyx in pandas._libs.properties.AxisProperty.__set__()

~/.local/lib/python3.8/site-packages/pandas/core/generic.py in _set_axis(self, axis, labels)
    764     def _set_axis(self, axis: int, labels: Index) -> None:
    765         labels = ensure_index(labels)
--> 766         self._mgr.set_axis(axis, labels)
    767         self._clear_item_cache()
    768 

~/.local/lib/python3.8/site-packages/pandas/core/internals/managers.py in set_axis(self, axis, new_labels)
    214     def set_axis(self, axis: int, new_labels: Index) -> None:
    215         # Caller is responsible for ensuring we have an Index object.
--> 216         self._validate_set_axis(axis, new_labels)
    217         self.axes[axis] = new_labels
    218 

~/.local/lib/python3.8/site-packages/pandas/core/internals/base.py in _validate_set_axis(self, axis, new_labels)
     55 
     56         elif new_len != old_len:
---> 57             raise ValueError(
     58                 f"Length mismatch: Expected axis has {old_len} elements, new "
     59                 f"values have {new_len} elements"

ValueError: Length mismatch: Expected axis has 1 elements, new values have 0 elements

This error is still present after reinstalling graphein.

a-r-j commented 2 years ago

Hi @shrimonmuke0202, I've made the changes available in the version on PyPI. Could you try reinstalling via pip and see if that helps?

diamondspark commented 2 years ago

This still throws the same error, especially for disulfide bonds. Look at UniProt ID Q15843, for instance. Any insight into the reason for this error? @a-r-j

a-r-j commented 2 years ago

Hiya @diamondspark, I'm struggling to reproduce the error (https://gist.github.com/a-r-j/5a5bfae2dd12dced5328fcc8f1caa52e).

Could you share a minimal example & indicate which PDB you're using for Q15843 - I just grabbed one from UniProt. Keen to get this fixed so any help is much appreciated!

Have you checked whether you're using Graphein 1.0.9? You may have to use the --no-cache-dir flag, e.g.:

pip install graphein==1.0.9 --no-cache-dir

diamondspark commented 2 years ago

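@a-r-j Thank you for the prompt response. I'm using 1.0.0 and will try out 1.0.9. Here's some sample code that breaks:

```python
from graphein.protein.config import ProteinGraphConfig
from graphein.protein.edges.distance import (
    add_aromatic_sulphur_interactions, add_cation_pi_interactions,
    add_disulfide_interactions, add_hydrogen_bond_interactions,
    add_hydrophobic_interactions, add_ionic_interactions, add_peptide_bonds,
)
from graphein.protein.graphs import construct_graph
from graphein.protein.utils import download_alphafold_structure

# Download the AlphaFold structure for UniProt ID Q15843
protein_path = download_alphafold_structure('Q15843')[0]
print(protein_path)
g = construct_graph(pdb_path=protein_path)
new_edge_funcs = {"edge_construction_functions": [add_peptide_bonds,
                                                  add_hydrogen_bond_interactions,
                                                  add_disulfide_interactions,
                                                  add_ionic_interactions,
                                                  add_aromatic_sulphur_interactions,
                                                  add_hydrophobic_interactions,
                                                  add_cation_pi_interactions]
                 }
config = ProteinGraphConfig(**new_edge_funcs)
# g = construct_graph(config=config, pdb_code="3eiy")
g = construct_graph(pdb_path=protein_path, config=config)
```

a-r-j commented 2 years ago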

Thanks @diamondspark I can reproduce the problem. Will check it out & hopefully fix it either this weekend or next week.

a-r-j commented 2 years ago

@diamondspark Thanks again for pointing out the problem. I've fixed this in v1.0.10 (pip install graphein==1.0.10)

It turns out the problem was caused by the protein in your example not containing any cysteine residues (and therefore we cannot create a distance matrix of sulfurs used to compute disulfide bonds). I've added a check for a minimum of two CYS residues and it seems to work in my testing. Please do reopen this issue if it's not fixed for you, and do let me know if you have any other troubles :)
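The guard described above can be sketched roughly as follows. This is only an illustration, not Graphein's actual implementation: the `Residue` record, the helper names `count_cysteines` and `add_disulfide_edges_if_possible`, and the 2.2 Å cutoff are all assumptions made for the example.

```python
import math
from typing import List, NamedTuple, Tuple


class Residue(NamedTuple):
    """Hypothetical residue record: name plus sulfur (or other atom) coordinates."""
    name: str
    coords: Tuple[float, float, float]


def count_cysteines(residues: List[Residue]) -> int:
    """Count residues whose name is CYS."""
    return sum(1 for r in residues if r.name == "CYS")


def add_disulfide_edges_if_possible(
    residues: List[Residue], cutoff: float = 2.2  # assumed cutoff in angstroms
) -> List[Tuple[int, int]]:
    """Return index pairs of CYS residues lying within `cutoff` of each other.

    Returns early with no edges when fewer than two CYS residues exist,
    so no distance computation is attempted on an empty selection.
    """
    if count_cysteines(residues) < 2:
        return []

    cys = [(i, r.coords) for i, r in enumerate(residues) if r.name == "CYS"]
    edges = []
    for a in range(len(cys)):
        for b in range(a + 1, len(cys)):
            (i, p), (j, q) = cys[a], cys[b]
            if math.dist(p, q) <= cutoff:
                edges.append((i, j))
    return edges
```

The early return is the important part: a protein without cysteines simply gets no disulfide edges instead of raising the length-mismatch error.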

diamondspark commented 2 years ago

@a-r-j Is v1.0.10 released? I ran `pip install graphein==1.0.10`, yet `graphein.__version__` returns 1.0.0

a-r-j commented 2 years ago

Hi @diamondspark Yep, you can see it here: https://pypi.org/project/graphein/

You may want to try uninstalling & reinstalling or using `--upgrade`

If you check the version reported by `pip list` or similar, is it still 1.0.0?

Alternatively, which python version are you using? Earlier versions supported <3.8 but this is no longer the case and graphein requires 3.8+

diamondspark commented 2 years ago

Perhaps it's installed. pip list shows 1.0.10 however graphein.__version__ still shows 1.0.0. Maybe you haven't updated the code to reflect the version?

a-r-j commented 2 years ago

Yep, good spot. Thanks for pointing that out - will take care of this. If the output of pip list shows 1.0.10 you should be good to go! Do let me know if the bug is otherwise fixed.
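The mismatch discussed here, where `pip list` and `graphein.__version__` disagree, can also be checked from Python. The sketch below uses only the standard library (`importlib.metadata`, available from Python 3.8); the helper names `installed_version` and `module_version` are made up for this example. The first reads the version from the installed package metadata, which is what `pip list` reports, and the second reads the string hard-coded in the module.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the version from the installed package metadata, or None if not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def module_version(module_name: str) -> Optional[str]:
    """Return the module's own __version__ attribute, or None if unavailable."""
    try:
        module = __import__(module_name)
    except ImportError:
        return None
    return getattr(module, "__version__", None)


# Example usage (results depend on your environment):
# print(installed_version("graphein"))  # version pip installed
# print(module_version("graphein"))     # version string inside the package
```

If the two values differ, the metadata version is the one that reflects what pip actually installed.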

diamondspark commented 2 years ago

There still seems to be an issue with the distance matrix calculation for some proteins. I think a similar check is needed for hydrogen bonds as well. Consider these PDB IDs: 1qaw, 1swr, 3ivc, 2qbw, 3e85, 3wzn, 2r9x. Example: `g = construct_graph(config=config, pdb_code='3wzn')`
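To narrow down which of the listed structures fail and with what message, a small loop can collect the errors in one pass. This is a generic sketch: `collect_failures` is a hypothetical helper, not part of Graphein, and the commented usage assumes `config` is the `ProteinGraphConfig` built earlier in the thread.

```python
from typing import Callable, Dict, Iterable


def collect_failures(
    pdb_codes: Iterable[str], build: Callable[[str], object]
) -> Dict[str, str]:
    """Call `build` on each code and map each failing code to its error message."""
    failures: Dict[str, str] = {}
    for code in pdb_codes:
        try:
            build(code)
        except Exception as exc:  # deliberately broad: we only want a report
            failures[code] = f"{type(exc).__name__}: {exc}"
    return failures


# Usage with Graphein (assumes `config` is defined as in the thread):
# from graphein.protein.graphs import construct_graph
# codes = ["1qaw", "1swr", "3ivc", "2qbw", "3e85", "3wzn", "2r9x"]
# print(collect_failures(codes, lambda c: construct_graph(config=config, pdb_code=c)))
```

Attaching the resulting dictionary to a report shows at a glance whether every structure fails with the same error.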