a-r-j / graphein

Protein Graph Library
https://graphein.ai/
MIT License

PowerIterationFailedConvergence #156

Closed avivko closed 2 years ago

avivko commented 2 years ago

Describe the bug

graph_summary: the call to nx.eigenvector_centrality raises PowerIterationFailedConvergence.

To Reproduce

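from graphein.protein.subgraphs import extract_subgraph_from_chains
from graphein.protein.config import ProteinGraphConfig, DSSPConfig
from graphein.protein.graphs import construct_graph
from graphein.protein.edges.distance import (add_peptide_bonds,
                                             add_aromatic_interactions,
                                             add_hydrogen_bond_interactions,
                                             add_disulfide_interactions,
                                             add_ionic_interactions,
                                             add_aromatic_sulphur_interactions,
                                             add_cation_pi_interactions)
from graphein.protein.features.nodes import asa, rsa
from graphein.protein.analysis import graph_summary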

analytics_edge_functions = {"edge_construction_functions": [add_peptide_bonds,
                                                  add_aromatic_interactions,
                                                  add_hydrogen_bond_interactions,
                                                  add_disulfide_interactions,
                                                  add_ionic_interactions,
                                                  add_aromatic_sulphur_interactions,
                                                  add_cation_pi_interactions],
                  "graph_metadata_functions": [asa, rsa],  # Add ASA and RSA features.
                  "dssp_config":DSSPConfig()                # Add DSSP config in order to compute ASA and RSA.
                           }
config = ProteinGraphConfig(**analytics_edge_functions)
g = construct_graph(config=config, pdb_code="3eiy")
graph_summary(g)

Results in:

---------------------------------------------------------------------------
PowerIterationFailedConvergence           Traceback (most recent call last)
Input In [28], in <cell line: 3>()
      1 from graphein.protein.analysis import graph_summary
----> 3 graph_summary(g)

File /glusterfs/dfs-gfs-dist/kormanav/miniconda3/envs/graphein-gpu/lib/python3.8/site-packages/graphein/protein/analysis.py:177, in graph_summary(G, summary_statistics, custom_data, plot)
    175     col_names.append("closeness_centrality")
    176 if "eigenvector_centrality" in summary_statistics:
--> 177     eigenvector = pd.Series(nx.eigenvector_centrality(G))
    178     col_list.append(eigenvector)
    179     col_names.append("eigenvector_centrality")

File <class 'networkx.utils.decorators.argmap'> compilation 8:4, in argmap_eigenvector_centrality_5(G, max_iter, tol, nstart, weight)
      2 from os.path import splitext
      3 from contextlib import contextmanager
----> 4 from pathlib import Path
      6 import networkx as nx
      7 from networkx.utils import create_random_state, create_py_random_state

File /glusterfs/dfs-gfs-dist/kormanav/miniconda3/envs/graphein-gpu/lib/python3.8/site-packages/networkx/algorithms/centrality/eigenvector.py:137, in eigenvector_centrality(G, max_iter, tol, nstart, weight)
    135     if sum(abs(x[n] - xlast[n]) for n in x) < nnodes * tol:
    136         return x
--> 137 raise nx.PowerIterationFailedConvergence(max_iter)

PowerIterationFailedConvergence: (PowerIterationFailedConvergence(...), 'power iteration failed to converge within 100 iterations')


a-r-j commented 2 years ago

Hi @avivko, this is not a problem with Graphein (or nx, really) but rather with the graph. The power-iteration method behind nx.eigenvector_centrality can fail to converge when there are multiple largest eigenvalues. You can try nx.eigenvector_centrality_numpy instead, I believe. If that works out, could you let me know so I can update Graphein to use this method for the centrality measure?
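For readers who want to try the suggestion without touching Graphein, here is a minimal sketch of a fallback wrapper. The function name eigenvector_centrality_with_fallback and the path graph are illustrative assumptions, not part of Graphein or NetworkX; only nx.eigenvector_centrality, nx.eigenvector_centrality_numpy and nx.PowerIterationFailedConvergence are real NetworkX names.

```python
import networkx as nx


def eigenvector_centrality_with_fallback(G, max_iter=100, tol=1e-6):
    """Hypothetical helper: try power iteration, fall back to the eigen-solver variant."""
    try:
        return nx.eigenvector_centrality(G, max_iter=max_iter, tol=tol)
    except nx.PowerIterationFailedConvergence:
        # Power iteration did not settle within max_iter steps,
        # so use the eigen-solver-based implementation instead.
        return nx.eigenvector_centrality_numpy(G)


# Illustrative graph only. max_iter=1 is deliberately too small,
# so the fallback branch is the one that produces the result.
G = nx.path_graph(5)
centrality = eigenvector_centrality_with_fallback(G, max_iter=1)
most_central = max(centrality, key=centrality.get)
```

On a real protein graph you would call the helper with the default max_iter. Note that the behaviour of nx.eigenvector_centrality_numpy on disconnected graphs may depend on the NetworkX version, so it is worth checking connectivity first if results look odd.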

avivko commented 2 years ago

@a-r-j Thanks for responding. This happens with all the PDB-based graphs I have tried so far. Using the NumPy method that you suggested does solve it:

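import importlib.util  # importlib.util must be imported explicitly
import sys

def modify_and_import(module_name, package, modification_func):
    # Locate the module and read its source without executing it.
    spec = importlib.util.find_spec(module_name, package)
    source = spec.loader.get_source(module_name)
    # Apply the caller's text transformation to the source.
    new_source = modification_func(source)
    # Build a fresh module object and run the patched source inside it.
    module = importlib.util.module_from_spec(spec)
    codeobj = compile(new_source, module.__spec__.origin, 'exec')
    exec(codeobj, module.__dict__)
    # Register the patched module so later imports pick it up.
    sys.modules[module_name] = module
    return module

# Swap the power-iteration call for the NumPy-based one in graphein.protein.analysis.
modified_analysis = modify_and_import("graphein.protein.analysis", "graphein", lambda src: src.replace("nx.eigenvector_centrality(G)", "nx.eigenvector_centrality_numpy(G)"))

modified_analysis.graph_summary(g)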

This returns the expected pandas DataFrame.

So I'd recommend replacing the method in Graphein's source code :)
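A lighter-weight alternative to rewriting the module source might be to patch the attribute on the networkx module temporarily. This is only a sketch: it assumes the calling code looks the function up as nx.eigenvector_centrality at call time (as the traceback above suggests for graph_summary), and the names use_numpy_eigenvector_centrality and summary_like are hypothetical stand-ins, not Graphein APIs.

```python
from contextlib import contextmanager
from unittest import mock

import networkx as nx


@contextmanager
def use_numpy_eigenvector_centrality():
    """Temporarily route nx.eigenvector_centrality to the NumPy-based variant."""
    with mock.patch.object(
        nx, "eigenvector_centrality", nx.eigenvector_centrality_numpy
    ):
        yield


def summary_like(G):
    # Hypothetical stand-in for library code that calls nx.eigenvector_centrality(G).
    return nx.eigenvector_centrality(G)


G = nx.path_graph(5)  # illustrative graph only
original = nx.eigenvector_centrality
with use_numpy_eigenvector_centrality():
    patched_active = nx.eigenvector_centrality is nx.eigenvector_centrality_numpy
    result = summary_like(G)
# Outside the with-block the original function is back in place.
restored = nx.eigenvector_centrality is original
```

With Graphein installed, the same context manager could presumably wrap a call to graph_summary(g). The patch is undone when the block exits, so other code keeps the default implementation.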

a-r-j commented 2 years ago

This has now been changed in 1.4.0 :)