Open shashank025 opened 6 months ago
The root cause is that the logic inside the bipartite case branch was treating nodes that are just hashable (but neither strings nor ints) as strings. The fix is to properly handle such nodes by maintaining a mapping from strings to corresponding nodes:
diff --git a/cdlib/utils.py b/cdlib/utils.py
index 17a6e03..9ec917b 100644
--- a/cdlib/utils.py
+++ b/cdlib/utils.py
@@ -132,8 +132,9 @@ def __from_nx_to_igraph(g: object, directed: bool = None) -> object:
gi.add_edges([("\\" + str(u), "\\" + str(v)) for (u, v) in g.edges()])
if bipartite.is_bipartite(g) and not skip_bipartite:
+ convert = {str(x):x for x in g.nodes()}
gi.vs["type"] = [
- a_r[name] if type(name) == int else a_r[int(name.replace("\\", ""))]
+ a_r[name] if type(name) == int else a_r[convert[name.replace("\\", "")]]
for name in gi.vs["name"]
]
Hi, I'll look into it - but I cannot guarantee you that support for generic node object descriptors will be integrated.
The main reason for the actual nodetype constraints is that cdlib accommodates several graph representations (networkx, graph_tool, igraph, ad-hoc ones...).
Allowing generic objects as nodes makes it complicated to handle some cross-representation transformations (and could affect downstream visualization/evaluation tasks requiring printable node labels - we cannot assume that all objects will came with well defined and unique string reprs).
Moreover, using complex objects as nodes usually generate a relevant memory footprint: if such information is not needed during computation (rarely the case in CD) it is indeed better to keep it separated from the graph topology (and maybe use it for post-process analysis).
Thank you!
I understand that cdlib needs to work with diverse graph representations, etc, and I agree that working with complex objects can have adverse memory impact, but real world applications using cdlib will often already deal with complex node objects. As a library, cdlib will be more "ergonomic" if it can consume such objects as is.
The good thing is this should be relatively easy for cdlib to do, by taking advantage of the fact that node objects are hashable, by simply invoking id(node)
on the object, which results in integer node values. Applications can later "recover" the original node object via:
recover = { id(node): node for node in graph.nodes() }
I already see similar code in cdlib:
convert = { str(node): node for node in graph.nodes() }
which makes me suspect the approach I'm suggesting should be easy to apply.
Let me know what you think! I'm actually willing to help with this as well.
Describe the bug
When the input graph for the method
cdlib.utils.convert_graph_formats()
is a networkx graph where the node type is neither astring
not anint
(but is hashable as networkx expects), if the graph is bipartite, this method fails when converting to igraph format with the following error when attempting to populate thegi.vs["type"]
value:To Reproduce Steps to reproduce the behavior:
Here's a script (
test.py
) to repro the issue:Simply run with:
Expected behavior
I'd expect that call to convert_graph_formats() to work.
Screenshots NA
Additional context
I also have a fix that I'm workin on! I'll produce a PR shortly.