Qiskit / rustworkx

A high performance Python graph library implemented in Rust.
https://www.rustworkx.org
Apache License 2.0
981 stars 142 forks source link

rx.digraph_find_cycle() does not find the cycle in a graph #1171

Closed weiT1993 closed 2 months ago

weiT1993 commented 2 months ago

Information

What is the current behavior?

rx.is_directed_acyclic_graph(graph) returns False but rx.digraph_find_cycle(graph) fails to find the cycle in the graph.

What is the expected behavior?

When rx.is_directed_acyclic_graph(graph) returns False, rx.digraph_find_cycle(graph) should find a cycle in the graph.

Steps to reproduce the problem

Running the following script several times. It quite often reproduces the bug. The graph is always correctly identified as not a DAG, but quite often rx.digraph_find_cycle(graph) does not find a cycle.

import rustworkx as rx

edges = [
    (6, 7),
    (8, 9),
    (15, 16),
    (0, 6),
    (2, 8),
    (9, 10),
    (7, 10),
    (10, 11),
    (11, 12),
    (10, 12),
    (12, 13),
    (12, 14),
    (4, 15),
    (16, 17),
    (14, 3),
    (13, 17),
    (17, 18),
    (18, 19),
    (17, 19),
    (19, 20),
    (19, 5),
    (20, 1),
    (27, 28),
    (28, 29),
    (34, 35),
    (21, 27),
    (23, 29),
    (29, 30),
    (30, 31),
    (29, 31),
    (31, 32),
    (31, 33),
    (25, 34),
    (32, 24),
    (35, 36),
    (33, 36),
    (36, 37),
    (37, 38),
    (36, 38),
    (38, 39),
    (38, 26),
    (39, 22),
    (42, 45),
    (44, 45),
    (40, 44),
    (45, 46),
    (46, 47),
    (45, 47),
    (47, 48),
    (47, 49),
    (48, 43),
    (49, 41),
    (56, 57),
    (57, 58),
    (61, 63),
    (50, 56),
    (52, 57),
    (58, 59),
    (57, 59),
    (59, 60),
    (54, 62),
    (59, 61),
    (60, 53),
    (62, 63),
    (63, 64),
    (64, 65),
    (63, 65),
    (65, 66),
    (65, 67),
    (66, 51),
    (67, 55),
    (74, 75),
    (75, 76),
    (80, 81),
    (68, 74),
    (70, 76),
    (76, 77),
    (77, 78),
    (76, 78),
    (78, 79),
    (78, 80),
    (72, 81),
    (79, 71),
    (81, 82),
    (82, 83),
    (81, 83),
    (83, 84),
    (83, 69),
    (84, 73),
    (1, 42),
    (3, 23),
    (24, 72),
    (22, 50),
    (5, 54),
    (41, 40),
    (26, 52),
    (69, 70),
]
edges = [x + (None,) for x in edges]
nodes = set()
for edge in edges:
    nodes.add(edge[0])
    nodes.add(edge[1])
nodes = list(nodes)

# Create a directed graph (PyDiGraph)
graph = rx.PyDiGraph()

# Add nodes and edges
graph.add_nodes_from(nodes)
graph.add_edges_from(edges)

# Check if the graph is a DAG
is_dag = rx.is_directed_acyclic_graph(graph)
print("Is the graph a DAG? {}".format(is_dag))

# Attempt to find a cycle
cycle = rx.digraph_find_cycle(graph)
print("Cycle found : {}".format(cycle))

Quite often the script produces:

Is the graph a DAG? False
Cycle found : EdgeList[]
IvanIsCoding commented 2 months ago

That sounds pretty bad. I haven’t reproduced the issue yet, but to give you a reasoning:

Until we fix it in 0.14.3 or 0.15.0 please use https://www.rustworkx.org/apiref/rustworkx.strongly_connected_components.html and pick a component with more than one node. That method is correct and should always give you a cycle. I will probably fix find_cycle using the SCC implementation

IvanIsCoding commented 2 months ago

I verified the problem. It is with the arbitrarily chosen starting node. If you pick this one (I plotted the graph to see the cycle):

cycle = rx.digraph_find_cycle(graph, graph.find_node_by_weight(81))
print(cycle)

You get:

EdgeList[(81, 82), (82, 83), (83, 69), (69, 70), (70, 76), (76, 77), (77, 78), (78, 80), (80, 81)]

I'll try to change the behaviour of rx.is_directed_acyclic_graph(graph). It should always return a cycle if it exists and node is specified. I agree it is a bit misleading. But probably the original intention was to pass a node you already knew was in a cycle as an argument

IvanIsCoding commented 2 months ago

Also, completely unrelated to the bug but you would probably like add_edges_from_no_data. You don't need to create the new edge list you created, just call graph.add_edges_from_no_data(edges) instead of graph.add_edges(edges) if you don't have weights.

weiT1993 commented 2 months ago

Thanks for looking into the bug. I am confused about your reasoning for the bug. The DFS cycle detection algorithm should find a cycle regardless of which arbitrary node was picked when the given graph is connected and only has one component. In my example, the given graph is a single connected graph.

If you are saying the DFS algorithm attempts to find a cycle that has to contain the random starting node if none was given, that would explain the bug. But it is a misleading design choice. For example, if the input is a huge graph with a tiny cycle, this algorithm would have a very small chance of returning the cycle unless the user already knows where the tiny cycle is (in which case, what's the point of running this function anyway?). Was this intentional?

I think the expected behavior for rx.digraph_find_cycle(graph) should be that it always finds a cycle (if there is one) when no starting node is specified. It should find a cycle containing the starting node when one is given.

And I do not think there is anything wrong with rx.is_directed_acyclic_graph(graph). It seems to correctly decide whether the graph is a DAG.

IvanIsCoding commented 2 months ago

Thanks for looking into the bug. I am confused about your reasoning for the bug. The DFS cycle detection algorithm should find a cycle regardless of which arbitrary node was picked when the given graph is connected and only has one component. In my example, the given graph is a single connected graph.

If you are saying the DFS algorithm attempts to find a cycle that has to contain the random starting node if none was given, that would explain the bug. But it is a misleading design choice. For example, if the input is a huge graph with a tiny cycle, this algorithm would have a very small chance of returning the cycle unless the user already knows where the tiny cycle is (in which case, what's the point of running this function anyway?). Was this intentional?

I think the expected behavior for rx.digraph_find_cycle(graph) should be that it always finds a cycle (if there is one) when no starting node is specified. It should find a cycle containing the starting node when one is given.

And I do not think there is anything wrong with rx.is_directed_acyclic_graph(graph). It seems to correctly decide whether the graph is a DAG.

In short, yes, it is a misnomer. A better name for the function would be digraph_find_cycle_containing_node.

Also, is-directed_acyclic_graph has an independent implementation that behaves as you expected.

weiT1993 commented 2 months ago

So currently, if I want to find a cycle in the graph without knowing which node is in it, I need to write a wrapper to make sure I visit all the nodes? Do you plan to support it?

IvanIsCoding commented 2 months ago

So currently, if I want to find a cycle in the graph without knowing which node is in it, I need to write a wrapper to make sure I visit all the nodes? Do you plan to support it?

We will fix it in 0.15 to make it smarter and always return a cycle if it exists.

If you need an urgent workaround I suggest checking the strongly connected components. Any component with more than one element is a cycle