erlang / otp

Erlang/OTP
http://erlang.org
Apache License 2.0
11.39k stars 2.96k forks source link

digraph preorder and postorder traversals return wrong results #9040

Open lessness opened 6 days ago

lessness commented 6 days ago

Describe the bug digraph_utils:postorder/1 seems to return a wrong result, but it is consistent under a re-labelling of the vertices. digraph_utils:preorder/1 behaves differently on equivalent graphs, and results are wrong.

Very simple examples are given below.

To Reproduce

Two examples follows. The idea is:

Then the test is repeated for an identical tree G, the only difference being a different naming for the vertices. We should expect the same sequence of vertices, just renamed for both the tree traversal algorithms.

Part 1

Let's define the tree H below, and evaluate digraph_utils:postorder/1 and digraph_utils:preorder/1 for it:

graph_1

H = digraph:new([acyclic]).
digraph:add_vertex(H, a).
digraph:add_vertex(H, b).
digraph:add_vertex(H, c).
digraph:add_vertex(H, d).
digraph:add_vertex(H, e).

digraph:add_edge(H, a, b).
digraph:add_edge(H, a, c).
digraph:add_edge(H, b, d).
digraph:add_edge(H, b, e).

%% post order is not correct.
%% this call returns [e, d, c, b, a], but should be something 
%% as [e, d, b, c, a] (think of parsing an expression with "a" and "b" some binary operators).
%% "Something" means "accordingly to the convention of visiting left before right or viceversa"
digraph_utils:postorder(H).

%% pre order is wrong
%% this call returns [e, d, c, b, a], the same results as postorder, but in fact it should be [a, b, d, e, c], 
%% definitely not "e" before "a", for example.
digraph_utils:preorder(H).

Both calls here seem to return the in-order traversal path (or even a breadth-first search), but not what they are named after.

Part 2

Let's define the tree G below, and evaluate digraph_utils:postorder/1 and digraph_utils:preorder/1 for it. Please note that G is the same as H, with vertices re-labelled.

graph_2

G = digraph:new([acyclic]).
digraph:add_vertex(G, x).
digraph:add_vertex(G, f).
digraph:add_vertex(G, s1).
digraph:add_vertex(G, tt).
digraph:add_vertex(G, y).

digraph:add_edge(G, x, f).
digraph:add_edge(G, x, s1).
digraph:add_edge(G, f, tt).
digraph:add_edge(G, f, y).

%% post order is not correct, it returns [tt, y, s1, f, x], the same as before after re-labelling
%% a possible correct result should be [tt, y, f, s1, x]
digraph_utils:postorder(G).

%% pre order is wrong
%% this call returns [tt, y, x, s1, f], which **now is different** than postorder
%% it should be [x, f, tt, y, s1]
digraph_utils:preorder(G).

Expected behavior A call to digraph_utils:postorder/1 should:

  1. return the correct result

A call to digraph_utils:preorder/1 should:

  1. return the same structure for the two example trees above, H and G
  2. return a topologically sorted list of vertices
  3. return the correct result

Affected versions Erlang/OTP 27.1 Didn't check previous versions.

Additional context The suggested "correct results" in the examples above may be different accordingly to the precedence of vertex visiting. It may be useful to address the visiting precedence rule in the documentation, too. The issue just points out the inconsistency of digraph_utils:preorder/1 and digraph_utils:postorder/1.

lucioleKi commented 3 days ago

I looked into the code and documentation. I think the documentation describes what the two functions do accurately, but without reading the documentation, what the two functions do are counter-intuitive for people who know graphs. The functions should be fixed to give what users need.

For both functions, we run a DFS on the whole graph, and output all vertices in the order or reversed order of them being visited. It is not guaranteed to start from the root node of a tree. The start node can vary if vertices/edges are added in a different order, or with relabeling of the graph. If you are like me, who understand preorder for a tree as "visit the root, then recursively visit the left child, then recursively visit the right child", then the result from digraph_utils:preorder/1 looks wrong, because it starts with lesser guarantee than expected.

With the current digraph_utils:preorder/1, [e, d, c, b, a] is possible because the DFS can start from e, then restart from d and so on.

If you want a topologically sorted list of vertices, maybe digraph_utils:topsort/1 is closer to it. However, it has no guarantee to start from the 'root' node either, when given a tree.

The current code doesn't try to find a root node in any of the 3 functions. Since the starting node is randomly chosen, we will not break backwards compatibility if digraph_utils:preorder/1 and digraph_utils:postorder/1 now start from root nodes. I'll make a PR once I have a fix for this.

It might be useful if I add two functions digraph_utils:preorder/2 and digraph_utils:postorder/2, in which the second argument specifies the root of the tree. What do you think?

lessness commented 3 days ago

I see your point.

Let me just add a few more comments, or better, opinions, strongly biased in the user perspective:

Now, I understand not breaking backward compatibility in general. In this case, it sounds like nobody has noticed this issue before, or it is just me being too pedantic... I would break it, as the API does not tell the truth at least; and any user is either turning around it with tricks, or is in some "special" situation, I would guess. Being a core functionality in the OTP stdlib is different and somewhat more painful than being in "all the rest of the world outside the OTP".

But let's stick anyhow with the current implementation and function naming for a while, which I imagine is your preferred option; still I feel like the stdlib is missing the "classical" preorder and postorder functions.

Adding the digraph_utils:XXXorder/2 functions as you suggested, with the second argument being the "starting root" for traversal, sounds a bit "not enough Erlang" to me, as one would expect the digraph_utils:XXXorder/1 functions to be some special case with default value, not a non-predetermined, non-reproducible one. I would suggest either using something as weak or strong (the weak being the two current implementations, therefore the default; or any more descriptive name) as a second argument, and no choice on the starting node (no one complained about this until now); or else, introducing a new digraph_utils:traverse/2, with arguments the graph and an atom preorder or postorder (and maybe weak_preorder, weak_postorder if you are ok with deprecating the current digraph_utils:preorder/1, digraph_utils:postorder/1).

In any case, I am definitely favourable to be more descriptive in the documentation.

Thanks for your time.