bitwalker / libgraph

A graph data structure library for Elixir projects
MIT License
527 stars 75 forks source link

Duplicated edges from Graph.edges(g, v) when v has a self-reference? #65

Open dsschneidermann opened 1 year ago

dsschneidermann commented 1 year ago

Hi and first of all, thanks for maintaining this library. It has turned out to be essential to me since I started using it.

I'm a big fan of pattern matching over edges in the graph and it's a very intuitive way to write complicated case handling. However I just discovered that I'm getting duplicated edges when I am matching on, eg.

case graph |> Graph.edges(metric_key) do
    %{v1: ^metric_key, v2: ^metric_key} -> ...
    ... other cases
end

From a glance at the code, a possible cause could be if a self-referencing edge could be described by both v_in and v_out: https://github.com/bitwalker/libgraph/blob/15ff0b9ba8c22a9dfec5bc04096fb7025e58f34b/lib/graph.ex#L451

But the description would have to differ, otherwise the MapSet would make it unique.. Do you think this is a correct cause and if so, is it an intended one?

I don't suppose there is any inherent disadvantage in just matching instead on all edges as such:

case graph |> Graph.edges() do
    %{v1: ^metric_key, v2: ^metric_key} -> ...
    ... other cases
    _ -> [] # not the stuff we're interested in
end |> List.flatten()

Looking at the code, it seems to be pretty much the same amount of work being done -- and in my situation, I always need to flat_map anyway in what I am doing.

However, someone else may be tripped up on it and have a situation where getting duplicates could cause confusion down the line.

Best regards, Dennis

stevensonmt commented 6 months ago

Can you give an example of this issue? I am not sure that I understand your explanation of the issue. That said, I think any duplicate edge representation is caused not by the code you referenced but by this portion of the code:

    e_in =
      Enum.flat_map(v_all, fn v2_id ->
        case Map.get(meta, {v2_id, v_id}) do
          nil ->
            []

          edge_meta when is_map(edge_meta) ->
            v2 = Map.get(vs, v2_id)

            for {label, weight} <- edge_meta do
              Edge.new(v2, v, label: label, weight: weight)
            end
        end
      end)

    e_out =
      Enum.flat_map(v_all, fn v2_id ->
        case Map.get(meta, {v_id, v2_id}) do
          nil ->
            []

          edge_meta when is_map(edge_meta) ->
            v2 = Map.get(vs, v2_id)

            for {label, weight} <- edge_meta do
              Edge.new(v, v2, label: label, weight: weight)
            end
        end
      end)

    e_in ++ e_out

Concatenating the two lists allows for duplicate entries. This makes sense as intended behavior for a directed graph but maybe not for an undirected graph. I suspect if we wanted to avoid duplicates the solution is something like:

  Enum.reduce(v_all, MapSet.new(), fn v2_id, es ->  get_edges(v2_id, v_id, meta) |> MapSet.union(es) end) |> Enum.to_list()
...

defp get_edges(v2_id, v_id, meta) do 
  [{v2_id, v_id}, {v_id, v2_id}]
  |> Enum.reduce(MapSet.new(), fn {vi_id, vo_id} = vs, edges ->
  case Map.get(meta, vs) do
          nil ->
            edges

          edge_meta when is_map(edge_meta) ->
            v = Map.get(vs, vi_id)
           v2 = Map.get(vs, vo_id)

            for {label, weight} <- edge_meta do
              Edge.new(v, v2, label: label, weight: weight)
            end
            |> MapSet.new()
            |> MapSet.union(edges)
        end
    end
  end

Haven't tested the above just sort of thinking out loud. Can put a PR together if this approach seems to address a real problem, but I'm not sure that it does for a directed graph.

dsschneidermann commented 6 months ago

Hello, yes sorry I can see it is not very clear :)

Here is a reproduction:

defmodule GraphingTest do
  use ExUnit.Case, async: true

  test "graphlib" do
    {graph, edges} =
      for metric_key <- ["a", "b"], reduce: {Graph.new(), []} do
        acc ->
          input_edges =
            for input <- ["a"] do
              {input, metric_key}
            end

          {elem(acc, 0) |> Graph.add_vertex(metric_key), input_edges ++ elem(acc, 1)}
      end

    IO.puts("expected: ")

    graph =
      for {v1, v2} <- edges, reduce: graph do
        acc ->
          {v1, v2} |> inspect |> IO.puts()
          Graph.add_edge(acc, %Graph.Edge{v1: v1, v2: v2})
      end

    IO.puts("actual: ")

    for edge <- graph |> Graph.edges("a") do
      case edge do
        %{v1: v1, v2: v2} -> {v1, v2} |> inspect |> IO.puts()
      end
    end
  end
end

Output:

expected: 
{"a", "b"}
{"a", "a"}
actual: 
{"a", "a"}
{"a", "a"}
{"a", "b"}

Double "a"->"a" is unexpected here. I ended up matching on the source information instead of using the graph, but I can tell from small testing that it only happens when asking specifically for Graph.edges("a") edges - not when getting all Graph.edges()

stevensonmt commented 6 months ago

I think that is the expected behavior given docs for Graph.edges/2:

Returns a list of all edges inbound or outbound from vertex v.

For an edge with a vertex incident upon itself you would have the duplication, whereas with Graph.edges/1 it is only considering the out-edges and not yielding the duplicate. I still think this makes sense for directed graphs since you need to consider a -> a as both directions being valid.

dsschneidermann commented 6 months ago

I have to disagree with the expectation here. It may be technically correct according to the wording of the docs, but it's a really weird edge case (hah) to actually support. What's the beneficial use of having the same edge returned twice? There simply isn't two edges in a directed graph A -> A, or in an undirected graph A - A.

Ultimately it is your choice to make, but I would definitely ask it flagged it in the docs specifically. IMO any reading of the docs will always assume that Graph.edges/1 returns a subset of Graph.edges/0.

Also, for what its worth, I definitely understand any hesitation on your part to add any fixes as a breaking change, but maybe it could be scheduled as a future change.

stevensonmt commented 6 months ago

Just to be clear, I'm not the maintainer, just an interested user who is happy to contribute pull requests for issues. I think this is an easy enough fix that would not be a breaking change. If there is agreement that A -> A is always considered a single edge I can try to work on the fix. My real job is a bit crazy at the moment, so I probably wouldn't have time until after 2-3 weeks.