Max-flow alg for unorderedElemsAre , containsAll, containsOnly

cdsmith commented 3 years ago

unorderedElemsAre , containsAll, containsOnly should use a max-flow algorithm to match arguments, and then explain which elements and predicates are matched or unmatched in max flow.

The max flow should be through a graph like the following:

   Src   P    A   Snk

         o -- o
        / \  / \
       /   \/   \
      /    /\    \
     /    /  \    \
--> o -- o -- o -- o -->
     \    \  /    /
      \    \/    /
       \   /\   /
        \ /  \ /
         o -- o

All edges have capacity 1. There is a vertex A_i for each element of the collection being tested, and a vertex P_i for each predicate in the list. There is an edge from P_i to A_i whenever the element A_i matches the predicate P_i. There are two special vertices Src with an edge to every P_i, and Snk with an edge from every vertex A_i. The resulting max flow represents a best matching of predicates to elements. When the matching succeeds, the total flow will be equal to the number of elements (containsOnly), number of predicates (containsAll), or both (unorderedElemsAre). If the flow is less than that, there will be unused edges from Src and/or to Snk that identify the unmatched elements and/or predicates, and these can be explained in the result.

cdsmith commented 3 years ago

See https://en.wikipedia.org/wiki/Maximum_flow_problem#Maximum_cardinality_bipartite_matching

cdsmith commented 3 years ago

See also https://arxiv.org/abs/1105.1569, which looks promising for large instances because it has a complexity better than the number of edges, so (unlike most other choices) it doesn't look at all edges. Looking at all edges is potentially expensive here because it means matching arbitrary predicates, which themselves can be expensive, a quadratic number of times. The linked paper is still quadratic in the size of the smaller set, but necessarily doesn't require matching all predicates against all elements.

cdsmith commented 3 years ago

Played around with bipartite matching a bit. An unfinished and completely untested implementation of bipartite matching is at https://code.world/haskell#PmZZ5KPA57tC_hxHxYhFb6Q. This is using Ford-Fulkerson rather than something fancier, but that would be plenty good enough given that there are likely to be only dozens of elements in test cases.

cdsmith commented 3 years ago

Also of note: https://hackage.haskell.org/package/fgl-5.7.0.3/docs/Data-Graph-Inductive-Query-MaxFlow.html Probably more expensive to construct the graph explicitly, and it's overly general since the algorithm expects arbitrary paths rather than bipartite matching. But it's already implemented and correct.

cdsmith / HMock

Max-flow alg for unorderedElemsAre , containsAll, containsOnly #17