Generate all connected subgraphs of size k

Qiskit / rustworkx

A high performance Python graph library implemented in Rust.

https://www.rustworkx.org

Apache License 2.0

1.03k stars 145 forks source link

Generate all connected subgraphs of size k #1104

Closed sbrandhsn closed 6 months ago

sbrandhsn commented 6 months ago

This PR addresses #1068

Here, a method for generating all connected subgraphs of a certain size k is developed based on the work in "Enumerating Connected Induced Subgraphs: Improved Delay and Experimental Comparison". Christian Komusiewicz and Frank Sommer. Compared to the brute-force approach of drawing all combinations of k=3..6 nodes and then checking whether this combination is a connected subgraph, the method demonstrates a runtime speedup of two orders of magnitude as seen in the following figure. The evaluated graph is a heavy-hex graph of distance d=1, 3, 5, ..., 99.

connected_subgraphs

Raw data: raw_data.txt

coveralls commented 6 months ago

Pull Request Test Coverage Report for Build 8126957629

Details

63 of 66 (95.45%) changed or added relevant lines in 3 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage increased (+0.03%) to 96.513%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
src/connectivity/mod.rs	7	10	70.0%
<!--	Total:	63	66	95.45%	-->

Totals
Change from base Build 8117144686:	0.03%
Covered Lines:	16883
Relevant Lines:	17493

💛 - Coveralls

IvanIsCoding commented 6 months ago

Awesome! Looking forward to review this

sbrandhsn commented 6 months ago

Awesome! Looking forward to review this

Thanks for reviewing this - this is my first rustworkx and also rust contribution so let me know if you have general remarks! :-)

IvanIsCoding commented 6 months ago

@sbrandhsn I have thought more about the iterative version and I think it is achievable. Here is the pseudocode for it in Python:

def simple_iterative(p, x):
    ans = []
    # Third value indicates is_leaf
    # None -> not processed
    # False -> child returned false
    # True -> child returned tru
    stack = [(p, x, None)] # third value is is_leaf
    while len(stack) > 0:
        p, xnext, is_leaf = stack[-1]

        # Handle P is equal to K case
        if len(p) == k:
            stack[-2][2] = True # update is_leaf
            ans.push(p)
            stack.pop()
            continue

        if is_leaf is not None and not is_leaf:
            if len(stack) > 1:
                stack[-2][2] = is_leaf or stack[-2][2]
            stack.pop()
            continue

        # Handle X is empty case
        if len(xnext) == 0:
            if len(stack) > 1:
                stack[-2][2] = is_leaf or stack[-2][2]
            stack.pop()
            continue

        u = any_vertex(xnext)
        xnext.remove(u)
        xprime = union(xnext, difference(N(u), N(p)))
        stack.push((union(p, [u]), xprime))

    return ans

I hope that helps, I know the original paper only gave a recursive version

sbrandhsn commented 6 months ago

Thanks for your comments! :-) I wasn't aware of while let and will change the code accordingly. Regarding recursive vs iterative function. I can see your point in general. However, in this case the stack depth is bound by k and the hash set x_next is expected to be allocated on the heap, right?

For the five 64 bit arguments and a stack size of 1 MB, the stack depth (and k) can be up to 25000 before we run into a stack overflow... For such large k only trivial subgraphs can be expected to be returned in a reasonable time. Do you think it makes sense to emit a warning in cases where connected_subgraphs would have an excessive runtime, i.e. for large k or graphs with a high degree?

Lemma 4 in https://www.uni-marburg.de/de/fb12/arbeitsgruppen/algorith/paper/enucon.pdf actually introduces an iterative version. I will look into that version to see if I can implement it in reasonable time (as it improves the runtime asymptotically) and otherwise use your pseudo code. :-)

IvanIsCoding commented 6 months ago

For the five 64 bit arguments and a stack size of 1 MB, the stack depth (and k) can be up to 25000 before we run into a stack overflow... For such large k only trivial subgraphs can be expected to be returned in a reasonable time. Do you think it makes sense to emit a warning in cases where connected_subgraphs would have an excessive runtime, i.e. for large k or graphs with a high degree?

I don't think we need to emit a warning. At first before rewriting the algorithm wasn't obvious that the bound in the recursion depth is the number of nodes of the graph. If the bound was the number of subgraphs though reaching 25000 is trivial so I had to ask.

I think for the first version the recursive version is ok. The iterative version would be slightly faster and more consistent with most of the code we had. But the final version will hopefully something that we can parallelize (and it does not need to come in this version!)