anras5 / hasse-diagram

Plot Finite Partially Ordered Sets in Python
https://pypi.org/project/hasse-diagram/
3 stars 0 forks source link

Code hangs: Memory leak #5

Closed LordRatte closed 1 month ago

LordRatte commented 1 month ago

This code hangs. Could there be an infinite loop somewhere? (Edit: eventually the memory usage grows until the OS kills the process)

from hassediagram import plot_hasse
import numpy as np
matrix = np.array([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                    1, 1, 1],
                   [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                    1, 1, 1],
                   [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                    1, 1, 1],
                   [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1,
                    1, 1, 1],
                   [0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
                    0, 0, 0],
                   [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                    1, 1, 1],
                   [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                    1, 1, 1],
                   [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1,
                    1, 1, 1],
                   [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1,
                    1, 1, 1],
                   [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1,
                    1, 1, 1],
                   [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                    1, 1, 1],
                   [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                    1, 1, 1],
                   [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                    1, 1, 1],
                   [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1,
                    1, 1, 1],
                   [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1,
                    1, 1, 1],
                   [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1,
                    1, 1, 1],
                   [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1,
                    1, 1, 1],
                   [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                    1, 1, 1],
                   [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                    1, 1, 1],
                   [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                    1, 1, 1],
                   [0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
                    1, 1, 1],
                   [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1,
                    1, 1, 1],
                   [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1,
                    1, 1, 1],
                   [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1,
                    1, 1, 1],
                   [0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1,
                    1, 1, 1]])

keys = list(map(str, range(25)))

plot_hasse(matrix, keys)

When I send an interrupt, this is what I get. Perhaps it will help:

Traceback (most recent call last):
  File "/Users/user/sources/project_____________/hasse.py", line 68, in <module>
    plot_hasse(matrix, keys)
  File "/Users/user/sources/project_____________/venv/lib/python3.12/site-packages/hassediagram/hasse_diagram.py", line 67, in plot_hasse
    data, labels_dict, ranks = _hasse_matrix(data, labels, transitive_reduction)
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/sources/project_____________/venv/lib/python3.12/site-packages/hassediagram/hasse_diagram.py", line 39, in _hasse_matrix
    data, labels_dict = _find_and_merge_cycles(data, labels_dict)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/sources/project_____________/venv/lib/python3.12/site-packages/hassediagram/cycles.py", line 25, in _find_and_merge_cycles
    all_cycles = list(nx.simple_cycles(G))
                 ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/user/sources/project_____________/venv/lib/python3.12/site-packages/networkx/algorithms/cycles.py", line 235, in simple_cycles
    yield from _directed_cycle_search(G, length_bound)
  File "/Users/user/sources/project_____________/venv/lib/python3.12/site-packages/networkx/algorithms/cycles.py", line 282, in _directed_cycle_search
    yield from _johnson_cycle_search(Gc, [v])
  File "/Users/user/sources/project_____________/venv/lib/python3.12/site-packages/networkx/algorithms/cycles.py", line 406, in _johnson_cycle_search
    u = unblock_stack.pop()
        ^^^^^^^^^^^^^^^^^^^
KeyboardInterrupt
LordRatte commented 1 month ago

I can verify that it works with a smaller example.

anras5 commented 1 month ago

Hello,

Thank you for your interest in the package! I ran the code you provided and did not encounter any errors, although the processing time for the graph was somewhat lengthy.

> time python3 src/hassediagram/hasse_diagram.py
python3 src/hassediagram/hasse_diagram.py  372.82s user 17.94s system 89% cpu 7:17.84 total

Currently, optimizing the runtime further is challenging. Reducing cycles in a large, highly connected graph is complex, which is why I chose to use the networkx library. Specifically, I decided to use networkx.algorithms.cycles.simple_cycles, as it’s already optimized by the creators of the library.

There are some known issues related to this function that might be relevant: MemoryError with networkx simple_cycles function.

Please see the output for your graph below: Figure_1

LordRatte commented 1 month ago

Thanks for your response.

What does your memory usage look like when you run it? My process caps out at 11GB. It would be helpful to know whether it is working for you because of more system memory or something else.

anras5 commented 1 month ago

It does look like the memory usage is a big part of the issue here. On my machine the process uses around 20-21GB of RAM, which is... quite a lot. However, the function is designed to find all cycles in the graph, so it makes sense (?) that it would require this much memory.

Here is a screenshot from activity monitor:

image
LordRatte commented 1 month ago

I think that sounds like a reasonable guess.

I have fixed it by excluding cycles when generating the matrix. They shouldn't have been there in the first place, I suppose.

Maybe it would be a good idea to require people to explicitly allow cycles to be removed otherwise throw an error. A Hasse diagram is for posets after all (i.e. antisymetric), so in this situation the mistake was actually mine and shouldn't have "passed" "silently".

I can give it a stab in a PR if you'd like.

Either way, great library. Take care.

anras5 commented 1 month ago

The main idea behind the find and merge cycles function is that I mostly use this package for plotting and visualizing results for decision support algorithms like UTA, ELECTRE, or PROMETHEE. These algorithms create adjacency matrices where edges indicate an “at least as good as” relationship between alternatives, which is essential for the final result. The library then merges the alternatives connected by a cycle and displays them as a single node in the graph.

Maybe it’s not necessary to find all cycles in the graph for these methods. For instance, if A>=B & B>=A & C>=A & A>=C, then we know that B>=C & C>=B and we don't need to find cycles A-B-C or even B-C, while the networkx function will do it. So, the library could just focus on cycles of length 2 in this case. However, for now, I’d prefer to keep it as it is, since this approach is more flexible and isn’t limited to just the problems I use it for.