constellation-app / constellation

A graph-focused data visualisation and interactive analysis application.
https://constellation-app.com
Apache License 2.0

Improve memory use of plugins and analytics #28

Closed arcturus2 closed 2 years ago

arcturus2 commented 5 years ago

Description

With 2 GB allocated to Constellation and a graph of 10k nodes and 10k transactions, running the page rank analytic crashes Constellation with an out-of-memory error.

I don't mean to pick on this analytic; the same is probably true of many Data Access view plugins. We need to do a better job of making analytics, and plugins in general, more memory efficient.

Each feature we introduce should be profiled properly, and basic memory issues and "bad practices" should be avoided.
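
As a rough illustration of what a first-pass memory check might look like, here is a minimal sketch that uses only the standard `java.lang.management` API. `HeapProbe` and its methods are hypothetical names, not part of Constellation, and the garbage collection request is only a hint to the JVM, so the numbers are approximate. A real profiler would give far more detail.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper, not part of Constellation: estimates how much heap a task retains.
final class HeapProbe {
    private static final MemoryMXBean MEMORY = ManagementFactory.getMemoryMXBean();

    /** Returns the heap currently in use, in bytes, after asking the JVM to collect garbage. */
    static long usedHeapBytes() {
        MEMORY.gc(); // only a request; the JVM may ignore it, so treat results as estimates
        return MEMORY.getHeapMemoryUsage().getUsed();
    }

    /** Runs the task and returns the approximate change in used heap, in bytes. */
    static long heapGrowthBytes(Runnable task) {
        long before = usedHeapBytes();
        task.run();
        return usedHeapBytes() - before;
    }

    public static void main(String[] args) {
        // Stand-in workload: retain a 16 MB primitive array so it survives the second measurement.
        Object[] holder = new Object[1];
        long growth = heapGrowthBytes(() -> holder[0] = new long[2_000_000]);
        System.out.println("Approximate heap growth in bytes: " + growth);
        System.out.println("Retained elements: " + ((long[]) holder[0]).length);
    }
}
```

Running with a fixed heap (for example `-Xmx2g`, matching the allocation described above) keeps measurements comparable between runs.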

Additional Information

This is the list of analytics and plugins to be reviewed (note there may be some that are missed that people are welcome to add to the list). People working on this ticket should mark these off as needed to ensure we know what still needs to be looked at:

Analytics

Other Algorithm Plugins

Analytic Schema Plugins

Arrangement Plugins

Data Access Plugins

Functionality Plugins

Graph Node Plugins

Import Export Plugins

Interactive Graph Plugins

Map View Plugins

Visual Graph Plugins

Visual Schema Plugins

github-actions[bot] commented 4 years ago

This issue is stale because it has been open for 90 days with no activity. Remove this stale label or comment or this will be closed in another 14 days

github-actions[bot] commented 4 years ago

This issue is stale because it has been open for 6 months with no activity. Consider reviewing and taking an action on this issue.

Nova-2119 commented 3 years ago

I have done a complete refactor of the page rank analytic so that less information is stored redundantly and there are fewer calls to seemingly trivial (but actually fairly expensive) methods. Below are the profiles of both versions when run on a 500 by 500 node sphere graph with 100 iterations and an epsilon value of 0.

Original: image

New: image

The new version is therefore about 36 times quicker than the old version on a graph of this size. Previously, when testing on a 30000 x 30000 sphere graph, the analytic would run out of memory; now it runs with no problems.
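
To illustrate the general idea of storing less and hoisting repeated lookups out of inner loops, here is a minimal, self-contained PageRank sketch. It is not the refactored Constellation plugin: the class name, the `int[][]` adjacency input, and the dangling-node handling are all assumptions made for the example. It keeps ranks in two reusable primitive arrays and reads each node's out-degree once per iteration.

```java
import java.util.Arrays;

// Illustrative only: a compact PageRank over primitive arrays, not Constellation's plugin code.
final class PageRankSketch {

    /**
     * @param outLinks      outLinks[v] holds the target node ids of v's outgoing edges (assumed input form)
     * @param damping       damping factor, commonly 0.85
     * @param maxIterations upper bound on the number of iterations
     * @param epsilon       stop early once the total rank change drops below this; 0 never stops early
     */
    static double[] pageRank(int[][] outLinks, double damping, int maxIterations, double epsilon) {
        final int n = outLinks.length;
        if (n == 0) {
            return new double[0];
        }
        double[] rank = new double[n];
        double[] next = new double[n]; // reused every iteration instead of reallocating
        Arrays.fill(rank, 1.0 / n);

        for (int iter = 0; iter < maxIterations; iter++) {
            Arrays.fill(next, 0.0);
            double danglingMass = 0.0;
            for (int v = 0; v < n; v++) {
                int[] targets = outLinks[v]; // looked up once per node, not once per edge
                if (targets.length == 0) {
                    danglingMass += rank[v]; // nodes without out-links spread their rank evenly
                    continue;
                }
                double share = rank[v] / targets.length;
                for (int t : targets) {
                    next[t] += share;
                }
            }
            double base = (1.0 - damping) / n + damping * danglingMass / n;
            double delta = 0.0;
            for (int v = 0; v < n; v++) {
                next[v] = base + damping * next[v];
                delta += Math.abs(next[v] - rank[v]);
            }
            double[] tmp = rank; // swap buffers rather than copy
            rank = next;
            next = tmp;
            if (delta < epsilon) {
                break;
            }
        }
        return rank;
    }

    public static void main(String[] args) {
        int[][] cycle = {{1}, {2}, {0}}; // 0 -> 1 -> 2 -> 0
        System.out.println(Arrays.toString(pageRank(cycle, 0.85, 100, 0.0)));
    }
}
```

With an epsilon of 0 the loop always runs the full number of iterations, which mirrors the profiling setup described above.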

antares1470 commented 3 years ago

@Nova-2119 were those screenshots meant to be the other way around? It looks like the original is 3 secs faster than the new (I could be reading it wrong though).

Nova-2119 commented 3 years ago

Trying the analytics on a graph of 30K by 30K with default parameters to get a feel for where to focus:

Betweenness Centrality: Quick with no issues
Closeness Centrality: Stopped before it finished (too slow); didn't fail or run out of memory
Degree Centrality: Quick with no issues
Katz Centrality: Quick with no issues
Cosine Similarity: Failed, out of memory
Multiplexity, ratio of reciprocity, weight: Reasonably quick, no memory issue. Results not appearing in analytic view?

Given the above, the next step is trying to improve the cosine similarity analytic and then the closeness centrality.

Will raise a separate ticket for the results not appearing in the analytic view. EDIT: Ticket created #895

Nova-2119 commented 3 years ago

So I started looking at Betweenness Centrality and noticed that the progress bar was disappearing long before the results were displayed. The profiler is showing that the analytic is spending much more time in the aggregator than in the actual BetweenessCentralityPlugin. As these aggregators affect multiple analytic plugins, I'm going to switch my focus for this ticket to improving the performance of the aggregators for now.

Nova-2119 commented 3 years ago

Have made improvements to the aggregators' speed. Previously the aggregator could take much longer than the actual analytic. Now the time it takes is unlikely to be a factor compared to how long the analytic itself takes.

github-actions[bot] commented 2 years ago

This issue is stale because it has been open for 6 months with no activity. Consider reviewing and taking an action on this issue.

aldebaran30701 commented 2 years ago

Investigation into a suitable graph size and memory allocation should be performed. Benchmarking for each plugin can then be done and action taken accordingly. Varied graph sizes might also be valuable for the analytics.
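
One way such a size sweep could be structured is sketched below. Everything here is hypothetical: `SizeSweepBenchmark`, `Result`, and the placeholder workload are invented for the example, and a real run would substitute a plugin execution on a generated graph of the given size. It times a single run per size with no JIT warm-up, so the figures are only indicative; a harness such as JMH would be more rigorous.

```java
import java.util.function.IntConsumer;

// Hypothetical harness, not part of Constellation: runs one workload across several graph sizes.
final class SizeSweepBenchmark {

    /** One measurement: the size used, wall-clock time, and heap in use right after the run. */
    static final class Result {
        final int size;
        final long elapsedNanos;
        final long usedHeapAfterBytes;

        Result(int size, long elapsedNanos, long usedHeapAfterBytes) {
            this.size = size;
            this.elapsedNanos = elapsedNanos;
            this.usedHeapAfterBytes = usedHeapAfterBytes;
        }
    }

    /** Times a single run of the workload for the given size. */
    static Result measure(int size, IntConsumer workload) {
        Runtime runtime = Runtime.getRuntime();
        long start = System.nanoTime();
        workload.accept(size);
        long elapsed = System.nanoTime() - start;
        long used = runtime.totalMemory() - runtime.freeMemory(); // includes uncollected garbage
        return new Result(size, elapsed, used);
    }

    /** Measures the workload once for each size, in the order given. */
    static Result[] sweep(int[] sizes, IntConsumer workload) {
        Result[] results = new Result[sizes.length];
        for (int i = 0; i < sizes.length; i++) {
            results[i] = measure(sizes[i], workload);
        }
        return results;
    }

    public static void main(String[] args) {
        // Placeholder workload: build a ring graph's successor array and walk it once.
        IntConsumer ringWalk = n -> {
            int[] successor = new int[n];
            for (int v = 0; v < n; v++) {
                successor[v] = (v + 1) % n;
            }
            int node = 0;
            for (int step = 0; step < n; step++) {
                node = successor[node];
            }
            if (node != 0) {
                throw new IllegalStateException("a full walk around the ring should return to node 0");
            }
        };
        for (Result r : sweep(new int[]{1_000, 10_000, 100_000}, ringWalk)) {
            System.out.println(r.size + " nodes: " + r.elapsedNanos / 1_000 + " us, "
                    + r.usedHeapAfterBytes / (1024 * 1024) + " MB heap in use");
        }
    }
}
```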

antares1470 commented 2 years ago

Closing this due to its large scope. A number of analytics have already had their memory use improved (including pagerank, which was the one explicitly mentioned), and a long time could be spent looking through each plugin to determine how it could be more memory efficient. I think it would be better to just work on each plugin as the need to improve it arises.