SACGF / variantgrid

VariantGrid public repo

Analysis node profiling / performance monitoring #351

Open davmlaw opened 3 years ago

davmlaw commented 3 years ago

A notice was recently put up asking people to report bugs if nodes were slow.

Considering we record how long the nodes take to load, we could produce graphs / stats on this on the analysis issues page.

E.g. a boxplot of how long the last X instances of a node type took to load, with any outliers flagged.
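As a rough sketch of the stats behind such a boxplot, something like the following could work. It assumes the load times for one node type are already available as a plain list of seconds, oldest first; `boxplot_stats` and `last_n` are hypothetical names, not existing VariantGrid code, and the 1.5 x IQR outlier rule is just the usual boxplot convention.

```python
from statistics import quantiles


def boxplot_stats(load_seconds, last_n=50):
    """Five-number summary plus 1.5*IQR outliers for the most recent loads.

    load_seconds: load times in seconds for one node type, oldest first
    (hypothetical input, e.g. collected from the recorded node load times).
    """
    recent = list(load_seconds)[-last_n:]
    if len(recent) < 2:
        raise ValueError("need at least two load times")
    # Quartiles computed over the observed values themselves
    q1, median, q3 = quantiles(recent, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": min(recent),
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": max(recent),
        # Anything outside the whiskers is reported as an outlier
        "outliers": [t for t in recent if t < low or t > high],
    }


if __name__ == "__main__":
    print(boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```

The returned dict could be fed to whatever plotting library the issues page ends up using, or just shown as a table per node type.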

sksmi commented 3 years ago

Wondering whether it would also be worth decreasing the timeout? Would it help if we killed processes at 5 rather than 20 mins?
From a user perspective anything longer than 5 mins is probably too slow to be practicable.
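One way to judge the impact before changing anything might be to check what fraction of recorded loads would have been cut off at each timeout. This is only a sketch: `fraction_over` is a hypothetical helper, it takes a plain list of load times in seconds, and it assumes loads without a recorded time (e.g. ones that were killed) show up as `None` and are skipped, so the result may understate slow loads.

```python
def fraction_over(load_seconds, timeout_seconds):
    """Fraction of recorded loads that ran longer than timeout_seconds."""
    # Skip loads with no recorded time (assumed to be None)
    times = [t for t in load_seconds if t is not None]
    if not times:
        return 0.0
    return sum(t > timeout_seconds for t in times) / len(times)


if __name__ == "__main__":
    sample = [12.0, 40.5, 310.0, 95.0, 1100.0, None]
    print("over 5 mins: ", fraction_over(sample, 5 * 60))
    print("over 20 mins:", fraction_over(sample, 20 * 60))
```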

davmlaw commented 1 year ago

Some code to quickly show how long nodes took. It would be useful to be able to see this easily without having to go node by node in the debug tab.

from collections import defaultdict
from analysis.models import Analysis

analysis = Analysis.objects.get(pk=22533)
# select_subclasses() returns each node as its concrete node type
nodes = analysis.analysisnode_set.select_subclasses()
# Group load times by node type, slowest first
node_times = defaultdict(list)
for node in nodes.order_by("-load_seconds"):
    node_times[node.get_node_class_label()].append(node.load_seconds)

load_seconds is how long the query took to run, but there's also the config time (time to build the query) and the wall time.

Would be good to be able to see this for an individual analysis and eg the last X analyses on the system

Would also be nice to be able to view how many nodes were loading simultaneously, so we can see how that affects performance - ie perhaps similar to VCF loading graphs
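A minimal sketch of how the simultaneous-load count could be derived, assuming a start and end timestamp is available for each node load (an assumption: only the load duration is mentioned above, so these timestamps are hypothetical). `max_concurrent` is a made-up helper that sweeps over start/end events in time order.

```python
def max_concurrent(intervals):
    """Return (peak, timeline) for a list of (start, end) load intervals.

    timeline is a list of (timestamp, nodes_loading) points that could be
    plotted as a step graph.
    """
    events = []
    for start, end in intervals:
        events.append((start, 1))   # a node starts loading
        events.append((end, -1))    # a node finishes loading
    # At equal timestamps, process finishes (-1) before starts (+1) so that
    # back-to-back loads are not counted as overlapping
    events.sort(key=lambda e: (e[0], e[1]))
    running = peak = 0
    timeline = []
    for when, delta in events:
        running += delta
        peak = max(peak, running)
        timeline.append((when, running))
    return peak, timeline


if __name__ == "__main__":
    # Times in seconds since some arbitrary origin
    peak, timeline = max_concurrent([(0, 10), (5, 15), (10, 20)])
    print(peak, timeline)
```

Plotting the timeline next to the per-node load times would make it easier to see whether slow loads line up with periods of high concurrency.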