Adds a performance dashboard

gampleman commented 3 years ago

aircloak-robot commented 3 years ago

Standard tests have passed 🎉

sebastian commented 3 years ago

Screen Shot 2020-11-27 at 13 50 29 Would be good to drop the decimal places here. Or at least limit them!

sebastian commented 3 years ago

In the activity monitor page we have given the query states friendly and readable names: Screen Shot 2020-11-27 at 13 52 39

I would like us to use the same names and variants in the performance overview: Screen Shot 2020-11-27 at 13 53 08

Screen Shot 2020-11-27 at 13 55 13

sebastian commented 3 years ago

The little graph on the far right is hard to read to the point of being mostly useless. The labels describing what the colors are is also placed under the main graph rather than the auxiliary graph that they belong to.

I would suggest:

moving the execution phase labels to the right so they are under the graph they belong to
add some information about median/average (average is probably better since you already use that in the graph) time per each of these phases and a %age of the whole they represent. Like "The average awaiting for data is 2 seconds and represents 40% of the query execution time"

Screen Shot 2020-11-27 at 13 56 34

sebastian commented 3 years ago

We are showing the slowest 99th %ile query. Why not show the actual slowest query? In this context that presumably makes equally much sense? So either replace 99th %ile with slowest query, or add the slowest query in addition?

Actually never mind: we already have the 10 slowest queries below 👍

sebastian commented 3 years ago

Right after changing the filters in the righthand filter menu the large graph updates, but the small one does now show the overall metrics breakdown, but is blank:

Screen Shot 2020-11-27 at 14 03 46

To reproduce:

filter by data source
then inspect the graph without having moused over the histogram

sebastian commented 3 years ago

Graph doesn't work on small screens:

Screen Shot 2020-11-27 at 14 07 05

Not immensely important, but if it's fixable "relatively easy", then great.

gampleman commented 3 years ago

Graph doesn't work on small screens:

Did you refresh? Currently the responsive bits are implemented on load only, which in practice is usually fine.

gampleman commented 3 years ago

The labels describing what the colors are is also placed under the main graph rather than the auxiliary graph that they belong to.

This is actually surprisingly hard to fix. I could of course code this dashboard properly in something like D3, which would be way more flexible, but I don't think there is quite enough time for that...

sebastian commented 3 years ago

Did you refresh? Currently the responsive bits are implemented on load only, which in practice is usually fine.

Refreshing worked. Thanks. Ok, consider that port resolved :)

sebastian commented 3 years ago

This is actually surprisingly hard to fix. I could of course code this dashboard properly in something like D3, which would be way more flexible, but I don't think there is quite enough time for that...

Ok, then let's do the least effort solution of at least:

fixing names
if possible show some indication of % of time alongside the label (likely not so easy...)

gampleman commented 3 years ago

@Sebastian I've addressed the feedback apart from slow query performance, which I'd like to tackle next, but in terms of making sure everything else is ready to go, it might be worth doing another round of review... Up to you.

aircloak-robot commented 3 years ago

Pull request can be merged 😊

gampleman commented 3 years ago

OK, so I've optimised the slowest query by about 3x (see before and after). It's still slower than you might want (for example on my entire query history the page still takes 1.3s to load which is ~7000 queries), but probably acceptable if you don't include huge date ranges.

Maybe a slight optimisation we could make is to change the default date range from 1 month to 1 week? There is a slight tradeoff since the metrics and visualizations included only really make sense if there is more than a hundred queries in the dataset.

sebastian commented 3 years ago

for example on my entire query history the page still takes 1.3s to load which is ~7000 queries

Our real systems have quite a bit more. Here are the number of queries from our demo and attack systems:

System	Num queries	if scales linear
Demo	378057	70s
Attack	11141695	35min

It would be interesting to deploy on demo and see how it actually works. If the assumed runtimes match the numbers in the table above then we can't keep this functionality in this form.

aircloak-robot commented 3 years ago

Standard tests have passed 👍

gampleman commented 3 years ago

Yes, but that's assuming you request all the data, which you probably shouldn't. However, we could limit the date range to some maximum. Or we could give up on calculating the percentile based queries (either altogether, or just if we know there are more than a certain number of queries in the resultset).

sebastian commented 3 years ago

Yes, but that's assuming you request all the data, which you probably shouldn't. However, we could limit the date range to some maximum. Or we could give up on calculating the percentile based queries (either altogether, or just if we know there are more than a certain number of queries in the resultset).

Or limit it to most recent Xk queries in the time period?

sebastian commented 3 years ago

Also I still think we should try to move it out of the first render, so the page shows faster and we can give the user an indication of something actually happening, rather than the server having crashed or their internet having broken.

gampleman commented 3 years ago

OK @sebastian I've changed the UI so that the percentile queries need to be explicitly loaded by the user. This means that the slow path is entirely avoidable by the user and the rest of the page should remain usable. I think this might be a nice way to expose this feature, as it's up to the user if they want to wait for the results....

aircloak-robot commented 3 years ago

Standard tests have passed ❤️

aircloak-robot commented 3 years ago

Pull request can be merged 🎉

Aircloak / aircloak

Adds a performance dashboard #4835