apinf / emq-analytics-dashboard

Dashboard for EMQ analytics
MIT License

Feature/aggregations #7

Closed frenchbread closed 7 years ago

frenchbread commented 7 years ago

Changes

How it looks

screen shot 2017-03-16 at 14 55 15

How fast it loads

TODOs

Closes #6

frenchbread commented 7 years ago

Please 👀

bajiat commented 7 years ago

@Nazarah Can you comment from the UX point of view?

brylie commented 7 years ago

@frenchbread can you give us some clearly defined metrics here? E.g. how many records is the dashboard loading? How much time (in milliseconds) do the following stages take (basically, what does the Chrome timeline contain)?

brylie commented 7 years ago

Also, what do the performance metrics look like with 10x records? 100x? A rough estimate is fine, so we can get an idea of how this would scale.

frenchbread commented 7 years ago

Server side

The server side request to ES takes around 43-63 milliseconds.

screen shot 2017-03-07 at 13 03 31

Client side

On the client side, the total page load takes around 2-2.5 seconds, and the actual rendering takes around 300 milliseconds.

screen shot 2017-03-07 at 13 16 59

brylie commented 7 years ago

Excellent, and how many records are you retrieving? That way we can figure out the relationship between the number of records and performance time.

frenchbread commented 7 years ago

As of now the dataset that goes to the charts looks like this:

screen shot 2017-03-07 at 13 18 50

So it does not really matter how many items there are, since they are not grouped "manually" (by dc.js) but are pre-aggregated. The only extra load I can think of would come from selecting a very large date range (years or even decades), in which case the list of records will grow. Right now, by default, around 30 days (depending on the month) are shown.

I can go ahead & populate ES with more sample data, say for last 2+ years, and compare the results.
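For readers unfamiliar with pre-aggregation, here is a minimal sketch of what such a request body could look like, assuming the counts come from an Elasticsearch `date_histogram` aggregation. The field name `timestamp`, the aggregation name `events_over_time`, and the helper `buildHistogramQuery` are hypothetical and not taken from this PR.

```javascript
// Sketch only: the field name "timestamp" and the aggregation name
// "events_over_time" are assumptions, not taken from the project.
function buildHistogramQuery(fromMs, toMs, interval = 'day') {
  return {
    size: 0, // skip raw hits; only the aggregation buckets are needed
    query: {
      range: {
        timestamp: { gte: fromMs, lte: toMs, format: 'epoch_millis' },
      },
    },
    aggs: {
      events_over_time: {
        date_histogram: {
          field: 'timestamp',
          // "interval" is the older parameter name; newer Elasticsearch
          // releases split it into calendar_interval and fixed_interval.
          interval,
          min_doc_count: 0, // keep empty buckets so gaps show up as zeros
          extended_bounds: { min: fromMs, max: toMs },
        },
      },
    },
  };
}

const body = buildHistogramQuery(Date.UTC(2017, 0, 1), Date.UTC(2017, 2, 3));
console.log(JSON.stringify(body, null, 2));
```

With a body like this, the number of returned buckets depends on the date range and interval rather than on the number of stored documents, which matches the reasoning above.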

brylie commented 7 years ago

Cool, yes. Keep in mind that the aggregation step takes time too.

screenshot_20170307_132924

frenchbread commented 7 years ago

@brylie

> Keep in mind that the aggregation step takes time too.

This is what the "server-side" metric measures.

Here is the range covering all the data in ES (1st Jan - 3rd Mar): 3069 items in total. The request still takes 40-60 ms.

[{"key":"MQT connections over time","values":[{"x":1483308000000,"y":33},{"x":1483394400000,"y":107},{"x":1483480800000,"y":116},{"x":1483567200000,"y":46},{"x":1483653600000,"y":35},{"x":1483740000000,"y":72},{"x":1483826400000,"y":88},{"x":1483912800000,"y":68},{"x":1483999200000,"y":32},{"x":1484085600000,"y":48},{"x":1484172000000,"y":115},{"x":1484258400000,"y":47},{"x":1484344800000,"y":88},{"x":1484431200000,"y":20},{"x":1484517600000,"y":108},{"x":1484604000000,"y":41},{"x":1484690400000,"y":52},{"x":1484776800000,"y":101},{"x":1484863200000,"y":60},{"x":1484949600000,"y":96},{"x":1485036000000,"y":65},{"x":1485122400000,"y":42},{"x":1485208800000,"y":118},{"x":1485295200000,"y":106},{"x":1485381600000,"y":50},{"x":1485468000000,"y":92},{"x":1485554400000,"y":101},{"x":1485640800000,"y":93},{"x":1485727200000,"y":94},{"x":1485813600000,"y":43},{"x":1485900000000,"y":116},{"x":1485986400000,"y":24},{"x":1486072800000,"y":105},{"x":1486159200000,"y":80},{"x":1486245600000,"y":29},{"x":1486332000000,"y":72},{"x":1486418400000,"y":52},{"x":1486504800000,"y":116},{"x":1486591200000,"y":40},{"x":1486677600000,"y":67},{"x":1486764000000,"y":27},{"x":1486850400000,"y":86},{"x":1486936800000,"y":67},{"x":1487023200000,"y":0},{"x":1487109600000,"y":0},{"x":1487196000000,"y":0},{"x":1487282400000,"y":0},{"x":1487368800000,"y":0},{"x":1487455200000,"y":0},{"x":1487541600000,"y":0},{"x":1487628000000,"y":0},{"x":1487714400000,"y":0},{"x":1487800800000,"y":0},{"x":1487887200000,"y":0},{"x":1487973600000,"y":0},{"x":1488060000000,"y":0},{"x":1488146400000,"y":0},{"x":1488232800000,"y":0},{"x":1488319200000,"y":0},{"x":1488405600000,"y":0},{"x":1488492000000,"y":11}]}]
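As a rough illustration of how a dataset in this shape could be produced, the sketch below maps histogram buckets to the `{ key, values: [{ x, y }] }` structure shown above. It assumes the buckets come from an Elasticsearch `date_histogram` aggregation (each with `key` and `doc_count`); the function name `toChartSeries` is hypothetical.

```javascript
// Assumption: `buckets` is the bucket array of a date_histogram aggregation,
// where each bucket carries `key` (epoch milliseconds) and `doc_count`.
function toChartSeries(label, buckets) {
  return [
    {
      key: label,
      values: buckets.map((bucket) => ({ x: bucket.key, y: bucket.doc_count })),
    },
  ];
}

// First two points of the dataset above, expressed as buckets.
const sampleBuckets = [
  { key: 1483308000000, doc_count: 33 },
  { key: 1483394400000, doc_count: 107 },
];
console.log(JSON.stringify(toChartSeries('MQT connections over time', sampleBuckets)));
```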
brylie commented 7 years ago

Here is a revised diagram:

screenshot_20170307_133407

From your understanding, what is the point I am trying to make here?

brylie commented 7 years ago

Let's conduct a couple more server-side aggregation experiments, as mentioned above:

That way we can get more insight into how the aggregation step might vary as a function of the number of records.

frenchbread commented 7 years ago

> From your understanding, what is the point I am trying to make here?

@brylie Isn't it "measuring the time the aggregation takes"? Since aggregation is done on the Elasticsearch side, the search execution metric is exposed in the response object (3 milliseconds in this case):

screen shot 2017-03-07 at 14 47 15
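To separate the two numbers discussed here, a small timing wrapper could look like the sketch below. It relies on the `took` field that Elasticsearch includes in search responses (milliseconds spent executing the search); the wrapper `timeSearch` and the stubbed `fakeSearch` are hypothetical, and the stub only stands in for a real client call.

```javascript
// Times a search call and compares the round trip with Elasticsearch's own
// `took` value (milliseconds spent executing the search on the cluster).
// `runSearch` is a hypothetical callback returning a promise of the response.
async function timeSearch(runSearch) {
  const startedAt = process.hrtime.bigint();
  const response = await runSearch();
  const roundTripMs = Number(process.hrtime.bigint() - startedAt) / 1e6;
  return {
    roundTripMs,
    tookMs: response.took,
    overheadMs: roundTripMs - response.took, // network, serialisation, client setup
  };
}

// Stand-in for a real client call: resolves after roughly 20 ms with took = 3.
const fakeSearch = () =>
  new Promise((resolve) => setTimeout(() => resolve({ took: 3 }), 20));

timeSearch(fakeSearch).then((timing) => console.log(timing));
```

Logging both values per request makes it easier to see whether a slowdown comes from the aggregation itself or from everything around it.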

frenchbread commented 7 years ago

Update

I've modified the server code and moved the Elasticsearch client initialisation out of the Meteor method; this saves around 20 milliseconds more.
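The general pattern behind this change can be sketched as follows. This is not the PR's code: `createClient` is a hypothetical stand-in for constructing a real Elasticsearch client, the host is made up, and the counter exists only to make the difference visible.

```javascript
// `createClient` is a hypothetical stand-in for constructing a real
// Elasticsearch client; the counter only makes the difference visible.
let clientsCreated = 0;
function createClient(host) {
  clientsCreated += 1;
  return { host, search: () => Promise.resolve({ took: 3 }) };
}

// Before: every method call pays for client construction.
function searchWithNewClient(host) {
  return createClient(host).search();
}

// After: the client is created once at module load and reused.
const sharedClient = createClient('localhost:9200');
function searchWithSharedClient() {
  return sharedClient.search();
}

searchWithNewClient('localhost:9200');
searchWithNewClient('localhost:9200');
searchWithSharedClient();
searchWithSharedClient();
console.log(clientsCreated); // 3 (one shared client plus two per-call clients)
```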

brylie commented 7 years ago

> Since aggregation is done on the Elasticsearch side, the search execution metric is exposed in the response object

Thanks for the example. 3ms aggregation with 3,000 records is quite fast. Can we try a couple of experiments with larger data, say 10,000 or 50,000 record aggregation?

frenchbread commented 7 years ago

@brylie I'm on it. 🚀

frenchbread commented 7 years ago

Generated sample data, tested request/aggregation time, and saved the collected metrics to docs/request-aggregation-metrics.csv

Results

As the CSV file shows, a growing amount of analytics data over a fixed number of days (in this case 66 days, roughly 2 months) does not slow down request/aggregation time. In fact, judging by the numbers, it even gets faster:

| Items | Server -> ES request (ms) | ES search execution (ms) |
|-------|---------------------------|--------------------------|
| 13k   | 42.9                      | 3.9                      |
| 20k   | 30                        | 3.7                      |
| 50k   | 28.7                      | 2.3                      |

frenchbread commented 7 years ago

For this PR, I think the metrics provided are enough. I've created the related issue #8 for testing with wider date ranges (a larger number of days).

Nazarah commented 7 years ago

@frenchbread: great work. I really like the simple look of the chart here. A few improvement suggestions:

A few queries

To me, the client-side monitoring visualization graph looks more attention-grabbing and interesting: https://cloud.githubusercontent.com/assets/2122679/23654223/5d310890-0338-11e7-8aec-5ba59a9ca50e.png Can we somehow incorporate this sort of visualization with EMQTT?

frenchbread commented 7 years ago

@Nazarah Thanks for your comments. I've updated the definition of done.

> Are we considering any hourly/daily/weekly visualization of data here?

Yes, it is on the way.

> From EMQTT schema, what metrics would we be considering to show in visualization as part of API usage monitoring?

That is still not decided.

> Can we somehow incorporate this sort of visualizations with EMQTT?

That's a good idea. What exactly are you referring to?

We could actually divide the current (general overview) chart that we have into multiple smaller ones, where each chart would represent usage for a specific log type (e.g. on_client_connected, on_client_subscribed etc.), and color them differently.

How does that sound? @bajiat @brylie @Nazarah

brylie commented 7 years ago

> From EMQTT schema, what metrics would we be considering to show in visualization as part of API usage monitoring?

@frenchbread can you give us some examples of the types of metrics we might expect? E.g. the name of one or more metric(s) and the data type(s). That way we can at least start sketching some ideas.

frenchbread commented 7 years ago

Here they are https://github.com/apinf/emq-analytics-dashboard/blob/master/dashboard/DATA.md#type-string-possible-options

frenchbread commented 7 years ago

Added granularity filter:

By month

screen shot 2017-03-09 at 14 33 51

By week

screen shot 2017-03-09 at 14 33 57

By day

screen shot 2017-03-09 at 14 34 06

By hour

screen shot 2017-03-09 at 14 34 13

frenchbread commented 7 years ago

This is what I meant by

> We could actually divide the current (general overview) chart that we have into multiple smaller ones, where each chart would represent usage for a specific log type (e.g. on_client_connected, on_client_subscribed etc.), and color them differently.

screen shot 2017-03-09 at 16 40 55

Are we going to keep this (& I commit changes here)?

frenchbread commented 7 years ago

There are some drawbacks here:

In other cases works fine

brylie commented 7 years ago

Why does the dashboard freeze with large date ranges?

frenchbread commented 7 years ago

I assume it happens due to rendering data for a large number of days (> 600 days).

frenchbread commented 7 years ago

Idea: we can identify the filter queries for which the dashboard freezes and, in those cases, hide the chart and show a placeholder message, e.g. "Too much data to show" or "Select a smaller date range".
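A minimal sketch of such a guard, assuming the decision is based on the number of data points to render. The 600-point cut-off is only a guess derived from the "> 600 days" observation in this thread, and `chartOrPlaceholder` is a hypothetical helper.

```javascript
// Assumption: 600 points as the cut-off, loosely based on the "> 600 days"
// observation in this thread; the real limit would need measuring.
const MAX_POINTS = 600;

function chartOrPlaceholder(values, maxPoints = MAX_POINTS) {
  if (values.length > maxPoints) {
    return { render: false, message: 'Too much data to show' };
  }
  return { render: true, values };
}

// Two years of daily points exceeds the cut-off, so a placeholder is returned.
const twoYearsDaily = Array.from({ length: 730 }, (_, i) => ({ x: i, y: 0 }));
console.log(chartOrPlaceholder(twoYearsDaily).message);
```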

brylie commented 7 years ago

Another idea is to have 'default' granularity settings. E.g. when selecting a two-year span, defaulting the query to monthly (or daily).
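One way this could look, as a sketch: pick the granularity from the length of the selected range. The thresholds below are illustrative assumptions, not values agreed in this thread, and `defaultGranularity` is a hypothetical helper; the four granularities are the ones offered by the filter in this PR.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Thresholds are illustrative assumptions; tune them against real rendering times.
function defaultGranularity(fromMs, toMs) {
  const days = (toMs - fromMs) / DAY_MS;
  if (days <= 2) return 'hour';
  if (days <= 90) return 'day';
  if (days <= 365) return 'week';
  return 'month';
}

// A two-year span falls back to monthly buckets.
console.log(defaultGranularity(Date.UTC(2015, 0, 1), Date.UTC(2017, 0, 1)));
```

The user could still override the default through the granularity filter; the default only keeps the initial bucket count manageable.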

frenchbread commented 7 years ago

Removed grid lines and switched the datepicker to the Pikaday library.

TODO:

brylie commented 7 years ago

Is this ready for review, or will it be under construction a bit longer?

frenchbread commented 7 years ago

@brylie This is not yet ready.

frenchbread commented 7 years ago

Moved filtering by message type to its own select-picker.

screen shot 2017-03-14 at 16 26 33

Please review.

brylie commented 7 years ago

Woohoo! 😀

frenchbread commented 7 years ago

@brylie Could you merge this? If you are going to test it, I can send you an ES host with sample data.

brylie commented 7 years ago

This looks really good from a UI perspective! I made some suggestions on how to improve the code style, so our code conforms to our developer expectations.

phanimahesh commented 7 years ago

Was this checked with live ES data? If not, @frenchbread let's do a quick check with live data once. First thing tomorrow morning, unless you are occupied. We exchanged docs on event format but still.

frenchbread commented 7 years ago

Added the suggested changes. Please review.

frenchbread commented 7 years ago

After the new Elasticsearch plugin for EMQ is deployed, the dashboard will need a little refactoring to fit the updated analytics log schema. But this could be done in a separate PR.

frenchbread commented 7 years ago

@phanimahesh As we discussed in the chat, yes, the dashboard has been tested with live/real ES data. For testing purposes (wider date ranges and collecting loading/rendering metrics), some dummy data was added in addition to the 'real' data generated by the plugin.

frenchbread commented 7 years ago

Added comments

frenchbread commented 7 years ago

Removed the second Meteor method. Please 🔎