elastic/hey-apm

Basic load generation for apm-server built on hey
Apache License 2.0

Stop storing derived metrics #183

Closed axw closed 4 years ago

axw commented 4 years ago

We can derive these metrics in Elasticsearch instead: delete the "numbers" package and use inline arithmetic.

Also fixed a bug with the "total events sent - success %" metric reported to stdout, which mistakenly used an error count instead of the event count.
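For illustration, a rough sketch of deriving something like a success percentage at query time with a `bucket_script` pipeline aggregation (the index and field names below are made up, not the actual schema):

```json
POST /hey-apm-*/_search
{
  "size": 0,
  "aggs": {
    "per_report": {
      "terms": { "field": "report_id" },
      "aggs": {
        "sent": { "sum": { "field": "events.sent" } },
        "accepted": { "sum": { "field": "events.accepted" } },
        "success_pct": {
          "bucket_script": {
            "buckets_path": { "sent": "sent", "accepted": "accepted" },
            "script": "params.sent > 0 ? 100.0 * params.accepted / params.sent : 0"
          }
        }
      }
    }
  }
}
```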

apmmachine commented 4 years ago

:green_heart: Build Succeeded



#### Build stats

* Build Cause: `[Pull request #183 opened]`
* Start Time: 2020-07-30T05:44:09.738+0000
* Duration: 4 min 44 sec

#### Test stats :test_tube:

| Test    | Results |
| ------- | :-----: |
| Failed  | 0       |
| Passed  | 9       |
| Skipped | 0       |
| Total   | 9       |

jalvz commented 4 years ago

Thinking it through, I think I disagree with this change. It is true that we can get all of that from Elasticsearch; it was actually considered initially. But given the (imo negligible) complexity and storage cost of indexing those fields, storing them pays off at query time: a trivial _search query gives me all the data I want without having to craft those calculations in a query body (accounting for optional values, zero divisions, etc.). Now, if I need to dig beyond the dashboards, I need to write arbitrarily complicated queries instead.
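For illustration (index, field, and id names below are made up), computing one of those numbers in the request body now looks something like this, complete with the guards for missing values and division by zero:

```json
POST /hey-apm-*/_search
{
  "query": { "term": { "report_id": "abc123" } },
  "script_fields": {
    "success_pct": {
      "script": {
        "lang": "painless",
        "source": "double sent = doc['events.sent'].size() > 0 ? doc['events.sent'].value : 0; double accepted = doc['events.accepted'].size() > 0 ? doc['events.accepted'].value : 0; return sent > 0 ? 100.0 * accepted / sent : 0;"
      }
    }
  }
}
```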

simitt commented 4 years ago

I might not be the heaviest user of the hey-apm benchmarking, but I have barely used the derived metrics; when I did, I had to look up how they were calculated, as it wasn't clear to me what the differences, time units, etc. for the metrics were.

@jalvz could you share where/how you were using these values?

jalvz commented 4 years ago

Any time I queried the actual docs instead of using the dashboards; that is, every time I want granular data or am looking for something specific. The last time was during the recent spike caused by enabling the instrumentation.

axw commented 4 years ago

> a trivial _search query gives me all the data I want without having to craft those calculations in a query body (accounting for optional values, zero divisions, etc.). ... Any time I queried the actual docs instead of using the dashboards; that is, every time I want granular data or am looking for something specific. The last time was during the recent spike caused by enabling the instrumentation.

I've been thinking of these metrics as useful only for comparisons, rather than as something to look at in isolation. Can you describe how you were using them in isolation? Or, if you were using them for comparison, where the dashboards fell short and looking at the raw docs helped.

Just trying to understand why we can't use dashboards + kuery bar.

jalvz commented 4 years ago

> how you were using them in isolation?

If I have a single run from a PR, I can't just use the dashboards because results are averaged. So I'd query individual fields from an individual document (I grab a reportId from the log to filter by) and then compare them with the latest average from the related dashboard.
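Roughly like this, with made-up index and field names; filter on the reportId and read the stored fields straight out of `_source`:

```json
POST /hey-apm-*/_search
{
  "query": { "term": { "report_id": "abc123" } },
  "_source": ["events.sent", "events.accepted", "events_per_second"]
}
```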

> why we can't use dashboards + kuery bar

I think we can do that the same as before this change. The difference is that now we can't query individual documents (or rather, we can, but it is quite a bit more difficult).

I also don't understand what specific problem this is solving, besides doing the same thing in a different way (and with fewer lines of code, but lines of code are not a problem per se).

axw commented 4 years ago

> If I have a single run from a PR, I can't just use the dashboards because results are averaged. ... I think we can do that the same as before this change. The difference is that now we can't query individual documents (or rather, we can, but it is quite a bit more difficult).

We can't do it the exact same way (which implies no change at all), but we can do this with a data table visualisation and scripted fields (until Lens supports bucket_script - https://github.com/elastic/kibana/issues/67405). I've added one to the dashboard, let me know if this does what you need or what's missing.
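For reference, a scripted field is just a Painless expression evaluated per document; a rough sketch of an events-per-second field, with made-up field names:

```painless
// Hypothetical field names; guard against missing values and division by zero.
double secs = doc['elapsed_time.seconds'].size() > 0 ? doc['elapsed_time.seconds'].value : 0;
double accepted = doc['events.accepted'].size() > 0 ? doc['events.accepted'].value : 0;
return secs > 0 ? accepted / secs : 0;
```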

> I also don't understand what specific problem this is solving, besides doing the same thing in a different way (and with fewer lines of code, but lines of code are not a problem per se).

There were a couple of things that set this off:

If I had known you were using the derived metrics at the individual report level (code comments explaining why we weren't using ES aggs would have helped), then I wouldn't have gone and removed them; I would have just cleaned up the way they were calculated.

Anyway, as above, please let me know what's missing. We can add them back in if really needed.

jalvz commented 4 years ago

> we can do this with a data table visualisation and scripted fields

I think that will work, and if not we will figure something else out. Thanks!