chaoss / grimoirelab-kibiter

Soft fork of Kibana, for the benefit of GrimoireLab
https://chaoss.github.io/grimoirelab

'Others' bucket does not apply the selected metric #120

Closed alpgarcia closed 4 years ago

alpgarcia commented 4 years ago

Kibiter version: 6.1.4-1
Browser: Chrome 78.0.3904.97 (Official Build) (64-bit)
Related (somehow) to: #107

The Others bucket shows the number of documents that aren't included in any other bucket. That makes the percentages wrong when using unique count as the metric, because Others doesn't apply any metric, just a simple document count.
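To make the effect concrete, here is a minimal sketch in plain Python (no Elasticsearch involved). The documents, the domain names and the `pie_shares` helper are all made up for illustration; the only assumption is that each slice's percentage is its value divided by the sum of all slice values.

```python
from collections import defaultdict

# Hypothetical documents: (bucket field value, value being unique-counted).
docs = (
    [("a.com", "alice")] * 4
    + [("b.com", "bob")] * 3
    + [("c.com", "carol")] * 2
    + [("d.com", "dave")] * 1
)


def pie_shares(docs, size, others_metric):
    """Return {slice: share of the pie} for the top `size` terms plus Others."""
    authors = defaultdict(set)
    doc_count = defaultdict(int)
    for domain, author in docs:
        authors[domain].add(author)
        doc_count[domain] += 1

    # Rank terms by unique count (descending), ties broken by name.
    ranked = sorted(authors, key=lambda d: (-len(authors[d]), d))
    top, rest = ranked[:size], ranked[size:]

    values = {d: len(authors[d]) for d in top}
    if rest:
        if others_metric == "doc_count":
            # Behavior described in this issue: Others is a plain document count.
            values["Others"] = sum(doc_count[d] for d in rest)
        else:
            # Others uses the same metric as the other slices.
            values["Others"] = len(set().union(*(authors[d] for d in rest)))
    total = sum(values.values())
    return {name: value / total for name, value in values.items()}


print(round(pie_shares(docs, 1, "doc_count")["a.com"], 3))  # 0.143
print(round(pie_shares(docs, 1, "unique")["a.com"], 3))     # 0.25
print(round(pie_shares(docs, 4, "doc_count")["a.com"], 3))  # 0.25
```

With Others as a document count, the share of the first slice depends on the bucket size (1/7 versus 1/4) even though its unique count is 1 in both cases. With Others computed using the same metric, the share stays at 1/4.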

Steps to reproduce:

  1. Create a Pie Chart.
  2. Add a unique count of something as metric.
  3. Add a terms bucket by some field. Set size to some small value (5 or 10 should work).
  4. Look at the percentage of the first slice (and the result of its unique count).
  5. Modify size to include all possible buckets or at least some value considerably higher than the previous one.
  6. Look at the same slice as in step 4: the unique count should be the same, but the percentage will have changed.

Some screenshots of this behavior (steps 4 and 6 respectively): others_bug_1 others_bug_2

jsmanrique commented 4 years ago

It seems to be a Kibana issue, doesn't it?

alpgarcia commented 4 years ago

Nope, this is an old feature from a long time ago. It was added by @dlumbrer but originally introduced by @acs, as far as I know:
https://github.com/chaoss/grimoirelab-kibiter/pull/51
https://github.com/chaoss/grimoirelab-kibiter/commit/99e311d4d5d862a498a2fca80af951ad7c8cc3d2
https://github.com/chaoss/grimoirelab-kibiter/commit/d41d3bd676545af9fb1a82977d34214bb1579aab

jsmanrique commented 4 years ago

Oh, I didn't remember that. I am used to the Others bucket in the Kibana of Open Distro, so I thought it was a Kibana issue. Has anyone tested it in the Kibana of Open Distro?

jsmanrique commented 4 years ago

I have tested it with upstream Kibana 7.3.2 and, if I have understood @alpgarcia's explanation correctly, it seems to work. I have tried it with Missing checked and unchecked:

Top 10 top10

Top 100 top100

alpgarcia commented 4 years ago

I've tried it with OpenDistro 1.3.0, based on ES+Kibana 7.3.2.

I built the following pie chart, showing the unique count of authors by domain, restricted to 2 domains to get a short response:

Screenshot from 2020-02-19 11-58-34

What I've found is a couple of sequential queries (the second one needs the output of the first one):

  1. The aggregated values, with the cardinality metric as always.
  2. Another aggregation, filtering out the bucket values included in the first one, applying the cardinality metric. That is the unique count of the rest of the stuff.
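The two steps above can be sketched roughly as follows. This is a simplified illustration, not the exact requests from the gist: the function names, the aggregation names (`top`, `unique`, `others`) and the field names are hypothetical, and the real Kibana requests may build the exclusion filter differently. Only standard Elasticsearch query DSL pieces are used (`terms` and `cardinality` aggregations, a `bool` query with `must_not`).

```python
def top_terms_query(bucket_field, metric_field, size):
    """First request: top `size` terms, each with a cardinality sub-aggregation."""
    return {
        "size": 0,
        "aggs": {
            "top": {
                "terms": {"field": bucket_field, "size": size},
                "aggs": {"unique": {"cardinality": {"field": metric_field}}},
            }
        },
    }


def others_query(bucket_field, metric_field, first_response):
    """Second request: cardinality over documents outside the first response's buckets."""
    seen = [b["key"] for b in first_response["aggregations"]["top"]["buckets"]]
    return {
        "size": 0,
        # Exclude every bucket value already returned by the first request.
        "query": {"bool": {"must_not": [{"terms": {bucket_field: seen}}]}},
        "aggs": {"others": {"cardinality": {"field": metric_field}}},
    }


# Hand-written stand-in for the first response (field and key values are made up).
fake_response = {
    "aggregations": {
        "top": {
            "buckets": [
                {"key": "a.com", "doc_count": 4, "unique": {"value": 1}},
                {"key": "b.com", "doc_count": 3, "unique": {"value": 1}},
            ]
        }
    }
}

first = top_terms_query("author_domain", "author_uuid", 2)
second = others_query("author_domain", "author_uuid", fake_response)
```

The second body cannot be built until the first response arrives, which is why the queries are sequential. Note that in this sketch documents with no value for the bucket field would also fall into Others.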

You can check the details on the following gist: https://gist.github.com/alpgarcia/506020126d99f357f22506be7b38c079

The important parts are:

Summarizing, there are some things we could do:

Best, Alberto.

jsmanrique commented 4 years ago

I would suggest staying as close to upstream as possible, avoiding any custom development in Kibiter.

valeriocos commented 4 years ago

With the migration to ES/Kibiter 6.8, the pie chart visualizations can now use the Others bucket provided by upstream, which produces far better results than our previous implementation in Kibiter.

Because Elasticsearch computes unique counts approximately with the cardinality aggregation (https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-aggregations-metrics-cardinality-aggregation.html#_counts_are_approximate), the results returned when using the Others bucket can be slightly imprecise (please check the details in the link above).
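For reference, the cardinality aggregation accepts a `precision_threshold` option that trades memory for accuracy; per the Elasticsearch documentation the default is 3000 and the maximum supported value is 40000. A minimal sketch of an aggregation body that sets it (the helper name and the field name are hypothetical, and whether Kibiter exposes this setting is not covered here):

```python
def unique_count_agg(field, precision_threshold=3000):
    """Cardinality aggregation body with an explicit precision_threshold."""
    return {
        "cardinality": {
            "field": field,
            # Counts below this threshold are expected to be close to accurate.
            "precision_threshold": precision_threshold,
        }
    }


agg = unique_count_agg("author_uuid", precision_threshold=40000)
```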