Closed alpgarcia closed 4 years ago
It seems to be a Kibana issue, doesn't it?
No, this is an old feature from a long time ago. It was added by @dlumbrer but originally introduced by @acs, as far as I know: https://github.com/chaoss/grimoirelab-kibiter/pull/51 https://github.com/chaoss/grimoirelab-kibiter/commit/99e311d4d5d862a498a2fca80af951ad7c8cc3d2 https://github.com/chaoss/grimoirelab-kibiter/commit/d41d3bd676545af9fb1a82977d34214bb1579aab
Oh, I didn't remember that. I am used to the Others bucket in the Kibana of Open Distro, so I thought it was a Kibana issue. Has anyone tested it in the Kibana of Open Distro?
I have tested it with upstream Kibana 7.3.2 and, if I have understood @alpgarcia's explanation correctly, it seems to work. I have tried with Missing checked and unchecked:
Top 10
Top 100
I've tried it with OpenDistro 1.3.0, based on ES+Kibana 7.3.2.
I built the following pie chart, showing the unique count of authors by domain, restricted to 2 domains to get a short response:
What I've found is a couple of sequential queries (the second one needs the output of the first one):
You can check the details on the following gist: https://gist.github.com/alpgarcia/506020126d99f357f22506be7b38c079
The important parts are:
"aggs": {
  "2": {
    "terms": {
      "field": "author_domain",
      "order": {
        "1": "desc"
      },
      "size": 2
    },
    "aggs": {
      "1": {
        "cardinality": {
          "field": "author_uuid"
        }
      }
    }
  }
}
The second query computes the unique count of author_uuid over all the documents whose author_domain is not one of those included in the response of the previous query:
"aggs": {
  "other-filter": {
    "aggs": {
      "1": {
        "cardinality": {
          "field": "author_uuid"
        }
      }
    },
    "filters": {
      "filters": {
        "": {
          "bool": {
            "must": [{
              "exists": {
                "field": "author_domain"
              }
            }],
            "filter": [],
            "should": [],
            "must_not": [{
              "match_phrase": {
                "author_domain": {
                  "query": "gmail.com"
                }
              }
            }, {
              "match_phrase": {
                "author_domain": {
                  "query": "users.noreply.github.com"
                }
              }
            }]
          }
        }
      }
    }
  }
}
If missing values are shown, the following must clause is removed from the previous query (that would be the case corresponding to the screenshot I shared above), so documents are taken into account whether or not they have author_domain:
"must": [{
  "exists": {
    "field": "author_domain"
  }
}]
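As a rough illustration of how the second query depends on the first, the sketch below builds the other-filter aggregation from the bucket keys returned by the terms aggregation. The helper name build_other_agg and its include_missing flag are hypothetical names chosen for this example, not Kibana code; the query shape simply follows the fragments above.

```python
def build_other_agg(field, metric_field, top_keys, include_missing=False):
    """Build the second-step aggregation for documents outside the top buckets.

    Hypothetical helper for illustration only; it mirrors the query
    fragments above and is not taken from Kibana's source.
    """
    bool_query = {
        "filter": [],
        "should": [],
        # Exclude every value already returned by the first (terms) query.
        "must_not": [{"match_phrase": {field: {"query": key}}} for key in top_keys],
    }
    if not include_missing:
        # Only count documents that actually have the field.
        bool_query["must"] = [{"exists": {"field": field}}]
    return {
        "other-filter": {
            "aggs": {"1": {"cardinality": {"field": metric_field}}},
            "filters": {"filters": {"": {"bool": bool_query}}},
        }
    }


# Bucket keys as they would appear in the first response
# (values taken from the example above).
first_response_buckets = [
    {"key": "gmail.com"},
    {"key": "users.noreply.github.com"},
]
top_keys = [bucket["key"] for bucket in first_response_buckets]

other_agg = build_other_agg("author_domain", "author_uuid", top_keys)
other_agg_with_missing = build_other_agg(
    "author_domain", "author_uuid", top_keys, include_missing=True
)
```

With include_missing=True the must clause is left out, so documents without author_domain also fall into the resulting bucket.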
Summarizing, there are some things we could do:
Best, Alberto.
I would suggest staying as close to upstream as possible, avoiding any custom development in Kibiter.
With the migration to ES/Kibiter 6.8, the pie chart visualizations can now leverage the Others bucket provided by upstream, which produces far better results than our previous implementation in Kibiter.
Because Elasticsearch computes unique counts approximately rather than exactly (https://www.elastic.co/guide/en/elasticsearch/reference/6.8/search-aggregations-metrics-cardinality-aggregation.html#_counts_are_approximate), the results returned when using the Others bucket can contain small imprecisions (please check the details in the link above).
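For reference, the cardinality aggregation accepts a precision_threshold option that trades memory for accuracy; according to the Elasticsearch documentation, the default is 3000 and values above 40000 behave like 40000. A minimal sketch of the aggregation body as a Python dict follows. The helper name cardinality_agg is hypothetical, and whether this option can be set for a given Kibiter visualization is not covered here.

```python
MAX_PRECISION_THRESHOLD = 40000  # upper bound documented by Elasticsearch


def cardinality_agg(field, precision_threshold=None):
    """Return a cardinality aggregation body (hypothetical helper)."""
    agg = {"cardinality": {"field": field}}
    if precision_threshold is not None:
        # Larger values have no additional effect, so clamp them here.
        agg["cardinality"]["precision_threshold"] = min(
            precision_threshold, MAX_PRECISION_THRESHOLD
        )
    return agg


default_agg = cardinality_agg("author_uuid")
precise_agg = cardinality_agg("author_uuid", precision_threshold=100000)
```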
Kibiter version: 6.1.4-1
Browser: Chrome 78.0.3904.97 (Official Build) (64-bit)
Related (somehow) to: #107
The Others bucket is showing the number of documents that aren't included in any bucket. That makes the percentage wrong when using unique count as the metric, as Others doesn't apply any metric but just a simple document count.
Steps to reproduce:
Some screenshots of this behavior (steps 4 and 6 respectively):
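The mismatch described in this report can also be illustrated on toy data. The sketch below uses a handful of invented documents (not real project data) and compares the Others slice computed as a document count with the same slice computed as a unique count of author_uuid.

```python
# Invented documents for illustration only.
docs = [
    {"author_uuid": "a1", "author_domain": "gmail.com"},
    {"author_uuid": "a1", "author_domain": "gmail.com"},
    {"author_uuid": "a2", "author_domain": "gmail.com"},
    {"author_uuid": "a3", "author_domain": "example.org"},
    {"author_uuid": "a3", "author_domain": "example.org"},
    {"author_uuid": "a3", "author_domain": "example.org"},
    {"author_uuid": "a3", "author_domain": "example.net"},
]
top_domains = {"gmail.com"}  # the only bucket kept by the terms aggregation

top_docs = [d for d in docs if d["author_domain"] in top_domains]
other_docs = [d for d in docs if d["author_domain"] not in top_domains]

# Metric used for the regular slices: unique count of authors.
top_unique = len({d["author_uuid"] for d in top_docs})

# What the reported behavior puts in Others: a plain document count.
others_doc_count = len(other_docs)
# What the metric would give for the same documents.
others_unique = len({d["author_uuid"] for d in other_docs})

# Share of the pie taken by Others in each case.
share_with_doc_count = others_doc_count / (top_unique + others_doc_count)
share_with_unique_count = others_unique / (top_unique + others_unique)
```

Here the top slice has 2 unique authors, while Others covers 4 documents written by a single author, so a document count inflates the Others share from one third to two thirds of the pie.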