[Rollups] Combination charts should default to most granular data in the time interval

tbragin commented 5 years ago

Based on my testing, it seems that when you create a chart with raw and rolled-up data, Kibana defaults to the least granular time interval. See the concrete example below.

@alexfrancoeur @marius-dr Can someone confirm that I'm not missing something, and that there isn't another way to configure rollup jobs / index patterns / charts such that a user would see the most granular data available in a combination chart?

And if I'm not, can we treat this as a feature request?

Example:

(1) Load the logs data set from the sample data documentation: https://www.elastic.co/guide/en/kibana/current/tutorial-load-dataset.html

(2) Apply this rollup job and start it:

PUT _xpack/rollup/job/logstash
{
    "index_pattern": "logstash-*",
    "rollup_index": "logstash_rollup",
    "cron": "*/30 * * * * ?",
    "page_size" :1000,
    "groups" : {
      "date_histogram": {
        "field": "@timestamp",
        "interval": "1h",
        "delay": "7d"
      },
      "terms": {
        "fields": ["clientip.keyword", "extension.keyword", "host.keyword", "referer.keyword", "request.keyword", "response.keyword", "url.keyword"]
      }
    },
    "metrics": [
        {
            "field": "bytes",
            "metrics": ["min", "max", "sum"]
        }
    ]
}

(3) Create a combination index pattern

(4) Create the following chart

timroes commented 5 years ago

I am not sure I understand your request exactly. What would be the most granular data in your example? How would you expect it to behave?

We need to use the least granular data among all configured jobs, since that's the only one that works. If we try to go more granular than a job allows, Elasticsearch will produce an error, because whatever interval we use must be a multiple of all required "granularities" (we use the least common multiple).
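To illustrate the multiple constraint, here is a minimal sketch in Python. It is not Kibana's actual code: the function names are hypothetical, and it assumes every job interval can be expressed as a fixed number of seconds (calendar intervals such as months are ignored).

```python
from functools import reduce
from math import gcd
from typing import Sequence


def lcm(a: int, b: int) -> int:
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def smallest_valid_interval(job_intervals_s: Sequence[int]) -> int:
    """Smallest interval (in seconds) that is a multiple of every job interval."""
    return reduce(lcm, job_intervals_s)


def is_valid_interval(requested_s: int, job_intervals_s: Sequence[int]) -> bool:
    """True if the requested interval is a multiple of all job intervals."""
    return requested_s % smallest_valid_interval(job_intervals_s) == 0
```

With the single `1h` job from the example, `smallest_valid_interval([3600])` is `3600`, so a 30-minute histogram is rejected while a 2-hour histogram is accepted.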

cc @jen-huang

alexfrancoeur commented 5 years ago

@timroes, @tbragin can correct me if I'm wrong but I think this is the scenario.

At the moment, a user can use the most granular interval as defined by the rollup job. This will then apply to all rolled up and raw data. This is what Elasticsearch is doing behind the scenes if it's querying both types of indices.

What happens if the rollup search endpoint only returns raw data? Does it then act as a normal search endpoint and can use any interval? If so, that's not shown in the UI. We are locked into the previously defined interval that supports the rollup. You can imagine zooming into the third spike in Tanya's chart and that being raw data. If she drilled in further than 1 hour, there would be no data in the chart where a user might expect to see the raw data points. To me, the solution would almost be to default to the "auto" interval if the timeframe shown for this request only contains raw data.

That being said, Tanya and I only spoke briefly on this so I may be off.

polyfractal commented 5 years ago

Popping in to help with the ES side of the question.

> What happens if the rollup search endpoint only returns raw data? Does it then act as a normal search endpoint and can use any interval?

Technically, yes, if the RollupSearch endpoint is invoked with only raw indices (no rollup indices in the URL), it basically just fires off a regular search request with no interval validation.

If the RollupSearch is invoked with rollup + raw indices in the same API call -- and it just so happens there is no rollup data in the requested window of time -- the query that is executed is essentially the same as a regular search. The difference is that we don't know rollup data is missing until after we execute, so we have to enforce the interval (and other) validation before allowing the search to continue.

So to get the behavior requested, Kibana would have to know the "bounds" of the rollup/raw partitions and change how it invokes RollupSearch (or just use a regular search). Or equivalently Elasticsearch would have to be smarter and know the boundaries before the search commences. In either case it sounds relatively more complex/difficult.
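A rough Python sketch of the Kibana-side variant, assuming the client somehow already knows where the rolled-up data ends. The `choose_search` function and the `rollup_covered_until` value are hypothetical; nothing in the current APIs hands this boundary to the client, and the sketch ignores offsets and timezones.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def choose_search(window_start: datetime,
                  rollup_covered_until: Optional[datetime]) -> str:
    """Pick which kind of search to run for a requested time window.

    rollup_covered_until is the (exclusive) end of the newest rolled-up
    bucket, or None if no rolled-up data exists. This value is an assumption:
    the caller would have to obtain it by some other means.
    """
    if rollup_covered_until is None:
        return "regular"  # nothing rolled up, so no interval restriction
    if window_start >= rollup_covered_until:
        return "regular"  # window lies entirely in the raw partition
    return "rollup"       # window touches rolled-up data, validation applies


# Example: rollups cover everything before 2018-10-01 00:00 UTC.
boundary = datetime(2018, 10, 1, tzinfo=timezone.utc)
print(choose_search(boundary + timedelta(hours=1), boundary))
print(choose_search(boundary - timedelta(hours=1), boundary))
```

The hard part is not this comparison but getting a trustworthy `rollup_covered_until` before the search runs.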

alexfrancoeur commented 5 years ago

Thanks @polyfractal! That's how I understood it as well, appreciate the additional context

tbragin commented 5 years ago

I would expect that if both raw data and rolled-up data are available, then when I zoom into a part of the chart where raw data is available for the whole time interval, I'd see the data at the granularity of the raw data. In this case, it appears that if the rolled-up data is present, we use the granularity of the rolled-up data, even if raw data with greater granularity is also present.

timroes commented 5 years ago

Thanks @polyfractal for the input here. As Zach already mentioned, ES does not know if the query only covers raw data before executing it. It would be even harder for Kibana to determine that.

The only solution I see is doing a preflight request to Elasticsearch to get the min and max timestamps within the raw index of that index pattern (we currently don't even know which part of the pattern that is). Then, with a lot of edge cases I can't fully think through yet (histogram offset, timezones, etc.), we could check whether that data covers the requested time range and, if so, do a regular request instead. I see a lot of issues with this approach:

- Lots of edge cases: I am not sure we can reliably check whether the covered time range really holds only raw data once everything is taken into account. I think we will end up with a couple of cases where we think we can do a raw query but then fail because there was also rolled-up data in that time range.
- Data can change: Even if we could reliably calculate the above, data can change. If someone ingests data into the raw index after (or while) we do the preflight request, our calculation is no longer true, and the actual request to Elasticsearch could again fail.
- Which interval to use? If we want to do a raw request, we would still need to decide which interval to send to ES. Since the user could potentially visualize a time range that also contains rolled-up data (and in most cases will), the editor needs to limit the entry to multiples of the allowed rolled-up time interval. So if the user entered 1d as an interval, and we detect that we could go more granular because there is no rolled-up data, which interval should we use? Should the user be able to enter two intervals, one to use if rolled-up data is contained (which must comply with the rolled-up interval) and one for when only raw data is available? I currently can't imagine a good UX around that, so I think this would only make sense for auto intervals. In the future this would only work if the auto date histogram in ES supported that behavior on rolled-up data, since we are trying to move Kibana over to the auto date histogram in the mid term so that we no longer have the complex bucket calculation in Kibana.
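The interval question above can be sketched in Python as follows. This is only an illustration of the "auto intervals only" idea: the function name and parameters are hypothetical, intervals are plain seconds, and the `only_raw` flag is assumed to come from some check that does not exist today.

```python
from typing import Optional


def resolve_interval(user_interval_s: Optional[int],
                     rollup_interval_s: int,
                     auto_interval_s: int,
                     only_raw: bool) -> int:
    """Decide which histogram interval (in seconds) to send to ES.

    user_interval_s is None when the user picked "auto".
    """
    if user_interval_s is not None:
        # Explicit intervals stay restricted to multiples of the rollup
        # interval, because the same chart may later show rolled-up data.
        if user_interval_s % rollup_interval_s != 0:
            raise ValueError("interval must be a multiple of the rollup interval")
        return user_interval_s
    if only_raw:
        # Auto interval on a raw-only window: no rollup restriction applies.
        return auto_interval_s
    # Auto interval with rolled-up data: round up to the next valid multiple.
    multiples = -(-auto_interval_s // rollup_interval_s)  # ceiling division
    return multiples * rollup_interval_s
```

For example, with a 1h rollup interval and an auto interval of 5 minutes, this yields 300 seconds on a raw-only window and 3600 seconds otherwise, while an explicit `1d` is returned unchanged in both cases.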

Also, if we implemented this on the Kibana side, we would need to implement that behavior three times, once in each of the querying infrastructures (Courier, TSVB, Timelion).

In general, that feature sounds to me like it's only achievable in a technically proper way if we had that support in ES (together with auto date histogram). @tbragin maybe you could open an issue to get that behavior into auto date histogram in Elasticsearch, which would then be supported automatically once we switch Kibana over to using that aggregation?

tbragin commented 5 years ago

> In general, that feature sounds to me like it's only achievable in a technically proper way if we had that support in ES (together with auto date histogram). @tbragin maybe you could open an issue to get that behavior into auto date histogram in Elasticsearch, which would then be supported automatically once we switch Kibana over to using that aggregation?

@polyfractal What are your thoughts on this? Would like your input before opening the issue in Elasticsearch repo.

polyfractal commented 5 years ago

@tbragin I think it's a reasonable feature to investigate on the ES side. I don't know if we will be able to support it (technically), but it does feel like something that ES should be providing rather than Kibana trying to work around.

My current thinking is that we may end up with some kind of hybrid responsibility. E.g. a RollupSearch is executed by Kibana. While creating the response we know which parts of the data are 100% generated by raw data. So perhaps ES could tag those parts of the response (via a _meta field or similar) as being "entirely raw data". This would let Kibana know that if it was zooming into a region entirely containing "raw" data, it could switch to a regular search, or continue using the RollupSearch but drop the rollup index, which would essentially convert it back to a regular search.
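As a very rough sketch of how a client could consume such tags (Python; the response shape is entirely hypothetical, since no such `_meta` tagging exists, and `entirely_raw`, `key`, and the function name are made up for illustration):

```python
from typing import Any, Dict, List


def zoom_is_entirely_raw(buckets: List[Dict[str, Any]],
                         interval_ms: int,
                         zoom_start_ms: int,
                         zoom_end_ms: int) -> bool:
    """True if every bucket overlapping the zoom window is tagged as raw.

    Each bucket is assumed to look like
    {"key": <bucket start in ms>, "_meta": {"entirely_raw": bool}}.
    This shape is an assumption, not an existing Elasticsearch response.
    """
    overlapping = [
        b for b in buckets
        if b["key"] < zoom_end_ms and b["key"] + interval_ms > zoom_start_ms
    ]
    # With no overlapping buckets we know nothing, so stay on the safe side.
    if not overlapping:
        return False
    return all(b.get("_meta", {}).get("entirely_raw", False) for b in overlapping)


HOUR_MS = 3_600_000
buckets = [
    {"key": 0 * HOUR_MS, "_meta": {"entirely_raw": False}},
    {"key": 1 * HOUR_MS, "_meta": {"entirely_raw": True}},
    {"key": 2 * HOUR_MS, "_meta": {"entirely_raw": True}},
]
print(zoom_is_entirely_raw(buckets, HOUR_MS, 1 * HOUR_MS, 3 * HOUR_MS))
```

If the check passes, the client could issue a regular search for the zoomed window; otherwise it keeps using RollupSearch with the restricted interval.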

Alternatively, we could maybe do the bounds-checking on the Elasticsearch side but it has a lot of the same issues that @timroes mentioned, just on the ES side.

Not sure, but I think an ES issue to investigate this is the right place for now. :+1:

timroes commented 5 years ago

I will mark this as blocked on the ES side for now. @tbragin please link the ES issue here once it's created, thanks!

tbragin commented 5 years ago

@polyfractal Could I ask you to create the issue in ES? I think you are best positioned to outline in sufficient detail the potential enhancements which would help make this work better in Kibana.

polyfractal commented 5 years ago

Issue created here: https://github.com/elastic/elasticsearch/issues/35744

elastic / kibana

[Rollups] Combination charts should default to most granular data in the time interval #24059