cube-js / cube

šŸ“Š Cube ā€” Universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics
https://cube.dev
Other
17.97k stars 1.78k forks source link

Add WITH FILL modifier to Clickhouse driver #6893

Open mrenigma opened 1 year ago

mrenigma commented 1 year ago

Is your feature request related to a problem? Please describe. When running date ranged queries it is often advantageous to provide data for the entire date range.

Cube already does this partially with fillMissingDates but this only fills between two data points. It doesn't span the entire date range provided unless you have data at the very beginning and end of the date range.

This can cause confusion and misrepresentation when using graphs because the time axis doesn't make use of the full date range selected:

image

Clickhouse provides a WITH FILL modifier which provides this functionality out-of-the-box, but the current driver doesn't support applying the modifier.

Describe the solution you'd like To support filling the entire date range provided, I would propose updating fillMissingDates to span the entire date range. This would most likely entail changes to https://github.com/cube-js/cube/blob/master/packages/cubejs-schema-compiler/src/adapter/ClickHouseQuery.js or https://github.com/cube-js/cube/blob/master/packages/cubejs-clickhouse-driver/src/ClickHouseDriver.ts.

This change would be Clickhouse driver specific as I'm not sure if all the supported drivers have equivalent WITH FILL modifiers. Using fillMissingDates may be the most appropriate as its existing functionality, only being augmented for Clickhouse. Documentation would be need to be updated to describe the enhancement though.

Alternatively, to maintain backwards compatibility, a new pivot option could be used for the same purpose. Although, I'm not sure if having a pivot option for one driver is a viable solution.

I would be more than happy to contribute a PR if we can agree on an acceptable path forward.

Describe alternatives you've considered We have written a transformer layer in Ruby on Rails which takes the data returned by Cube and fills in the missing dates. However, this isn't really scalable as it's all done in memory whilst Clickhouse is designed for this kind of work. Our transformer also doesn't support the more complex query options provided by Cube.

Additional context https://gitlab.com/gitlab-org/gitlab/-/issues/417231

mrenigma commented 1 year ago

I've realized that if you want to use this approach through the API directly, and not use it for the UI, using fillMissingDates wouldn't support that approach šŸ¤”. It might make more sense to add it as an option at the query layer rather than the pivotConfig

mrenigma commented 1 year ago

Additional investigation has shown that using the timeDimensions dateRange option will cause the UI to fill the full range but using a inDateRange operator within filters will not. This also leaves the filling up the UI so using the API directly won't work.