cube-js / cube

📊 Cube — Universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics
https://cube.dev
Other
17.97k stars 1.78k forks source link

SQl API. Clickhouse as underlying db. Max(timeDimension) creates wrong query: data introspection queries support #5170

Open NikKomarov opened 2 years ago

NikKomarov commented 2 years ago

Failed SQL select max(InvoiceDateTime) from sales

Version: 0.30.28

Additional context We have cube 'sales' with underlying clickhouse table. It contains InvoiceDateTime column. Dimension is configured like below

invoiceDatetime: {
                title: `Date/time`,
                sql: `${CUBE}."InvoiceDateTime"`,
                type: `time`
            }

Selecting max() on this dimension via sql-api results in 2022-08-01 00:00:00.000000 (start of the month) Query, that goes to clickhouse

SELECT
      toDateTime(toStartOfMonth(toTimeZone(toDateTime(`sales`."InvoiceDateTime"), 'UTC'), 'UTC'), 'UTC') `sales__invoice_datetime_month`
    FROM
      sales AS `sales` GROUP BY `sales__invoice_datetime_month` ORDER BY `sales__invoice_datetime_month` ASC LIMIT 10000 FORMAT JSON

On the other hand, query below returns correct result, but it seems to be just workaround(

select InvoiceDateTime from sales
order by InvoiceDateTime desc
limit 1;

Can someone shed a light why sql-api tries to apply toStartOfMonth while selecting max(timeDimension) query? Is it a bug?

paveltiunov commented 2 years ago

Hey @NikKomarov ! There's currently no way to query max time dimension through Cube API, and SQL API applies this heuristic to fulfill some of the BI queries. We can definitely improve this heuristic query. However, I'm curious to hear what kind of problem you are trying to solve. Could you please elaborate on why do you need to fetch the max time dimension?

NikKomarov commented 2 years ago

@paveltiunov Short answer: just because we can)))

Long answer: we provide out clients with sql api, it seems to be very convenient way of intergrating data in their BI systems. But we cannot stop them from using sql api as usual sql))) Tbh, I've read about limitations of sql api and gave clients feedback that aggr. functions work only on measures, but decided to create the issue, because i'm sure sql-api can be smarter)

paveltiunov commented 2 years ago

@NikKomarov Makes sense. Implementation of such a query requires an extension of the current query format, which will allow introspection queries on dimensions like min/max queries. Let's use this issue to keep track of it.