cube-js / cube

πŸ“Š Cube β€” Universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics
https://cube.dev
Other
18.02k stars 1.78k forks source link

SQL API fails when aggregating using EXTRACT on a timestamp #8313

Open sarchila opened 6 months ago

sarchila commented 6 months ago

Failed SQL I'm seeing some issues when using timestamp EXTRACT. See below:

# It works if I write the query out like so:
> SELECT EXTRACT(YEAR FROM created_at), order_count FROM public.orders_cube;

 datepart(Utf8("YEAR"),orders_cube.created_at) | order_count
-----------------------------------------------+-------------
                                          2019 |        2052
                                          2020 |        2043
                                          2021 |        2031
                                          2022 |        2122
                                          2023 |        1752
(5 rows)

# But using a column alias for the EXTRACT column triggers an error:
> SELECT EXTRACT(YEAR FROM created_at) as created_at_year, order_count FROM public.orders_cube;
# ERROR:  Error during rewrite: Unexpected panic. Reason: Unexpected expression node: AggregateAggrExpr([]). Please check logs for additional information.
# QUERY: SELECT EXTRACT(YEAR FROM created_at) AS created_at_year, order_count FROM public.orders_cube

# and GROUP BY doesn't work when using MEASURE as the aggregation function...:
> SELECT EXTRACT(YEAR FROM created_at), MEASURE(order_count) FROM public.orders_cube GROUP BY 1;
# ERROR:  Error during rewrite: Can't detect Cube query and it may be not supported yet. Please check logs for additional information.
# QUERY: SELECT EXTRACT(YEAR FROM created_at), MEASURE(order_count) FROM public.orders_cube GROUP BY 1

# ...but it works when using another aggregation function
> SELECT EXTRACT(YEAR FROM created_at), COUNT(order_count) FROM public.orders_cube GROUP BY 1;

 datepart(Utf8("YEAR"),orders_cube.created_at) | COUNT(orders_cube.order_count)
-----------------------------------------------+--------------------------------
                                          2019 |                           2052
                                          2020 |                           2043
                                          2021 |                           2031
                                          2022 |                           2122
                                          2023 |                           1752
(5 rows)

> SELECT EXTRACT(YEAR FROM created_at), SUM(order_count) FROM public.orders_cube GROUP BY 1;
 datepart(Utf8("YEAR"),orders_cube.created_at) | SUM(orders_cube.order_count)
-----------------------------------------------+------------------------------
                                          2019 |                         2052
                                          2020 |                         2043
                                          2021 |                         2031
                                          2022 |                         2122
                                          2023 |                         1752
(5 rows)

Version: 0.35.41

Additional context

igorlukanin commented 5 months ago

Hi @sarchila πŸ‘‹

First of all, thanks for reporting this!

Tried using CUBESQL_SQL_PUSH_DOWN=true, but that did not help.

This is very interesting! Using CUBESQL_SQL_PUSH_DOWN=true would be the preferred way.

Could you please try the latest version, check all queries below with the query pushdown enabled, and provide the error messages that you observe?

hansonw commented 5 months ago

Thanks @igorlukanin, I'm working together with @sarchila on this one. We aren't able to test with the 'latest' version (only stable) but with CUBESQL_SQL_PUSH_DOWN=true we get the following results:

-- with alias
SELECT EXTRACT(YEAR FROM created_at) as created_at_year, order_count FROM public.orders_cube;
-- (This is OK)

-- with GROUP BY
SELECT EXTRACT(YEAR FROM created_at), MEASURE(order_count) FROM public.orders_cube GROUP BY 1;
-- still shows this in the error log
{"message":"SQL API Error","query":{"sql":"SELECT EXTRACT(YEAR FROM created_at), MEASURE(order_count) FROM public.orders_cube GROUP BY 1;"},"error":"Error during rewrite: Can't detect Cube query and it may be not supported yet. Please check logs for additional information.","duration":195,"securityContext":{},"requestId":"964a7a4f-0f9a-437e-9277-8ce33aa6b3eb-span-1","appName":"NULL","protocol":"postgres","apiType":"sql"}

So it still seems to be an issue with pushdown (at least on stable, 0.35.41).

hansonw commented 5 months ago

Update: we were able to test with 'latest' (0.35.47), with pushdown, and the error is still the same.

paveltiunov commented 5 months ago

@hansonw We can't reproduce this one. Most probably, https://cube.dev/docs/reference/configuration/environment-variables#cubesql_sql_push_down isn't enabled.

sarchila commented 5 months ago

Hi @paveltiunov & @igorlukanin Thanks for taking a look! Just tried it again, confirming that CUBESQL_SQL_PUSH_DOWN = true and with CUBEJS_LOG_LEVEL set to trace, here are the logs:

2024-06-10T22:52:46.553Z {"message":"Load Request","query":{"sql":"SELECT EXTRACT(YEAR FROM created_at), MEASURE(order_count) FROM public.orders_cube GROUP BY 1;"},"securityContext":{},"requestId":"03de539a-69b6-4d13-8634-73fafce291f7-span-1","appName":"NULL","protocol":"postgres","apiType":"sql"}
2024-06-10T22:52:46.553Z 2024-06-10 22:52:46,553 DEBUG [cubesql::sql::postgres::shim] Query: SELECT EXTRACT(YEAR FROM created_at), MEASURE(order_count) FROM public.orders_cube GROUP BY 1;
2024-06-10T22:52:46.553Z 2024-06-10 22:52:46,553 TRACE [cubejs_native::python::runtime] New task
2024-06-10T22:52:46.554Z 2024-06-10 22:52:46,554 DEBUG [cubesql::compile::parser] Parsing SQL: SELECT EXTRACT(YEAR FROM created_at), MEASURE(order_count) FROM public.orders_cube GROUP BY 1;
2024-06-10T22:52:46.554Z {"message":"SQL API Query Planning","query":{"sql":"SELECT EXTRACT(YEAR FROM created_at), MEASURE(order_count) FROM public.orders_cube GROUP BY 1;"},"securityContext":{},"requestId":"03de539a-69b6-4d13-8634-73fafce291f7-span-1","appName":"NULL","protocol":"postgres","apiType":"sql"}
2024-06-10T22:52:46.593Z 2024-06-10 22:52:46,593 DEBUG [cubesql::compile] Initial Plan: Projection: #datepart(Utf8("YEAR"),orders_cube.created_at), #measure(orders_cube.order_count)
2024-06-10T22:52:46.593Z   Aggregate: groupBy=[[datepart(Utf8("YEAR"), #orders_cube.created_at)]], aggr=[[measure(#orders_cube.order_count)]]
2024-06-10T22:52:46.593Z     TableScan: orders_cube projection=None
2024-06-10T22:52:46.594Z 2024-06-10 22:52:46,594 TRACE [cubejs_native::python::runtime] New task
2024-06-10T22:52:46.596Z 2024-06-10 22:52:46,595 TRACE [cubejs_native::python::runtime] New task
2024-06-10T22:52:46.596Z 2024-06-10 22:52:46,596 TRACE [cubejs_native::python::runtime] New task
2024-06-10T22:52:46.602Z 2024-06-10 22:52:46,601 DEBUG [cubesql::compile::rewrite::rewriter] Iterations: [Iteration { egraph_nodes: 31, egraph_classes: 31, applied: {"cube-scan": 1}, hook_time: 4.71e-7, search_time: 0.00017867, apply_time: 0.00022931, rebuild_time: 1.139e-5, total_time: 0.0004232, data: IterInfo { debug_info: None, debug_qtrace_eclasses: None }, n_rebuilds: 0, stop_reason: None }, Iteration { egraph_nodes: 47, egraph_classes: 46, applied: {"push-down-aggregate": 1, "wrapper-cube-scan-wrap": 1, "split-projection-aggregate": 1, "split-aggregate-aggregate": 1}, hook_time: 1.3e-7, search_time: 0.00010168, apply_time: 0.00025869, rebuild_time: 1.053e-5, total_time: 0.00037209, data: IterInfo { debug_info: None, debug_qtrace_eclasses: None }, n_rebuilds: 0, stop_reason: None }, Iteration { egraph_nodes: 78, egraph_classes: 73, applied: {"member-pushdown-replacer-aggregate-group-push-down": 1, "member-pushdown-replacer-aggregate-aggr-push-down": 1, "wrapper-finalize-pull-up-replacer": 1, "wrapper-push-down-aggregate-to-cube-scan": 1, "split-aggr-group-expr-aggregate-push-down": 1, "split-aggr-group-expr-projection-push-down": 1, "split-aggr-aggr-expr-aggregate-push-down": 1, "split-aggr-aggr-expr-projection-push-down": 1}, hook_time: 3e-8, search_time: 0.00022393, apply_time: 0.00014315, rebuild_time: 1.301e-5, total_time: 0.00038098, data: IterInfo { debug_info: None, debug_qtrace_eclasses: None }, n_rebuilds: 0, stop_reason: None }, Iteration { egraph_nodes: 115, egraph_classes: 102, applied: {"member-pushdown-replacer-column-find-matching-old-member-udaf-fun": 1, "wrapper-aggregate-aggr-expr-push-down": 1, "wrapper-aggregate-group-expr-push-down": 1}, hook_time: 5e-8, search_time: 0.00033327, apply_time: 9.816e-5, rebuild_time: 8.72e-6, total_time: 0.00044129, data: IterInfo { debug_info: None, debug_qtrace_eclasses: None }, n_rebuilds: 0, stop_reason: None }, Iteration { egraph_nodes: 122, egraph_classes: 106, applied: {"member-pushdown-replacer-udaf-fun": 1, "wrapper-member-pushdown-replacer-udaf-fun": 1, "wrapper-push-down-function": 1}, hook_time: 3e-8, search_time: 0.00031542, apply_time: 0.00048623, rebuild_time: 1.337e-5, total_time: 0.00081674, data: IterInfo { debug_info: None, debug_qtrace_eclasses: None }, n_rebuilds: 0, stop_reason: None }, Iteration { egraph_nodes: 130, egraph_classes: 111, applied: {"wrapper-aggregate-aggr-expr-pull-up": 1, "wrapper-push-down-scalar-function-args": 1}, hook_time: 7e-8, search_time: 0.0003457, apply_time: 0.00013515, rebuild_time: 9.11e-6, total_time: 0.00049097, data: IterInfo { debug_info: None, debug_qtrace_eclasses: None }, n_rebuilds: 0, stop_reason: None }, Iteration { egraph_nodes: 135, egraph_classes: 114, applied: {"wrapper-push-down-dimension": 1}, hook_time: 3e-8, search_time: 0.00038004, apply_time: 0.00013732, rebuild_time: 9.45e-6, total_time: 0.00052898, data: IterInfo { debug_info: None, debug_qtrace_eclasses: None }, n_rebuilds: 0, stop_reason: None }, Iteration { egraph_nodes: 136, egraph_classes: 114, applied: {}, hook_time: 5e-8, search_time: 0.00034461, apply_time: 0.00011593, rebuild_time: 8.25e-6, total_time: 0.00046971, data: IterInfo { debug_info: None, debug_qtrace_eclasses: None }, n_rebuilds: 0, stop_reason: Some(Saturated) }]
2024-06-10T22:52:46.604Z 2024-06-10 22:52:46,604 TRACE [cubejs_native::python::runtime] New task
2024-06-10T22:52:46.608Z 2024-06-10 22:52:46,607 DEBUG [cubesql::compile::rewrite::rewriter] Iterations: [Iteration { egraph_nodes: 138, egraph_classes: 115, applied: {"wrapper-push-down-literal": 1, "split-date-part-point-aggregate": 1}, hook_time: 7.1e-8, search_time: 0.000586529, apply_time: 0.000672399, rebuild_time: 4.81e-5, total_time: 0.001315519, data: IterInfo { debug_info: None, debug_qtrace_eclasses: None }, n_rebuilds: 0, stop_reason: None }, Iteration { egraph_nodes: 150, egraph_classes: 125, applied: {"wrapper-pull-up-scalar-function-args": 1, "split-aggr-group-expr-aggregate-AggregateGroupExpr-pull-up": 1}, hook_time: 4.6e-7, search_time: 0.00083285, apply_time: 0.00029004, rebuild_time: 1.21e-5, total_time: 0.00113978, data: IterInfo { debug_info: None, debug_qtrace_eclasses: None }, n_rebuilds: 0, stop_reason: None }, Iteration { egraph_nodes: 154, egraph_classes: 127, applied: {}, hook_time: 7e-8, search_time: 0.00047863, apply_time: 0.00020189, rebuild_time: 1.343e-5, total_time: 0.0006971, data: IterInfo { debug_info: None, debug_qtrace_eclasses: None }, n_rebuilds: 0, stop_reason: Some(Saturated) }]
2024-06-10T22:52:46.609Z 2024-06-10 22:52:46,609 TRACE [cubejs_native::python::runtime] New task
2024-06-10T22:52:46.611Z 2024-06-10 22:52:46,611 DEBUG [cubesql::compile::rewrite::rewriter] Iterations: [Iteration { egraph_nodes: 154, egraph_classes: 127, applied: {}, hook_time: 1.1e-7, search_time: 0.00085744, apply_time: 0.00035713, rebuild_time: 1.615e-5, total_time: 0.00123715, data: IterInfo { debug_info: None, debug_qtrace_eclasses: None }, n_rebuilds: 0, stop_reason: Some(Saturated) }]
2024-06-10T22:52:46.611Z 2024-06-10 22:52:46,611 DEBUG [cubesql::compile::rewrite::rewriter] Best: 0: ProjectionSplit(ProjectionSplit(false)), 1: ProjectionAlias(ProjectionAlias(None)), 2: AggregateSplit(AggregateSplit(false)), 3: AggregateUDFExprArgs([]), 4: ColumnExprColumn(ColumnExprColumn(Column { relation: Some("orders_cube"), name: "order_count" })), 5: ColumnExpr([4]), 6: AggregateUDFExprArgs([5, 3]), 7: AggregateUDFExprFun(AggregateUDFExprFun("measure")), 8: AggregateUDFExpr([7, 6]), 9: AggregateAggrExpr([8]), 10: ColumnExprColumn(ColumnExprColumn(Column { relation: Some("orders_cube"), name: "created_at" })), 11: ColumnExpr([10]), 12: LiteralExprValue(LiteralExprValue(Utf8("YEAR"))), 13: LiteralExpr([12]), 14: ScalarFunctionExprArgs([13, 11]), 15: ScalarFunctionExprFun(ScalarFunctionExprFun(DatePart)), 16: ScalarFunctionExpr([15, 14]), 17: AggregateGroupExpr([16]), 18: CubeScanUngrouped(CubeScanUngrouped(true)), 19: CubeScanWrapped(CubeScanWrapped(false)), 20: CubeScanCanPushdownJoin(CubeScanCanPushdownJoin(true)), 21: CubeScanSplit(CubeScanSplit(false)), 22: CubeScanOffset(CubeScanOffset(None)), 23: CubeScanLimit(CubeScanLimit(None)), 24: CubeScanOrder([]), 25: CubeScanFilters([]), 26: CubeScanMembers([]), 27: AllMembersAlias(AllMembersAlias("orders_cube")), 28: AllMembersCube(AllMembersCube("orders_cube")), 29: AllMembers([28, 27]), 30: CubeScanMembers([29, 26]), 31: CubeScanAliasToCube(CubeScanAliasToCube([("orders_cube", "orders_cube")])), 32: CubeScan([31, 30, 25, 24, 23, 22, 21, 20, 19, 18]), 33: Extension([32]), 34: Aggregate([33, 17, 9, 2]), 35: ColumnExprColumn(ColumnExprColumn(Column { relation: None, name: "measure(orders_cube.order_count)" })), 36: ColumnExpr([35]), 37: ColumnExprColumn(ColumnExprColumn(Column { relation: None, name: "datepart(Utf8(\"YEAR\"),orders_cube.created_at)" })), 38: ColumnExpr([37]), 39: ProjectionExpr([38, 36]), 40: Projection([39, 34, 1, 0])
2024-06-10T22:52:46.611Z 2024-06-10 22:52:46,611 DEBUG [cubesql::compile::rewrite::rewriter] Best cost: CubePlanCostAndState { cost: CubePlanCost { replacers: 0, table_scans: 0, empty_wrappers: 0, non_detected_cube_scans: 1, unwrapped_subqueries: 0, member_errors: 0, non_pushed_down_window: 0, non_pushed_down_grouping_sets: 0, ungrouped_aggregates: 1, wrapper_nodes: 0, wrapped_select_ungrouped_scan: 0, ast_size_outside_wrapper: 2, filters: 0, structure_points: 0, filter_members: 0, cube_members: 0, errors: 0, time_dimensions_used_as_dimensions: 0, max_time_dimensions_granularity: 0, cube_scan_nodes: 1, ast_size_without_alias: 41, ast_size: 41, ast_size_inside_wrapper: 0, ungrouped_nodes: 1 }, state: Unwrapped(1) }
2024-06-10T22:52:46.612Z 2024-06-10 22:52:46,611 ERROR [cubesql::compile] It may be this query is not supported yet. Please post an issue on GitHub https://github.com/cube-js/cube.js/issues/new?template=sql_api_query_issue.md or ask about it in Slack https://slack.cube.dev.
2024-06-10T22:52:46.612Z 2024-06-10 22:52:46,612 TRACE [cubesql::sql::postgres::shim] disabled backtrace
2024-06-10T22:52:46.612Z {"message":"SQL API Error","query":{"sql":"SELECT EXTRACT(YEAR FROM created_at), MEASURE(order_count) FROM public.orders_cube GROUP BY 1;"},"error":"Error during rewrite: Can't detect Cube query and it may be not supported yet. Please check logs for additional information.","duration":60,"securityContext":{},"requestId":"03de539a-69b6-4d13-8634-73fafce291f7-span-1","appName":"NULL","protocol":"postgres","apiType":"sql"}
2024-06-10T22:52:46.614Z {"message":"Cube SQL Error","protocol":"postgres","apiType":"sql","query":"SELECT EXTRACT(YEAR FROM created_at), MEASURE(order_count) FROM public.orders_cube GROUP BY 1","sanitizedQuery":"SELECT EXTRACT(YEAR FROM created_at), MEASURE(order_count) FROM public.orders_cube GROUP BY 1","appName":"NULL","error":"Error during rewrite: Can't detect Cube query and it may be not supported yet. Please check logs for additional information."}

Let us know if there's anything else that you would like or find helpful in order to help track this down.