cube-js / cube

📊 Cube — Universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics
https://cube.dev
Other
17.97k stars 1.78k forks source link

Cube + Trino - Performance and Connection Hanging Issues #7911

Open ndrluis opened 8 months ago

ndrluis commented 8 months ago

Problem

Cube Version: v0.34.50

Hello, I’m trying to use Cube + Trino and pre-aggregate some data. I have a table where the output data is 10 GB, and I receive an error on my refresh worker with “Query execution timeout after 10 minutes of waiting”. For some reason, the connection continues, and the Cube Worker starts a new query without terminating the previous attempts.

I believe there is a bug in Cube + Trino that causes the connection to remain open when an error occurs.

However, my problem is that 10 GB seems too small to require so much time; the query to finish the scan takes 2 minutes, but more than 10 minutes to write the pre-aggregations is too slow. How can I improve this performance?

image image

Logs:

{"message":"Error while querying","queueId":97600,"processingId":97600,"queueSize":0,"duration":600001,"queryKey":[["CREATE TABLE prod_pre_aggregations.rpt_trino_logs_trino_logs AS SELECT\n      \"rpt_trino_logs\".datetime_created_at \"rpt_trino_logs__datetime_created_at\", \"rpt_trino_logs\".description_hostname \"rpt_trino_logs__description_hostname\", \"rpt_trino_logs\".description_query \"rpt_trino_logs__description_query\", \"rpt_trino_logs\".description_query_type \"rpt_trino_logs__description_query_type\", \"rpt_trino_logs\".description_scanned_catalogs \"rpt_trino_logs__description_scanned_catalogs\", \"rpt_trino_logs\".description_user \"rpt_trino_logs__description_user\", max(value_execution_time) \"rpt_trino_logs__max_execution_time\", sum(value_physical_input_bytes / power(1000, 4)) \"rpt_trino_logs__qt_physical_input_terabytes\", count(*) \"rpt_trino_logs__qt_query_count\", (sum(value_physical_input_bytes / power(1000, 4)) * 5.00) \"rpt_trino_logs__total_estimated_athena_query_cost\", sum(\"rpt_trino_logs\".value_execution_time) \"rpt_trino_logs__total_execution_time\"\n    FROM\n      internal_tools.rpt_trino_logs AS \"rpt_trino_logs\"  GROUP BY 1, 2, 3, 4, 5, 6",[]],[[{"_col0":"2024-03-13 13:42:45.249 UTC"}]]],"queuePrefix":"SQL_PRE_AGGREGATIONS_STANDALONE_default","requestId":"scheduler-66d21c69-c57a-4869-9f92-6627ce1a14f0","timeInQueue":5,"preAggregationId":"rpt_trino_logs.trino_logs","newVersionEntry":{"table_name":"prod_pre_aggregations.rpt_trino_logs_trino_logs","structure_version":"0el32tlb","content_version":"0jm5nacs","last_updated_at":1710361267211,"naming_version":2},"preAggregation":{"preAggregationId":"rpt_trino_logs.trino_logs","timezone":"UTC","timestampFormat":"YYYY-MM-DD[T]HH:mm:ss.SSS[Z]","tableName":"prod_pre_aggregations.rpt_trino_logs_trino_logs","invalidateKeyQueries":[["select max(made_current_at) from internal_tools.\"rpt_trino_logs$history\"",[],{"external":false,"renewalThreshold":120}]],"type":"rollup","external":true,"previewSql":["SELECT * FROM prod_pre_aggregations.rpt_trino_logs_trino_logs LIMIT 1000",[]],"preAggregationsSchema":"prod_pre_aggregations","loadSql":["CREATE TABLE prod_pre_aggregations.rpt_trino_logs_trino_logs AS SELECT\n      \"rpt_trino_logs\".datetime_created_at \"rpt_trino_logs__datetime_created_at\", \"rpt_trino_logs\".description_hostname \"rpt_trino_logs__description_hostname\", \"rpt_trino_logs\".description_query \"rpt_trino_logs__description_query\", \"rpt_trino_logs\".description_query_type \"rpt_trino_logs__description_query_type\", \"rpt_trino_logs\".description_scanned_catalogs \"rpt_trino_logs__description_scanned_catalogs\", \"rpt_trino_logs\".description_user \"rpt_trino_logs__description_user\", max(value_execution_time) \"rpt_trino_logs__max_execution_time\", sum(value_physical_input_bytes / power(1000, 4)) \"rpt_trino_logs__qt_physical_input_terabytes\", count(*) \"rpt_trino_logs__qt_query_count\", (sum(value_physical_input_bytes / power(1000, 4)) * 5.00) \"rpt_trino_logs__total_estimated_athena_query_cost\", sum(\"rpt_trino_logs\".value_execution_time) \"rpt_trino_logs__total_execution_time\"\n    FROM\n      internal_tools.rpt_trino_logs AS \"rpt_trino_logs\"  GROUP BY 1, 2, 3, 4, 5, 6",[]],"sql":["SELECT\n      \"rpt_trino_logs\".datetime_created_at \"rpt_trino_logs__datetime_created_at\", \"rpt_trino_logs\".description_hostname \"rpt_trino_logs__description_hostname\", \"rpt_trino_logs\".description_query \"rpt_trino_logs__description_query\", \"rpt_trino_logs\".description_query_type \"rpt_trino_logs__description_query_type\", \"rpt_trino_logs\".description_scanned_catalogs \"rpt_trino_logs__description_scanned_catalogs\", \"rpt_trino_logs\".description_user \"rpt_trino_logs__description_user\", max(value_execution_time) \"rpt_trino_logs__max_execution_time\", sum(value_physical_input_bytes / power(1000, 4)) \"rpt_trino_logs__qt_physical_input_terabytes\", count(*) \"rpt_trino_logs__qt_query_count\", (sum(value_physical_input_bytes / power(1000, 4)) * 5.00) \"rpt_trino_logs__total_estimated_athena_query_cost\", sum(\"rpt_trino_logs\".value_execution_time) \"rpt_trino_logs__total_execution_time\"\n    FROM\n      internal_tools.rpt_trino_logs AS \"rpt_trino_logs\"  GROUP BY 1, 2, 3, 4, 5, 6",[]],"uniqueKeyColumns":["\"rpt_trino_logs__datetime_created_at\"","\"rpt_trino_logs__description_hostname\"","\"rpt_trino_logs__description_query\"","\"rpt_trino_logs__description_query_type\"","\"rpt_trino_logs__description_scanned_catalogs\"","\"rpt_trino_logs__description_user\""],"aggregationsColumns":["sum(\"rpt_trino_logs__qt_physical_input_terabytes\")","sum(\"rpt_trino_logs__qt_query_count\")","sum(\"rpt_trino_logs__total_execution_time\")"],"dataSource":"default","indexesSql":[],"createTableIndexes":[],"readOnly":false,"expandedPartition":true,"priority":-1},"addedToQueueTime":1710361267212,"error":"Error: Query execution timeout after 10 min of waiting\n    at Timeout.<anonymous> (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryQueue.js:604:18)\n    at listOnTimeout (node:internal/timers:559:17)\n    at processTimers (node:internal/timers:502:7)"}
{"message":"Error querying db","error":"Error: Query execution timeout after 10 min of waiting\n    at QueryQueue.parseResult (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryQueue.js:384:13)\n    at QueryQueue.executeInQueue (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryQueue.js:245:21)\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at PreAggregationLoader.loadPreAggregationWithKeys (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/PreAggregations.ts:784:7)\n    at PreAggregationLoader.loadPreAggregation (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/PreAggregations.ts:572:22)\n    at preAggregationPromise (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/PreAggregations.ts:2193:30)\n    at QueryOrchestrator.fetchQuery (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryOrchestrator.ts:241:9)\n    at OrchestratorApi.executeQuery (/cube/node_modules/@cubejs-backend/server-core/src/core/OrchestratorApi.ts:98:20)\n    at /cube/node_modules/@cubejs-backend/server-core/src/core/RefreshScheduler.ts:605:13\n    at async Promise.all (index 0)\n    at RefreshScheduler.refreshPreAggregations (/cube/node_modules/@cubejs-backend/server-core/src/core/RefreshScheduler.ts:590:5)\n    at async Promise.all (index 1)\n    at RefreshScheduler.runScheduledRefresh (/cube/node_modules/@cubejs-backend/server-core/src/core/RefreshScheduler.ts:285:9)","requestId":"scheduler-68199725-026f-421c-a938-84c39f1dbf5a"}
{"message":"Refresh Scheduler Error","error":"Error: Query execution timeout after 10 min of waiting","securityContext":{},"requestId":"scheduler-68199725-026f-421c-a938-84c39f1dbf5a"}
{"message":"Error while querying","queueId":97605,"processingId":97605,"queueSize":0,"duration":600001,"queryKey":[["CREATE TABLE prod_pre_aggregations.rpt_trino_logs_trino_logs AS SELECT\n      \"rpt_trino_logs\".datetime_created_at \"rpt_trino_logs__datetime_created_at\", \"rpt_trino_logs\".description_hostname \"rpt_trino_logs__description_hostname\", \"rpt_trino_logs\".description_query \"rpt_trino_logs__description_query\", \"rpt_trino_logs\".description_query_type \"rpt_trino_logs__description_query_type\", \"rpt_trino_logs\".description_scanned_catalogs \"rpt_trino_logs__description_scanned_catalogs\", \"rpt_trino_logs\".description_user \"rpt_trino_logs__description_user\", max(value_execution_time) \"rpt_trino_logs__max_execution_time\", sum(value_physical_input_bytes / power(1000, 4)) \"rpt_trino_logs__qt_physical_input_terabytes\", count(*) \"rpt_trino_logs__qt_query_count\", (sum(value_physical_input_bytes / power(1000, 4)) * 5.00) \"rpt_trino_logs__total_estimated_athena_query_cost\", sum(\"rpt_trino_logs\".value_execution_time) \"rpt_trino_logs__total_execution_time\"\n    FROM\n      internal_tools.rpt_trino_logs AS \"rpt_trino_logs\"  GROUP BY 1, 2, 3, 4, 5, 6",[]],[[{"_col0":"2024-03-13 13:42:45.249 UTC"}]]],"queuePrefix":"SQL_PRE_AGGREGATIONS_STANDALONE_default","requestId":"scheduler-17057e24-fb89-4bd3-908e-f0effe63c0cb","timeInQueue":5,"preAggregationId":"rpt_trino_logs.trino_logs","newVersionEntry":{"table_name":"prod_pre_aggregations.rpt_trino_logs_trino_logs","structure_version":"0el32tlb","content_version":"0jm5nacs","last_updated_at":1710361897221,"naming_version":2},"preAggregation":{"preAggregationId":"rpt_trino_logs.trino_logs","timezone":"UTC","timestampFormat":"YYYY-MM-DD[T]HH:mm:ss.SSS[Z]","tableName":"prod_pre_aggregations.rpt_trino_logs_trino_logs","invalidateKeyQueries":[["select max(made_current_at) from internal_tools.\"rpt_trino_logs$history\"",[],{"external":false,"renewalThreshold":120}]],"type":"rollup","external":true,"previewSql":["SELECT * FROM prod_pre_aggregations.rpt_trino_logs_trino_logs LIMIT 1000",[]],"preAggregationsSchema":"prod_pre_aggregations","loadSql":["CREATE TABLE prod_pre_aggregations.rpt_trino_logs_trino_logs AS SELECT\n      \"rpt_trino_logs\".datetime_created_at \"rpt_trino_logs__datetime_created_at\", \"rpt_trino_logs\".description_hostname \"rpt_trino_logs__description_hostname\", \"rpt_trino_logs\".description_query \"rpt_trino_logs__description_query\", \"rpt_trino_logs\".description_query_type \"rpt_trino_logs__description_query_type\", \"rpt_trino_logs\".description_scanned_catalogs \"rpt_trino_logs__description_scanned_catalogs\", \"rpt_trino_logs\".description_user \"rpt_trino_logs__description_user\", max(value_execution_time) \"rpt_trino_logs__max_execution_time\", sum(value_physical_input_bytes / power(1000, 4)) \"rpt_trino_logs__qt_physical_input_terabytes\", count(*) \"rpt_trino_logs__qt_query_count\", (sum(value_physical_input_bytes / power(1000, 4)) * 5.00) \"rpt_trino_logs__total_estimated_athena_query_cost\", sum(\"rpt_trino_logs\".value_execution_time) \"rpt_trino_logs__total_execution_time\"\n    FROM\n      internal_tools.rpt_trino_logs AS \"rpt_trino_logs\"  GROUP BY 1, 2, 3, 4, 5, 6",[]],"sql":["SELECT\n      \"rpt_trino_logs\".datetime_created_at \"rpt_trino_logs__datetime_created_at\", \"rpt_trino_logs\".description_hostname \"rpt_trino_logs__description_hostname\", \"rpt_trino_logs\".description_query \"rpt_trino_logs__description_query\", \"rpt_trino_logs\".description_query_type \"rpt_trino_logs__description_query_type\", \"rpt_trino_logs\".description_scanned_catalogs \"rpt_trino_logs__description_scanned_catalogs\", \"rpt_trino_logs\".description_user \"rpt_trino_logs__description_user\", max(value_execution_time) \"rpt_trino_logs__max_execution_time\", sum(value_physical_input_bytes / power(1000, 4)) \"rpt_trino_logs__qt_physical_input_terabytes\", count(*) \"rpt_trino_logs__qt_query_count\", (sum(value_physical_input_bytes / power(1000, 4)) * 5.00) \"rpt_trino_logs__total_estimated_athena_query_cost\", sum(\"rpt_trino_logs\".value_execution_time) \"rpt_trino_logs__total_execution_time\"\n    FROM\n      internal_tools.rpt_trino_logs AS \"rpt_trino_logs\"  GROUP BY 1, 2, 3, 4, 5, 6",[]],"uniqueKeyColumns":["\"rpt_trino_logs__datetime_created_at\"","\"rpt_trino_logs__description_hostname\"","\"rpt_trino_logs__description_query\"","\"rpt_trino_logs__description_query_type\"","\"rpt_trino_logs__description_scanned_catalogs\"","\"rpt_trino_logs__description_user\""],"aggregationsColumns":["sum(\"rpt_trino_logs__qt_physical_input_terabytes\")","sum(\"rpt_trino_logs__qt_query_count\")","sum(\"rpt_trino_logs__total_execution_time\")"],"dataSource":"default","indexesSql":[],"createTableIndexes":[],"readOnly":false,"expandedPartition":true,"priority":-1},"addedToQueueTime":1710361897222,"error":"Error: Query execution timeout after 10 min of waiting\n    at Timeout.<anonymous> (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryQueue.js:604:18)\n    at listOnTimeout (node:internal/timers:559:17)\n    at processTimers (node:internal/timers:502:7)"}
{"message":"Error querying db","error":"Error: Query execution timeout after 10 min of waiting\n    at QueryQueue.parseResult (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryQueue.js:384:13)\n    at QueryQueue.executeInQueue (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryQueue.js:356:19)\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at PreAggregationLoader.loadPreAggregationWithKeys (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/PreAggregations.ts:784:7)\n    at PreAggregationLoader.loadPreAggregation (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/PreAggregations.ts:572:22)\n    at preAggregationPromise (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/PreAggregations.ts:2193:30)\n    at QueryOrchestrator.fetchQuery (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryOrchestrator.ts:241:9)\n    at OrchestratorApi.executeQuery (/cube/node_modules/@cubejs-backend/server-core/src/core/OrchestratorApi.ts:98:20)\n    at /cube/node_modules/@cubejs-backend/server-core/src/core/RefreshScheduler.ts:605:13\n    at async Promise.all (index 0)\n    at RefreshScheduler.refreshPreAggregations (/cube/node_modules/@cubejs-backend/server-core/src/core/RefreshScheduler.ts:590:5)\n    at async Promise.all (index 1)\n    at RefreshScheduler.runScheduledRefresh (/cube/node_modules/@cubejs-backend/server-core/src/core/RefreshScheduler.ts:285:9)","requestId":"scheduler-5e4aa8aa-8d79-495e-b73a-ea252a5e5ef7"}
{"message":"Refresh Scheduler Error","error":"Error: Query execution timeout after 10 min of waiting","securityContext":{},"requestId":"scheduler-5e4aa8aa-8d79-495e-b73a-ea252a5e5ef7"}
{"message":"Error while querying","queueId":97610,"processingId":97610,"queueSize":0,"duration":600001,"queryKey":[["CREATE TABLE prod_pre_aggregations.rpt_trino_logs_trino_logs AS SELECT\n      \"rpt_trino_logs\".datetime_created_at \"rpt_trino_logs__datetime_created_at\", \"rpt_trino_logs\".description_hostname \"rpt_trino_logs__description_hostname\", \"rpt_trino_logs\".description_query \"rpt_trino_logs__description_query\", \"rpt_trino_logs\".description_query_type \"rpt_trino_logs__description_query_type\", \"rpt_trino_logs\".description_scanned_catalogs \"rpt_trino_logs__description_scanned_catalogs\", \"rpt_trino_logs\".description_user \"rpt_trino_logs__description_user\", max(value_execution_time) \"rpt_trino_logs__max_execution_time\", sum(value_physical_input_bytes / power(1000, 4)) \"rpt_trino_logs__qt_physical_input_terabytes\", count(*) \"rpt_trino_logs__qt_query_count\", (sum(value_physical_input_bytes / power(1000, 4)) * 5.00) \"rpt_trino_logs__total_estimated_athena_query_cost\", sum(\"rpt_trino_logs\".value_execution_time) \"rpt_trino_logs__total_execution_time\"\n    FROM\n      internal_tools.rpt_trino_logs AS \"rpt_trino_logs\"  GROUP BY 1, 2, 3, 4, 5, 6",[]],[[{"_col0":"2024-03-13 13:42:45.249 UTC"}]]],"queuePrefix":"SQL_PRE_AGGREGATIONS_STANDALONE_default","requestId":"scheduler-cd1bab50-9bd6-4a5a-a4dd-2b7a95bb0512","timeInQueue":4,"preAggregationId":"rpt_trino_logs.trino_logs","newVersionEntry":{"table_name":"prod_pre_aggregations.rpt_trino_logs_trino_logs","structure_version":"0el32tlb","content_version":"0jm5nacs","last_updated_at":1710362527222,"naming_version":2},"preAggregation":{"preAggregationId":"rpt_trino_logs.trino_logs","timezone":"UTC","timestampFormat":"YYYY-MM-DD[T]HH:mm:ss.SSS[Z]","tableName":"prod_pre_aggregations.rpt_trino_logs_trino_logs","invalidateKeyQueries":[["select max(made_current_at) from internal_tools.\"rpt_trino_logs$history\"",[],{"external":false,"renewalThreshold":120}]],"type":"rollup","external":true,"previewSql":["SELECT * FROM prod_pre_aggregations.rpt_trino_logs_trino_logs LIMIT 1000",[]],"preAggregationsSchema":"prod_pre_aggregations","loadSql":["CREATE TABLE prod_pre_aggregations.rpt_trino_logs_trino_logs AS SELECT\n      \"rpt_trino_logs\".datetime_created_at \"rpt_trino_logs__datetime_created_at\", \"rpt_trino_logs\".description_hostname \"rpt_trino_logs__description_hostname\", \"rpt_trino_logs\".description_query \"rpt_trino_logs__description_query\", \"rpt_trino_logs\".description_query_type \"rpt_trino_logs__description_query_type\", \"rpt_trino_logs\".description_scanned_catalogs \"rpt_trino_logs__description_scanned_catalogs\", \"rpt_trino_logs\".description_user \"rpt_trino_logs__description_user\", max(value_execution_time) \"rpt_trino_logs__max_execution_time\", sum(value_physical_input_bytes / power(1000, 4)) \"rpt_trino_logs__qt_physical_input_terabytes\", count(*) \"rpt_trino_logs__qt_query_count\", (sum(value_physical_input_bytes / power(1000, 4)) * 5.00) \"rpt_trino_logs__total_estimated_athena_query_cost\", sum(\"rpt_trino_logs\".value_execution_time) \"rpt_trino_logs__total_execution_time\"\n    FROM\n      internal_tools.rpt_trino_logs AS \"rpt_trino_logs\"  GROUP BY 1, 2, 3, 4, 5, 6",[]],"sql":["SELECT\n      \"rpt_trino_logs\".datetime_created_at \"rpt_trino_logs__datetime_created_at\", \"rpt_trino_logs\".description_hostname \"rpt_trino_logs__description_hostname\", \"rpt_trino_logs\".description_query \"rpt_trino_logs__description_query\", \"rpt_trino_logs\".description_query_type \"rpt_trino_logs__description_query_type\", \"rpt_trino_logs\".description_scanned_catalogs \"rpt_trino_logs__description_scanned_catalogs\", \"rpt_trino_logs\".description_user \"rpt_trino_logs__description_user\", max(value_execution_time) \"rpt_trino_logs__max_execution_time\", sum(value_physical_input_bytes / power(1000, 4)) \"rpt_trino_logs__qt_physical_input_terabytes\", count(*) \"rpt_trino_logs__qt_query_count\", (sum(value_physical_input_bytes / power(1000, 4)) * 5.00) \"rpt_trino_logs__total_estimated_athena_query_cost\", sum(\"rpt_trino_logs\".value_execution_time) \"rpt_trino_logs__total_execution_time\"\n    FROM\n      internal_tools.rpt_trino_logs AS \"rpt_trino_logs\"  GROUP BY 1, 2, 3, 4, 5, 6",[]],"uniqueKeyColumns":["\"rpt_trino_logs__datetime_created_at\"","\"rpt_trino_logs__description_hostname\"","\"rpt_trino_logs__description_query\"","\"rpt_trino_logs__description_query_type\"","\"rpt_trino_logs__description_scanned_catalogs\"","\"rpt_trino_logs__description_user\""],"aggregationsColumns":["sum(\"rpt_trino_logs__qt_physical_input_terabytes\")","sum(\"rpt_trino_logs__qt_query_count\")","sum(\"rpt_trino_logs__total_execution_time\")"],"dataSource":"default","indexesSql":[],"createTableIndexes":[],"readOnly":false,"expandedPartition":true,"priority":-1},"addedToQueueTime":1710362527223,"error":"Error: Query execution timeout after 10 min of waiting\n    at Timeout.<anonymous> (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryQueue.js:604:18)\n    at listOnTimeout (node:internal/timers:559:17)\n    at processTimers (node:internal/timers:502:7)"}
{"message":"Error querying db","error":"Error: Query execution timeout after 10 min of waiting\n    at QueryQueue.parseResult (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryQueue.js:384:13)\n    at QueryQueue.executeInQueue (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryQueue.js:245:21)\n    at runMicrotasks (<anonymous>)\n    at processTicksAndRejections (node:internal/process/task_queues:96:5)\n    at PreAggregationLoader.loadPreAggregationWithKeys (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/PreAggregations.ts:784:7)\n    at PreAggregationLoader.loadPreAggregation (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/PreAggregations.ts:572:22)\n    at preAggregationPromise (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/PreAggregations.ts:2193:30)\n    at QueryOrchestrator.fetchQuery (/cube/node_modules/@cubejs-backend/query-orchestrator/src/orchestrator/QueryOrchestrator.ts:241:9)\n    at OrchestratorApi.executeQuery (/cube/node_modules/@cubejs-backend/server-core/src/core/OrchestratorApi.ts:98:20)\n    at /cube/node_modules/@cubejs-backend/server-core/src/core/RefreshScheduler.ts:605:13\n    at async Promise.all (index 0)\n    at RefreshScheduler.refreshPreAggregations (/cube/node_modules/@cubejs-backend/server-core/src/core/RefreshScheduler.ts:590:5)\n    at async Promise.all (index 1)\n    at RefreshScheduler.runScheduledRefresh (/cube/node_modules/@cubejs-backend/server-core/src/core/RefreshScheduler.ts:285:9)","requestId":"scheduler-f2e17e2f-5170-44ed-bc59-1286160d7841"}
{"message":"Refresh Scheduler Error","error":"Error: Query execution timeout after 10 min of waiting","securityContext":{},"requestId":"scheduler-f2e17e2f-5170-44ed-bc59-1286160d7841"}

Related Cube.js schema

cubes:
  - name: rpt_trino_logs
    sql: >
        select * from internal_tools.rpt_trino_logs
    refresh_key:
      sql: select max(made_current_at) from internal_tools."rpt_trino_logs$history"
    measures:
      - name: qt_query_count
        type: count
      - name: qt_physical_input_terabytes
        type: sum
        sql: value_physical_input_bytes / power(1000, 4)
      - name: total_execution_time
        type: sum
        sql: value_execution_time
      - name: total_estimated_athena_query_cost
        type: number
        sql: ({qt_physical_input_terabytes} * 5.00)
      - name: max_execution_time
        sql: max(value_execution_time)
        type: number
      - name: p99_execution_time
        sql: approx_percentile(value_execution_time, 0.99)
        type: number
      - name: p95_execution_time
        type: number
        sql: approx_percentile(value_execution_time, 0.95)
      - name: p50_execution_time
        type: number
        sql: approx_percentile(value_execution_time, 0.50)
    dimensions:
      - name: description_query
        type: string
        sql: description_query
      - name: description_scanned_catalogs
        type: string
        sql: description_scanned_catalogs
      - name: datetime_created_at
        type: time
        sql: datetime_created_at
      - name: description_hostname
        type: string
        sql: description_hostname
      - name: description_query_type
        type: string
        sql: description_query_type
      - name: description_user
        type: string
        sql: description_user

    pre_aggregations:
      - name: trino_logs
        measures:
          - qt_query_count
          - qt_physical_input_terabytes
          - total_execution_time
          - total_estimated_athena_query_cost
          - max_execution_time

        dimensions:
          - description_query
          - description_scanned_catalogs
          - datetime_created_at
          - description_hostname
          - description_query_type
          - description_user
ndrluis commented 8 months ago

I made a test using version 0.34.62 and encountered the same error

ndrluis commented 8 months ago

I have added a partition:

        time_dimension: datetime_created_at
        granularity: day
        partition_granularity: month

The smallest output is 652 MB and it takes 7 seconds to read, but it took 8 minutes to write. The other partitions have around 2 GB of output, and I'm receiving the 10-minute error.

I understand that using partitions is recommended, but 8 minutes to write a partition of 652 MB seems excessively long.

image
igorlukanin commented 8 months ago

We've discussed this in Slack but leaving the comment here for transparency: it's really worth investigating what is happening in this particular use case but it looks like supporting an export bucket for Trino would be the best solution here.