cube-js / cube

📊 Cube — The Semantic Layer for Building Data Applications
https://cube.dev

Cube crashes when storing pre-aggregation with duckdb (type `hugeInt`) #7127

Open · hugoJuhel opened 1 year ago

hugoJuhel commented 1 year ago

Describe the bug

Cube Store does not support the HUGEINT type used by DuckDB.

I first encountered the bug with the on-prem Cube, but I have been able to reproduce it on Cube Cloud. Both the cube files and screenshots come from the cloud version.

When using Cube with the DuckDB driver, Cube fails to create the (external) pre-aggregation.

To Reproduce

Steps to reproduce the behavior:

  1. Create a parquet file on S3 containing at least one integer column
  2. Connect Cube to the S3 account
  3. Create a schema where one measure is a sum over the integer column
  4. Add the measure to a pre-aggregation
  5. Cube will fail when trying to create the pre-aggregation table with a hugeInt column to store the data

I have also created a small cube definition showcasing the bug.

Expected behavior

The driver should cast the hugeInt to the corresponding MySQL type.
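
For illustration, a mapping along the following lines could be applied in the driver before the CREATE TABLE statement is issued. This is only a sketch: the helper name (mapDuckDbType) and the chosen target types are assumptions, not the actual driver or Cube Store API.

// Hypothetical sketch: mapDuckDbType and the target types below are illustrative
// assumptions, not the real @cubejs-backend/duckdb-driver implementation.
function mapDuckDbType(duckDbType: string): string {
  switch (duckDbType.toUpperCase()) {
    case 'HUGEINT':
      // HUGEINT is a 128-bit integer; a 64-bit bigint could overflow for very large sums,
      // so a wide decimal is arguably the safer target (assuming Cube Store accepts one).
      return 'decimal(38,0)';
    case 'UBIGINT':
      // An unsigned 64-bit value can also exceed a signed bigint.
      return 'decimal(20,0)';
    default:
      return duckDbType.toLowerCase();
  }
}

// The failing column from the stack trace would then be created as a decimal instead:
console.log(mapDuckDbType('HUGEINT')); // "decimal(38,0)"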

Screenshots

Schema: (screenshot)

Playground: (screenshot)

Pre-aggregation error: (screenshot)

Stack trace

Error: Error during create table: CREATE TABLE prod_pre_aggregations.spam_main_w1hqvvny_l2pn4q2e_1ig1fr6 (`spam__org` varchar(255), `spam__metric` HUGEINT) LOCATION ?: Custom type 'HUGEINT' is not supported
    at WebSocket.<anonymous> (/cube/node_modules/@cubejs-backend/cubestore-driver/src/WebSocketConnection.ts:121:30)
    at WebSocket.emit (node:events:513:28)
    at Receiver.receiverOnMessage (/cube/node_modules/ws/lib/websocket.js:1008:20)
    at Receiver.emit (node:events:513:28)
    at Receiver.dataMessage (/cube/node_modules/ws/lib/receiver.js:502:14)
    at Receiver.getData (/cube/node_modules/ws/lib/receiver.js:435:17)
    at Receiver.startLoop (/cube/node_modules/ws/lib/receiver.js:143:22)
    at Receiver._write (/cube/node_modules/ws/lib/receiver.js:78:10)
    at writeOrBuffer (node:internal/streams/writable:391:12)
    at _write (node:internal/streams/writable:332:10)
    at Receiver.Writable.write (node:internal/streams/writable:336:10)
    at TLSSocket.socketOnData (/cube/node_modules/ws/lib/websocket.js:1102:35)
    at TLSSocket.emit (node:events:513:28)
    at addChunk (node:internal/streams/readable:315:12)
    at readableAddChunk (node:internal/streams/readable:289:9)
    at TLSSocket.Readable.push (node:internal/streams/readable:228:10)
    at TLSWrap.onStreamRead (node:internal/stream_base_commons:190:23)

Minimal reproducible Cube schema

cubes: 
  - name: spam
    pre_aggregations:
      - name: main
        dimensions:
          - CUBE.org
        measures:
          - CUBE.metric

    sql: SELECT 'foo' AS org, 10::hugeInt AS metric
    dimensions:
      - name: org
        sql: org
        type: string
        primary_key: true
    measures:
      - name: metric
        sql: metric
        type: sum

Version: v0.33.53

Additional context

Docker Compose file used to run Cube

version: "3.9"

services:

  cube:
    network_mode: host
    image: cubejs/cube:v0.33.53
    environment:
      - CUBEJS_DB_TYPE=duckdb
      - CUBEJS_DB_DUCKDB_S3_ACCESS_KEY_ID=<REDACTED>
      - CUBEJS_DB_DUCKDB_S3_SECRET_ACCESS_KEY=<REDACTED>
      - CUBEJS_DB_DUCKDB_S3_ENDPOINT=s3.ca-central-1.amazonaws.com
      - CUBEJS_DB_DUCKDB_S3_REGION=ca-central-1
      - CUBEJS_API_SECRET=<REDACTED>
      - CUBEJS_CONCURRENCY=8
      - CUBEJS_DEV_MODE=true

    volumes:
      - .:/cube/conf
github-actions[bot] commented 1 year ago

If you are interested in working on this issue, please leave a comment below and we will be happy to assign the issue to you. If this is the first time you are contributing a Pull Request to Cube.js, please check our contribution guidelines. You can also post any questions while contributing in the #contributors channel in the Cube.js Slack.

paveltiunov commented 1 year ago

@hugoJuhel You can cast to int as a workaround.

hugoJuhel commented 1 year ago

> @hugoJuhel You can cast to int as a workaround.

@paveltiunov, thanks for your suggestion.

I tried that in Cube Cloud yesterday, but even when the column is cast to an int in the schema definition, the pre-aggregation still ends up with a hugeInt column.

Here is the YAML definition I have been using on Cube Cloud.

cubes: 
  - name: spam
    pre_aggregations:
      - name: main
        dimensions:
          - CUBE.org
        measures:
          - CUBE.metric

    sql: SELECT 'foo' AS org, 10::int AS metric
    dimensions:
      - name: org
        sql: org
        type: string
        primary_key: true
    measures:
      - name: metric
        sql: metric::int
        type: sum

The only workaround I have found so far is casting my field to a float type.
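
A likely explanation: DuckDB types SUM over integer columns as HUGEINT regardless of the input width, so even with metric cast to int, the aggregated column in the pre-aggregation still comes out as HUGEINT; summing a float/double presumably works because the result type is DOUBLE, which Cube Store accepts. A minimal check, assuming the duckdb Node.js package (this snippet is illustrative, not the driver's own code):

import duckdb from 'duckdb';

// In-memory DuckDB instance, just to inspect the result types of SUM.
const db = new duckdb.Database(':memory:');

db.all(
  `SELECT
     typeof(sum(metric))         AS int_sum_type,    -- SUM widens integers to HUGEINT
     typeof(sum(metric::double)) AS double_sum_type  -- the float workaround yields DOUBLE
   FROM (SELECT 10::int AS metric) AS t`,
  (err, rows) => {
    if (err) throw err;
    console.log(rows); // [ { int_sum_type: 'HUGEINT', double_sum_type: 'DOUBLE' } ]
  }
);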

dyegoaurelio commented 1 week ago

any updates?

igorlukanin commented 5 days ago

@dyegoaurelio Not yet.

I can confirm that the behavior is still the same (hugeInt-related error) on Cube v0.35.81 (latest in v0.35.x).

However, on v0.36.0 through v0.36.7 (the latest), the query that should trigger a pre-aggregation build just never finishes:

(screenshot, 2024-10-09)