cube-js / cube

📊 Cube — The Semantic Layer for Building Data Applications
https://cube.dev

External storage of pre-aggregation throws when using Snowflake as a primary db #2592

Closed willyraedy closed 3 years ago

willyraedy commented 3 years ago

Describe the bug In order to create the external tables, the orchestrator queries Snowflake for the column names and types. This query returns no records due to a casing issue. Without any columns, this causes a downstream syntax error when MySQL or Postgres is used for external storage, and a `Sort key size can't be 0 for default, columns: []` error when Cube Store is used.
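
For context, Snowflake stores unquoted identifiers in uppercase, so the lowercase table and schema names used in the information_schema lookup match nothing. A rough illustration of the mismatch, where the pre-aggregation table name is made up and `driver` stands for a connected SnowflakeDriver instance:

// Hypothetical names; `driver` is assumed to be a connected SnowflakeDriver.
const [schema, name] = 'stb_pre_aggregations.newsletter_main'.split('.');

const sql = `SELECT column_name, data_type
  FROM information_schema.columns
  WHERE table_name = ${driver.param(0)} AND table_schema = ${driver.param(1)}`;

// Lowercase params: information_schema stores unquoted identifiers in
// uppercase, so this returns zero rows.
await driver.query(sql, [name, schema]);

// Uppercased params match and return the pre-aggregation's columns.
await driver.query(sql, [name.toUpperCase(), schema.toUpperCase()]);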

To Reproduce Steps to reproduce the behavior:

  1. Configure Cube to use Snowflake as the main database and externally store the pre-aggregations using any of the supported options. My setup uses Cube Store, following these docs:

docker-compose.yml

version: '2.2'
services:
  cubestore:
    image: cubejs/cubestore:edge

  cube:
    image: cubejs/cube:latest
    ports:
      # 4000 is a port for Cube.js API
      - 4000:4000
      # 3000 is a port for Playground web server
      # it is available only in dev mode
      - 3000:3000
    env_file: .env
    depends_on:
      - cubestore
    links:
      - cubestore
    volumes:
      - ./schema:/cube/conf/schema

.env

CUBEJS_DEV_MODE=true
CUBEJS_API_SECRET=<omitted>

CUBEJS_LOG_LEVEL=trace

CUBEJS_DB_NAME=<omitted>
CUBEJS_DB_USER=<omitted>
CUBEJS_DB_PASS=<omitted>
CUBEJS_DB_TYPE=snowflake
CUBEJS_DB_SNOWFLAKE_ACCOUNT=<omitted>
CUBEJS_DB_SNOWFLAKE_REGION=<omitted>
CUBEJS_DB_SNOWFLAKE_WAREHOUSE=<omitted>
CUBEJS_DB_SNOWFLAKE_ROLE=<omitted>
CUBEJS_DB_SNOWFLAKE_CLIENT_SESSION_KEEP_ALIVE=false

CUBEJS_EXT_DB_TYPE=cubestore
CUBEJS_EXT_DB_HOST=cubestore

schema/Newsletter.js

...

    preAggregations: {
      main: {
        type: `rollup`,
        measureReferences: [count],
        dimensionReferences: [newsletterName],
        external: true
      },
    },

...
  2. Run a query in the Cube Playground that uses the pre-aggregation (Playground screenshot attached).

Expected behavior Expected the pre-aggregation table to be unloaded to external storage without error, and future queries to hit the external pre-aggregation storage rather than Snowflake.

Additional context Observed in Snowflake's query history that the query for the column names produced no records (query detail screenshot attached: Query_Detail__019bbd14-0601-e676-0000-30d10757b482).

I did some digging in the source code. It looks like transforming the schema name to uppercase was already necessary to pull the tables for automatic schema generation through the Playground UI.

line 158 of SnowflakeDriver.js

  async getTablesQuery(schemaName) {
    const tables = await super.getTablesQuery(schemaName.toUpperCase());
    return tables.map(t => ({ table_name: t.TABLE_NAME && t.TABLE_NAME.toLowerCase() }));
  }

The tableColumnTypes function on line 320 of BaseDriver.js seems to implement the query that fetches the column names. Since it is not Snowflake-specific and SnowflakeDriver doesn't override it, the table name and schema params are never uppercased.

  async tableColumnTypes(table) {
    const [schema, name] = table.split('.');
    const columns = await this.query(
      `SELECT columns.column_name,
             columns.table_name,
             columns.table_schema,
             columns.data_type
      FROM information_schema.columns
      WHERE table_name = ${this.param(0)} AND table_schema = ${this.param(1)}`,
      [name, schema]
    );
    return columns.map(c => ({ name: c.column_name, type: this.toGenericType(c.data_type) }));
  }

The solution may be as simple as implementing the tableColumnTypes method in SnowflakeDriver to uppercase the params, but I wanted to check in on support for Snowflake pre-aggregations more generally and ask whether there are other known downstream issues.
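
For reference, a rough sketch of what that override could look like in SnowflakeDriver.js, mirroring the uppercase handling in getTablesQuery above. This is only a sketch, not a tested patch: it assumes the result rows come back with uppercase keys (as t.TABLE_NAME does above) and lowercases the column names the same way getTablesQuery lowercases table names.

  async tableColumnTypes(table) {
    const [schema, name] = table.split('.');
    // Snowflake stores unquoted identifiers in uppercase, so uppercase the
    // params before filtering information_schema.
    const columns = await this.query(
      `SELECT columns.column_name,
             columns.table_name,
             columns.table_schema,
             columns.data_type
      FROM information_schema.columns
      WHERE table_name = ${this.param(0)} AND table_schema = ${this.param(1)}`,
      [name.toUpperCase(), schema.toUpperCase()]
    );
    // Result keys are uppercase, so read COLUMN_NAME / DATA_TYPE and lowercase
    // the names for downstream consumers.
    return columns.map(c => ({
      name: c.COLUMN_NAME && c.COLUMN_NAME.toLowerCase(),
      type: this.toGenericType(c.DATA_TYPE)
    }));
  }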

ovr commented 3 years ago

I've started working on it.