cube-js / cube

📊 Cube — The Semantic Layer for Building Data Applications
https://cube.dev
Other
17.82k stars 1.76k forks source link

{cube.sql()} is not supported in YAML/Python #7484

Open wasd171 opened 10 months ago

wasd171 commented 10 months ago

Is your feature request related to a problem? Please describe. Extending cubes brings great code reusability. However, it is currently not supported for the Python cubes. Look at these 2 cubes:

# cubes/applications.yml
{% set model = dbt_model('applications') %}

cubes:
  - name: "cube_{{ model.name | safe }}"
    sql: >
      SELECT * FROM {{ model.sql_table | safe }}
    public: false

    dimensions:
      {{ model.as_dimensions() }}

    # Model-specific measures
    measures:
      - name: total_count
        type: count
# cubes/applications_sent.yml
cubes:
  - name: cube_applications_sent
    extends: cube_applications
    sql: >
      SELECT * FROM { cube_applications.sql() } WHERE applied_at is not null
    public: false

According to the documentation, this could work, but it gives me the following error:

Error: Compile errors:
Can't parse python expression. Most likely this type of syntax isn't supported yet: Unsupported Python Trailer children node: TrailerContext: ()
    at ErrorReporter.throwIfAny (/cube/node_modules/@cubejs-backend/schema-compiler/src/compiler/ErrorReporter.ts:133:13)
    at DataSchemaCompiler.throwIfAnyErrors (/cube/node_modules/@cubejs-backend/schema-compiler/src/compiler/DataSchemaCompiler.js:213:23)
    at /cube/node_modules/@cubejs-backend/schema-compiler/src/compiler/DataSchemaCompiler.js:108:16
    at CompilerApi.getCompilers (/cube/node_modules/@cubejs-backend/server-core/src/core/CompilerApi.js:66:26)
    at CompilerApi.scheduledPreAggregations (/cube/node_modules/@cubejs-backend/server-core/src/core/CompilerApi.js:189:31)
    at RefreshScheduler.roundRobinRefreshPreAggregationsQueryIterator (/cube/node_modules/@cubejs-backend/server-core/src/core/RefreshScheduler.ts:474:38)
    at /cube/node_modules/@cubejs-backend/server-core/src/core/RefreshScheduler.ts:595:12
    at async Promise.all (index 0)

If I modify the applications_sent cube's SQL to SELECT * FROM { cube_applications.sql } WHERE applied_at is not null then I can see that the cube_applications.sql is injected as a JS arrow function () = > query.cubeSql(cube.cubeName()) – obviously Python cannot evaluate it

Describe the solution you'd like I would like to be able to extend Python cubes the same way as it is possible for the YAML / JS ones

Describe alternatives you've considered Since I am preparing my data in dbt I can refactor the applications_sent cube to

# cubes/applications_sent.yml
{% set model = dbt_model('applications') %}

cubes:
  - name: cube_applications_sent
    extends: cube_applications
    sql: >
      SELECT * FROM {{ model.sql_table | safe }} WHERE applied_at is not null
    public: false

^ this will work but feels a bit hacky

igorlukanin commented 10 months ago

Hey @wasd171 👋 Thanks for reporting this! My understanding is that extends works as it should, the real issue is that {cube.sql()} doesn't work in YAML data models. You can try removing that—everything should just work then. However, I agree that this is something that should be implemented

wasd171 commented 10 months ago

Thanks for a swift answer @igorlukanin

You can try removing that—everything should just work then

But I need some table to select from and to apply my where filter. How could I do it without { cube.sql() }?

igorlukanin commented 10 months ago

As a workaround, I can recommend to put that table name in a variable, then use it in both cubes. In the meantime, we should definitely make {cube.sql()} work.

ZiggiZagga commented 6 months ago

Any news on {cube.sql()} issue?

igorlukanin commented 6 months ago

@ZiggiZagga Supporting this is on the roadmap, however, there's currently no estimate as to when this might be done.

ZiggiZagga commented 5 months ago

Hallo @igorlukanin,

I am learning cube and how it works. I have now come across the concept of Data Blending:

I am looking at the example: https://cube.dev/docs/product/data-modeling/concepts/data-blending

cube(`all_sales`, {
  sql: `
    SELECT
      amount,
      user_id AS customer_id,
      created_at,
      'online' AS row_type
    FROM (${online_orders.sql()}) AS online --source cube via literal template
    UNION ALL
    SELECT
      amount,
      customer_id,
      created_at,
      'retail' AS row_type
    FROM (${retail_orders.sql()}) AS retail --source cube via literal template
`,
});

I was wondering isn't it more intuitiv if we had an addition property called "cube_sql" which works as follows

cube(`all_sales`, {
  -- new property "cube_sql"
  cube_sql: `
    SELECT
      amount,
      user_id AS customer_id,
      created_at,
      'online' AS row_type
    FROM online_orders AS online --source cube via cube id
    UNION ALL
    SELECT
      amount,
      customer_id,
      created_at,
      'retail' AS row_type
    FROM retail_orders AS retail --source cube via cube id
`,
});

this way, as a developer i dont need to call cube.sql() because it will be treated behind the scenes. All I need to do is refer to cubes by their identifiers.

What do you think is this possible?

igorlukanin commented 5 months ago

Hi @ZiggiZagga, thanks for the code example!

Frankly, I don't see it as an improvement over the current situation.

First, it doesn't simplify the definition of data model: instead of {cube.sql()} one would need to write cube_sql. No big win syntax-wise.

Second, it introduces additional ambiguity: without checking which parameter is used, one can never be sure if some_table_or_cube in the SQL is a reference to a SQL table or to a cube.

I also doubt that bare some_table_or_cube references would work reliably at all times: now Cube lets you define cubes with names like from or where; would you like these to be treated as cube references in cube_sql as well?

steven-luabase commented 5 months ago

Commenting to follow for updates, would love to see this feature work for .yml models too. Current workaround is a series of lengthy CTEs