cube-js / cube

📊 Cube — The Semantic Layer for Building Data Applications
https://cube.dev
Other
17.94k stars 1.78k forks source link

Nested Aggregation using Javascript Does Not Work #8140

Closed danfaro closed 7 months ago

danfaro commented 7 months ago

Describe the bug I'm trying to replicate the nested aggregation function using javascript https://cube.dev/docs/guides/recipes/data-modeling/nested-aggregates. However the page's sample code is only in YAML.

When I recreate the example nested aggregation data model in a JS file following the js syntax shown in (https://cube.dev/docs/product/data-modeling/concepts/subquery-dimensions), Cube reports the following error:

Screenshot 2024-04-12 at 3 29 27 PM

It could be one of three issue, my syntax is wrong, the syntax in the docs are wrong or cube is not handling js correctly. I've checked things repeated with the docs and I'm not able to see what I'm doing wrong.

To Reproduce Steps to reproduce the behavior: Recreate nested aggregation model in js file.

Expected behavior Be able to test in play ground without errors

Screenshots If applicable, add screenshots to help explain your problem.

Minimally reproducible Cube Schema In case your bug report is data modelling related please put your minimally reproducible Cube Schema here. You can use selects without tables in order to achieve that as follows.

cube(`nested_agg_sales_js`, {
  sql: `
      SELECT 1 AS id, 1 AS store_id, 1 AS product_id, 10 AS sales UNION ALL
      SELECT 2 AS id, 1 AS store_id, 1 AS product_id, 20 AS sales UNION ALL
      SELECT 3 AS id, 1 AS store_id, 2 AS product_id, 30 AS sales UNION ALL
      SELECT 4 AS id, 1 AS store_id, 2 AS product_id, 40 AS sales UNION ALL
      SELECT 5 AS id, 2 AS store_id, 1 AS product_id, 50 AS sales UNION ALL
      SELECT 6 AS id, 2 AS store_id, 1 AS product_id, 60 AS sales UNION ALL
      SELECT 7 AS id, 2 AS store_id, 2 AS product_id, 70 AS sales UNION ALL
      SELECT 8 AS id, 2 AS store_id, 2 AS product_id, 80 AS sales`,

  measures: {
    sales: {
      sql: `sales`,
      type: `sum`,
    },
  },

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true,
    },
    store_id: {
      sql: `store_id`,
      type: `number`,
    },
    product_id: {
      sql: `product_id`,
      type: `number`,
    },
    store_product_id: {
      sql: `CONCAT(${store_id}, '-', ${product_id})`,
      type: `string`,
    }
  },
});

cube(`nested_agg_stores_orders_js`, {
  sql: `SELECT store_id, product_id
      FROM (
        SELECT 1 AS id, 1 AS store_id, 1 AS product_id, 10 AS sales UNION ALL
        SELECT 2 AS id, 1 AS store_id, 1 AS product_id, 20 AS sales UNION ALL
        SELECT 3 AS id, 1 AS store_id, 2 AS product_id, 30 AS sales UNION ALL
        SELECT 4 AS id, 1 AS store_id, 2 AS product_id, 40 AS sales UNION ALL
        SELECT 5 AS id, 2 AS store_id, 1 AS product_id, 50 AS sales UNION ALL
        SELECT 6 AS id, 2 AS store_id, 1 AS product_id, 60 AS sales UNION ALL
        SELECT 7 AS id, 2 AS store_id, 2 AS product_id, 70 AS sales UNION ALL
        SELECT 8 AS id, 2 AS store_id, 2 AS product_id, 80 AS sales
      ) AS raw
      GROUP BY 1, 2`,

  joins: {
    nested_agg_sales_js: {
      relationship: `one_to_many`,
      sql: `${CUBE}.store_product_id = ${nested_agg_sales_js.store_product_id}`,
    },
  },

  measures: {
    median_sales: {
      sql: `PERCENTILE_CONT(0.5) WITHIN GROUP (ORDER BY ${sales_sum})`,
      type: `number`,
    },
  },

  dimensions: {
    store_id: {
      sql: `store_id`,
      type: `number`,
    },
    product_id: {
      sql: `product_id`,
      type: `number`,
    },
    store_product_id: {
      sql: `CONCAT(${store_id}, '-', ${product_id})`,
      type: `string`,
      primary_key: true,
    },
    sales_sum: {
      sql: `${nested_agg_sales_js.sales}`,
      type: `number`,
      sub_query: true,
    }
  },
});

Version: cube cloud

Additional context

igorlukanin commented 7 months ago

Hey @danfaro 👋

It could be one of three issue, my syntax is wrong, the syntax in the docs are wrong or cube is not handling js correctly. Let's see 😄

Here's how the join is defined in the code example in the docs:

    joins:
      - name: nested_agg_sales
        sql: "{nested_agg_stores_orders.store_product_id} = {nested_agg_sales.store_product_id}"
        relationship: one_to_many

As you can see, on both sides of SQL dimensions are referenced.

Here's how the join is defined in your code example:

  joins: {
    nested_agg_sales_js: {
      relationship: `one_to_many`,
      sql: `${CUBE}.store_product_id = ${nested_agg_sales_js.store_product_id}`,
    },
  },

As you can see, now you have a column reference on the left side of the join. If you update it back to a dimension reference, it should work again.

I hope this helps.

danfaro commented 7 months ago

Thanks Igor, your suggestion worked to fix the data model. I changed the sql from ${CUBE}.store_product_id to ${CUBE.store_product_id}.

I was following the subquery dimensions documentation that uses the column reference without understanding the intricacies of the member syntax doc that you linked.

Screenshot 2024-04-12 at 4 37 17 PM

A related question: can I do multiple levels of nested aggregation? Cube1 -(Sum)-> Cube2 -(Avg)-> Cube3 -(Max)-> Cube4.

I tried to implement it, but got an error message that like this:

Screenshot 2024-04-12 at 4 43 13 PM
igorlukanin commented 7 months ago

I think you should be able to do that. Every additional level of nested aggregation is just an auxiliary cube with the desired measure + a reference to it in a subquery dimension of a previous cube.

The error message looks like it was returned by the database. You might have an error in the SQL, maybe in the sql of your auxiliary cubes. (You can always simplify and test with a "dummy" cube: sql as SELECT 123, join on 1=1.)