cube-js / cube

📊 Cube — Universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics
https://cube.dev
Other
18.02k stars 1.78k forks source link

Bug with subquery and JOIN operation within filters #8081

Open itestyoy opened 8 months ago

itestyoy commented 8 months ago

A cube crashes when we use field_2 as a filter but works well when we use field_2 as dimensions.

Steps to reproduce the behavior:

  1. Run a query with filters:
{
  "filters": [
    {
      "member": "model_b.field_2",
      "operator": "set"
    }
  ],
  "dimensions": [
    "model_b.field_2",
    "model_b.cube_unique_id"
  ],
  "order": {
    "model_b.field_2": "asc"
  }
}

and then receive an error:

RangeMaximum call stack size exceeded
  1. Run without filters:
{
  "dimensions": [
    "model_b.field_2",
    "model_b.cube_unique_id"
  ],
  "order": {
    "model_b.field_2": "asc"
  }
}

Minimally reproducible Cube Schema:

asyncModule(async () => {

        cube('model_a', {
            sql: `select * from table`,

            dimensions: {
                field_1: {
                    sql: `field_1`,
                    type: 'string'
                },
            },

        });

        cube('model_b', {
            sql: `select * from table`,

            dimensions: {

                cube_unique_id: {
                    sql: `row_id`,
                    type: 'string',
                    primary_key: true
                },

                field_1: {
                    sql: `sum(${metric_1}) over ()`,
                    type: `number`,
                    sub_query: true,
                    propagate_filters_to_sub_query: true,
                },

                field_2: {
                    sql: `${model_a.field_1}`,
                    type: 'string'
                },

            },

            measures: {
                metric_1: {
                    sql: `metric_1`,
                    type: 'max'
                }
            },

            joins: {
                model_a: {
                    relationship: 'one_to_one',
                    sql: `${CUBE.field_1} = ${model_a.field_1}`
                },

            },

        });
})

Version: 0.35.1

igorlukanin commented 7 months ago

Hi @itestyoy 👋

Well, this is a pretty interesting data model!

  1. If you comment out or remove sub_query: true and propagate_filters_to_sub_query: true, then the RangeMaximum call stack size exceeded error will be gone. I feel like this should not be a subquery dimension at all because you're referencing the measure from the same cube (model_b.metric_1) rather than from another cube there.
  2. model_b.field_1 should probably not be a dimension at all, I can see that it's defined as sum(${metric_1}) over () and that looks like an aggregation to me. Aggregations should go to measures, not dimensions.

I hope this will help you fix this data model.

itestyoy commented 7 months ago

Hi @igorlukanin !

Thank you for the explanation, but

It is not a real model, it's just a simple example demonstrating how to encounter the error without involving all models. If we use metrics from another cube in a subquery, the error will be the same. In our case, we're not using SUM, we're using FIRST_VALUE with a PARTITION BY clause.

My question is not about the model, it's about why it works well with dimensions but throws an error with filters.

igorlukanin commented 7 months ago

Even if this data model is simplified for this example, my point is that it doesn't look sound to me. This is not how subquery dimensions are supposed to be defined:

                field_1: {
                    sql: `sum(${metric_1}) over ()`,
                    type: `number`,
                    sub_query: true,
                    propagate_filters_to_sub_query: true,
                },

If you rework that part, the error will be gone.

Let me also tag @paveltiunov in case he'd have an alternative take on whether this is a sound data model or not.

itestyoy commented 7 months ago

@igorlukanin @paveltiunov Hi!

If you rework that part, the error will be gone.

We reworked this part with an additional cube and we achieve propagate_filters_to_sub_query using ${FILTER_GROUP(FILTER_PARAMS....)}, but it's not a good approach because we need to synchronize FILTER_PARAMS with dimensions.

For me, regardless of whether it looks like a model or not, it seems like a bug. Maybe there are any updates?

igorlukanin commented 6 months ago

@itestyoy The data model that you've initially provided is not correct and it's not supposed to work. You can't have an aggregation in the sql property of a subquery dimension. That is why I don't think there's any bug in Cube here.

itestyoy commented 6 months ago

@igorlukanin But why it works good as dimension? What is the difference between

?

igorlukanin commented 6 months ago

Sorry, I don't understand.

How are select a from b and select a from b where a > 1 related anything that we discussed above?

itestyoy commented 6 months ago

@igorlukanin

This doesn't work

(1)

{
  "filters": [
    {
      "member": "model_b.field_2",
      "operator": "set"
    }
  ],
  "dimensions": [
    "model_b.field_2",
    "model_b.cube_unique_id"
  ],
  "order": {
    "model_b.field_2": "asc"
  }
}

This works (2)

{
  "dimensions": [
    "model_b.field_2",
    "model_b.cube_unique_id"
  ],
  "order": {
    "model_b.field_2": "asc"
  }
}

The only difference is filters with the same field as dimension. If using your explanation, query 2 also will not work, but it does.