cube-js / cube

📊 Cube — Universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics
https://cube.dev
Other
17.96k stars 1.78k forks source link

fix(schema-compiler): fix FILTER_PARAMS to populate set and notSet filters in cube's SQL query #8937

Open sumeetgajjar opened 1 week ago

sumeetgajjar commented 1 week ago

The presence of the following filters in a cube query: set, notSet, eq NULL or NOT eq NULL renders the corresponding FILTER_PARAMS as 1 = 1 irrespective of the cube or filter.

This is because the renderFilterParams method skips invoking filter.conditionSql when converting a filter to its corresponding SQL due to an empty filterParams array.

For set and notSet, the filter cannot have value(s). For eq NULL or NOT eq NULL the null value is filtered by: https://github.com/cube-js/cube/blob/349597a4a1b10f6e76ef3bf18c37986cdd0962f8/packages/cubejs-schema-compiler/src/adapter/BaseFilter.ts#L115

This leads to an empty filterParams array, thus, skipping the filter SQL, thereby returning 1 = 1.


The issue becomes more serious when the query contains multiple filters on the same dimension. Consider the following example:

Cube

sql: >
  SELECT
    *
  FROM
    my_partitioned_table
  WHERE
    {FILTER_PARAMS.MyCube.creation_time.filter('creation_time')}

Query

{
    "or":
    [
        {
            "member": "MyCube.creation_time",
            "operator": "afterDate",
            "values":
            [
                "2024-10-31T21:00:00.000Z"
            ]
        },
        {
            "member": "MyCube.creation_time",
            "operator": "notSet"
        }
    ]
}

In the above case, the rendered query would be of the form:

SELECT
  *
FROM
    my_partitioned_table
  WHERE
    (creation_time >=  '2024-10-31T21:00:00.000Z' or (1 = 1))

The right branch of the OR predicate will always evaluate TRUE, thus, the majority of the SQL optimizers will remove the LEFT branch altogether to prevent unnecessary evaluations, thereby defeating the purpose of the FILTER_PARAMS filter pushdown.


This PR aims to fix the bug by restructuring the nested if-else block, thereby also improving the readability of the same.

Check List

vercel[bot] commented 1 week ago

The latest updates on your projects. Learn more about Vercel for Git ↗︎

8 Skipped Deployments | Name | Status | Preview | Comments | Updated (UTC) | | :--- | :----- | :------ | :------- | :------ | | **examples-angular-dashboard** | ⬜️ Ignored ([Inspect](https://vercel.com/cube-dev/examples-angular-dashboard/65x3sgZcG1K5xVZmkHtVpGrRiRMx)) | [Visit Preview](https://examples-angular-dashboard-git-fork-sumeetgajja-257441-cube-dev.vercel.app) | | Nov 11, 2024 0:16am | | **examples-react-d3** | ⬜️ Ignored ([Inspect](https://vercel.com/cube-dev/examples-react-d3/26CBY2Z5SubY9HTJUxFoCYgzQBVV)) | [Visit Preview](https://examples-react-d3-git-fork-sumeetgajjar-fixsetn-28334c-cube-dev.vercel.app) | | Nov 11, 2024 0:16am | | **examples-react-dashboard** | ⬜️ Ignored ([Inspect](https://vercel.com/cube-dev/examples-react-dashboard/7EHzw2cWnCjdaqKxNgZ334oXLaw5)) | [Visit Preview](https://examples-react-dashboard-git-fork-sumeetgajjar-5f2b01-cube-dev.vercel.app) | | Nov 11, 2024 0:16am | | **examples-react-data-table** | ⬜️ Ignored ([Inspect](https://vercel.com/cube-dev/examples-react-data-table/Cz54oVRSt4e7N9825PpXm6UEN12U)) | [Visit Preview](https://examples-react-data-table-git-fork-sumeetgajjar-2db901-cube-dev.vercel.app) | | Nov 11, 2024 0:16am | | **examples-react-highcharts** | ⬜️ Ignored ([Inspect](https://vercel.com/cube-dev/examples-react-highcharts/AuvdwWEvyY47zk93casd3Pxq9J8D)) | [Visit Preview](https://examples-react-highcharts-git-fork-sumeetgajjar-2489d8-cube-dev.vercel.app) | | Nov 11, 2024 0:16am | | **examples-react-material-ui** | ⬜️ Ignored ([Inspect](https://vercel.com/cube-dev/examples-react-material-ui/3fQxFZonCb1DQpMW71kgLLe6VLCD)) | [Visit Preview](https://examples-react-material-ui-git-fork-sumeetgajja-f2b591-cube-dev.vercel.app) | | Nov 11, 2024 0:16am | | **examples-react-pivot-table** | ⬜️ Ignored ([Inspect](https://vercel.com/cube-dev/examples-react-pivot-table/AuECvA7GYK34rtytuAVhrSpn5Amt)) | [Visit Preview](https://examples-react-pivot-table-git-fork-sumeetgajja-c552c3-cube-dev.vercel.app) | | Nov 11, 2024 0:16am | | **examples-vue-query-builder** | ⬜️ Ignored ([Inspect](https://vercel.com/cube-dev/examples-vue-query-builder/5tyhHcPBU6ovCQtkFLSoganKKHfY)) | [Visit Preview](https://examples-vue-query-builder-git-fork-sumeetgajja-c28526-cube-dev.vercel.app) | | Nov 11, 2024 0:16am |
sumeetgajjar commented 1 week ago

Hi @KSDaemon, @RusovDmitriy, @paveltiunov - can you please review this PR?

Thanks in advance.

KSDaemon commented 1 week ago

Hi @sumeetgajjar Thank you for contributing! I'll have a look!

sumeetgajjar commented 1 week ago

3 tests under postgres/pre-aggregations.test.js are failing.

This is primarily because the visitors cube during the pre-aggregate generation uses the following SQL:

select * from visitors WHERE ((created_at >= undefined AND created_at <= undefined))\n' +
            '      AND (1 = 1)\n' +

Earlier (before this change) SQL was

select * from visitors WHERE ((1 = 1))\n' +
            '      AND (1 = 1)\n' +

cube(`visitors`, {
      sql: `
      select * from visitors WHERE ${FILTER_PARAMS.visitors.createdAt.filter('created_at')}
      AND ${FILTER_PARAMS.ReferenceOriginalSql.createdAt.filter('created_at')}
      `,

This makes sense as the time dimension filter values ~are~ may be unavailable during pre-aggregate generation.

sumeetgajjar commented 1 week ago

3 tests under postgres/pre-aggregations.test.js are failing.

This is primarily because the visitors cube during the pre-aggregate generation uses the following SQL:

select * from visitors WHERE ((created_at >= undefined AND created_at <= undefined))\n' +
            '      AND (1 = 1)\n' +

Earlier (before this change) SQL was

select * from visitors WHERE ((1 = 1))\n' +
            '      AND (1 = 1)\n' +
cube(`visitors`, {
      sql: `
      select * from visitors WHERE ${FILTER_PARAMS.visitors.createdAt.filter('created_at')}
      AND ${FILTER_PARAMS.ReferenceOriginalSql.createdAt.filter('created_at')}
      `,

This makes sense as the time dimension filter values ~are~ may be unavailable during pre-aggregate generation.

Given each of the DATE_OPERATORS_Where methods (i.e. inDateRangeWhere, notInDateRangeWhere, etc) exactly knows its required parameters i.e.

One fix could be modifying each DATE_OPERATORS_Where method to return 1 = 1 when any required parameters are undefined.

@KSDaemon - Does the above proposal sound good to you?

sumeetgajjar commented 1 week ago

Hi @sumeetgajjar Thank you for contributing! I'll have a look!

Hi @KSDaemon - gentle ping, can you please take a look?

This bug is directly impacting our production workflows and is leading to incorrect data being shown to the users.

KSDaemon commented 1 week ago

Does the above proposal sound good to you?

That makes sense.

Seems to be good. I relaunched tests.

sumeetgajjar commented 1 week ago

👍🏻 LGTM!

Thanks for the review @KSDaemon.

A couple of questions,

sumeetgajjar commented 6 days ago

Hi @KSDaemon, @paveltiunov, @igorlukanin - can you please help us merge this PR given it is already approved 😄?