cube-js / cube

📊 Cube — Universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics
https://cube.dev
Other
18.01k stars 1.78k forks source link

BUG pre-aggregation not selected on many-to-many relationship + measure #8089

Open kriKinesso opened 7 months ago

kriKinesso commented 7 months ago

Problem I am running into a problem with pre-aggregations I have three tables from postgresql :

I have one table from bigQuery :

DATA MODEL:

cube("client", {
  sql: `SELECT * FROM "s_db".clients`,
  dataSource: "pg",
  dimensions: {
    id: {
      sql: "id",
      type: "number",
      primaryKey: true,
    },
    name: {
      sql: "name",
      type: "string",
    },
    updated_at: {
      sql: "updated_at",
      type: "time",
    },
  },
  joins: {
    client_agency: {
      sql: `${CUBE.id} = ${client_agency.client_id}`,
      relationship: `many_to_one`,
    },
  },
  pre_aggregations: {
    main: {
      type: `rollup`,
      external: true,
      dimensions: [CUBE.id, CUBE.name, CUBE.updated_at],
      refresh_key: {
        every: `1 day`,
        sql: `SELECT MAX(updated_at) FROM "s_db".clients`,
      },
      indexes: {
        client_id: {
          columns: [CUBE.id],
        },
      },
    },
  },
});

cube("agency", {
  sql: `SELECT * FROM "s_db".agencies`,
  dataSource: "pg",
  dimensions: {
    id: {
      sql: "id",
      type: "number",
      primaryKey: true,
    },
    name: {
      sql: "name",
      type: "string",
    },
    updated_at: {
      sql: "updated_at",
      type: "time",
    },
  },
  joins: {
    client_agency: {
      sql: `${CUBE.id} = ${client_agency.agency_id}`,
      relationship: `many_to_one`,
    },
  },
  pre_aggregations: {
    main: {
      type: `rollup`,
      external: true,
      dimensions: [CUBE.id, CUBE.name],
      refresh_key: {
        every: `1 day`,
        sql: `SELECT MAX(updated_at) FROM "s_db".agencies`,
      },
      indexes: {
        agency_id: {
          columns: [CUBE.id],
        },
      },
    },
  },
});

cube("client_agency", {
  sql: `SELECT * FROM "s_db".client_agency`,
  dataSource: "pg",
  dimensions: {
    agency_id: {
      sql: "agency_id",
      type: "number",
      primaryKey: true,
    },
    client_id: {
      sql: "client_id",
      type: "number",
      primaryKey: true,
    },
  },
  joins: {
    agency: {
      sql: `${CUBE.agency_id} = ${agency.id}`,
      relationship: `one_to_many`,
    },
    client: {
      sql: `${CUBE.client_id} = ${client.id}`,
      relationship: `one_to_many`,
    },
  },
  pre_aggregations: {
    main: {
      type: `rollup`,
      external: true,
      dimensions: [
        CUBE.agency_id,
        CUBE.client_id
      ],
      measures: [CUBE.count],
      refresh_key: {
        every: `1 day`,
      },
      indexes: {
        agency_id: {
          columns: [CUBE.agency_id],
        },
        client_id: {
          columns: [CUBE.client_id],
        },
      },
    },
  },
});
cube("report", {
  title: "Report",
  name: "report",
  sql: "SELECT * FROM reports.report",
  dataSource: "bq",
  dimensions: {
    date: {
      sql: "CAST(Date as TIMESTAMP)",
      type: "time",
      primaryKey: false,
      shown: true,
    },
    account_id: {
      sql: "account_id",
      type: "string",
      primaryKey: true,
      shown: true,
    },
    client_id: {
      sql: "client_id",
      type: "string",
      primaryKey: true,
      shown: true,
    },
    updated_at: {
      sql: "updated_at",
      type: "time",
      primaryKey: false,
      shown: true,
    },
  },
  measures: {
    impressions: {
      sql: "impressions",
      type: "sum",
    },
    clicks: {
      sql: "clicks",
      type: "sum",
    },
  },
  joins: {
    client: {
      sql: `${CUBE.client_id} = ${client.id}`,
      relationship: "many_to_one",
    },
  },
  pre_aggregations: {
    main: {
      type: "rollup",
      external: true,
      dimensions: [CUBE.date, CUBE.account_id, CUBE.client_id, CUBE.updated_at],
      measures: [CUBE.impressions, CUBE.clicks],
      time_dimension: CUBE.date,
      granularity: "day",
      partition_granularity: "month",
      refresh_key: {
        every: "1 minute",
        sql: `SELECT MAX(updated_at) FROM (SELECT * FROM reports.report) AS report WHERE ${FILTER_PARAMS.report.date.filter(
          "CAST(Date AS TIMESTAMP)"
        )}`,
      },
      indexes: {
        client_id: {
          columns: [CUBE.client_id],
        },
      },
    },
    report_client_client_agency_agency:
      {
        type: `rollup_join`,
        external: true,
        time_dimension: CUBE.date,
        granularity: `day`,
        dimensions: [
          CUBE.date,
          CUBE.account_id,
          CUBE.client_id,
          CUBE.updated_at,
          client.id,
          client.name,
          client.updated_at,
          client_agency.agency_id,
          client_agency.client_id,
          agency.id,
          agency.name,
          agency.updated_at,
        ],
        measures: [
          CUBE.impressions,
          CUBE.clicks
        ],
        rollups: [
          CUBE.main,
          client.main,
          client_agency.main,
          agency.main,
        ],
      },
  },
  rewriteQueries: false,
});

The problem:

  1. If I make a query with dimensions from all 4 tables (no measures), the pre aggreagation is selected (the result is correct)
  2. If I make a query with one measure from report and dimensions from client, the pre aggregation is selected (the result is correct)
  3. If I make a query with one measure from report and dimensions from client AND AGENCY, the pre aggregation is NOT selected, I get the following error:

To join across data sources use rollupJoin with Cube Store. If rollupJoin is defined, this error indicates it doesn't match the query. Please use Rollup Designer to verify it's definition. Found data sources: bq, pg

Problematic query:

{
  "order": {
    "report.date": "asc"
  },
  "dimensions": [
    "report.account_id",
    "agency.name",
    "client.name"
  ],
  "measures": [
    "report.clicks"
  ]
}

But if I use Rollup Designer it tells me to add the metric to the pre-aggregation, just like I already have.

igorlukanin commented 7 months ago

Hi @kriKinesso 👋

My suspicion is that referencing multiple (too many?) pre-aggregations somehow breaks the query matching resolution:

        rollups: [
          CUBE.main,
          client.main,
          client_agency.main,
          agency.main,
        ],

Could you please try reworking your pre-aggregations in a way that there are only two pre-aggregations there, one from bq and one from pg?

kriKinesso commented 7 months ago

Hi @igorlukanin 👋, thank you for your answer.

So is there a limit number for the pre aggregations you can reference in a rollup join?

In my case I have to use pre-aggregations for this sort of queries, I absolutely can not allow users to query the datasource directly. If I remove any of the pre-aggregations from rollups the query won't match.

Also, I have other pre-aggregations, with a larger number of rollups that work well. This error comes up only in the case of many to many relationships, which leads me to think that either we are dealing with a bug, or pre-aggregations are not meant to be used with many to many relationships.

igorlukanin commented 7 months ago

In my case I have to use pre-aggregations for this sort of queries, I absolutely can not allow users to query the datasource directly. If I remove any of the pre-aggregations from rollups the query won't match.

Well, I don't think that you should not use pre-aggregations. However, I feel like you might try combining 3 pre-aggregations on the pg data source (that you reference in the rollup join) into one. And then reference only that one in the rollup join.

This error comes up only in the case of many to many relationships, which leads me to think that either we are dealing with a bug, or pre-aggregations are not meant to be used with many to many relationships.

I can't spot the issue by looking at your data model. Let me tag @paveltiunov in case he sees anything suspicions.

kriKinesso commented 7 months ago

I tried grouping the pre-aggregations, and I still get the same error, so unfortunately that is not the issue.