cube-js / cube

📊 Cube — The Semantic Layer for Building Data Applications
https://cube.dev
Other
17.51k stars 1.74k forks source link

Preaggregations are not working from Metabase #8210

Open Betilopeza opened 2 months ago

Betilopeza commented 2 months ago

Hello! we are testing preaggregations to improve the latency for the questions to appear in Metabase dashboards. We are currently using:

For some reason, when we create a preaggregation with measures that worked perfectly fine in our playground, but when queried from Metabase, are not preaggregated as shown below in the example.

image image image

The dynamic schema for that table is this one:

{% set account = "accountUuid" %}

{% set interaction = "interactionUuid" %}
{% set time = "time" %}
cubes:
  {%- for cube in load_interaction_tags()["cubes"] %}

  - name: {{ cube.name }}
    sql: {{ cube.sql }}
    dataSource: default

  {%- if cube.dimensions is not none and cube.dimensions|length > 0 %}
    pre_aggregations:
      - name: {{cube.name + "_FullTable" }} 
        dimensions:
          {%- for dimension in cube.dimensions %} 
          {%- if dimension.name != account and  dimension.name != interaction %}
          - {{ dimension.name }}
          {%- endif %}
          {%- endfor %}
          - InteractionLocations.createdAt
          - InteractionLocations.isDone
          - InteractionLocations.url
          - MGMLocations.Active_Status
          - MGMLocations.HOA_Type
          - MGMLocations.Plan_Type
          - MGMLocations.name
          - MGMLocations.Account_Manager
        refresh_key:
          every: 1 hour

      - name: {{cube.name + "_LocationInteraction3" }} 
        dimensions:
          {%- for dimension in cube.dimensions %} 
          {%- if dimension.name != account and  dimension.name != interaction %}
          - {{ dimension.name }}
          {%- endif %}
          {%- endfor %}
          - InteractionLocations.createdAt
          - InteractionLocations.url
          - MGMLocations.Active_Status
          - MGMLocations.HOA_Type
          - MGMLocations.Plan_Type
          - MGMLocations.name
          - MGMLocations.Account_Manager
        refresh_key:
          every: 1 hour

      - name: {{cube.name + "_LocationInteraction4" }} 
        dimensions:
          {%- for dimension in cube.dimensions %} 
          {%- if dimension.name != account and  dimension.name != interaction %}
          - {{ dimension.name }}
          {%- endif %}
          {%- endfor %}
          - InteractionLocations.createdAt
          - InteractionLocations.url
          - MGMLocations.name
          - MGMLocations.Account_Manager
        refresh_key:
          every: 1 hour

      - name: {{cube.name + "_table" }} 
        dimensions:
          {%- for dimension in cube.dimensions %} 
          {%- if dimension.name != account and  dimension.name != interaction %}
          - {{ dimension.name }}
          {%- endif %}
          {%- endfor %}
        refresh_key:
          every: 1 hour

      {%- for dimension in cube.dimensions %}

      - name: {{ dimension.name + "_filter" }} 
        dimensions:
          - {{ dimension.name }}
        refresh_key:
          every: 1 hour

      - name: {{ cube.name + "_distinctUuid" }} 
        measures:
          - distinctUuid
        dimensions:
          {%- if dimension.name != account and  dimension.name != interaction %}
          - {{ dimension.name }}
          {%- endif %}
        refresh_key:
          every: 1 hour

      {%- endfor %}

      - name: {{ cube.name + "_distinctUuid" }} 
        dimensions:
          {%- for dimension in cube.dimensions %} 
          {%- if dimension.name != account and  dimension.name != interaction %}
          - {{ dimension.name }}
          {%- endif %}
          {%- endfor %}
        measures:
          - distinctUuid
        refresh_key:
          every: 1 hour

      - name: {{ cube.name + "_distinct" }} 
        measures:
          - distinctUuid
  {%- endif %}

  {%- if cube.joins is not none and cube.joins|length > 0 %}
    joins:
      {%- for join in cube.joins %}
      - name: {{ join.name }}
        relationship: {{ join.relationship }}
        sql: {{ join.sql }}
      {%- endfor %}
  {%- endif %}

  {%- if cube.measures is not none and cube.measures|length > 0 %}
    measures:
      {%- for measure in cube.measures %}
      - name: {{ measure.name }}
        type: {{ measure.type }}
      {%- if measure.sql %}
        sql: {{ measure.sql }}
      {%- endif %}
      {%- endfor %}
  {%- endif %}

  {%- if cube.dimensions is not none and cube.dimensions|length > 0 %}
    dimensions:
      {%- for dimension in cube.dimensions %}
      - name: {{ dimension.name }}
        sql: {{ dimension.sql }}
        type: {{ dimension.type }}
        {% if dimension.primaryKey == True -%}
        primaryKey: true
        public: true
        {% endif -%}
      {%- endfor %}
  {%- endif %}
  {%- endfor %}

And this is our cube.js

const {memoizedFetchAccountPassword} = require("./sql-auth")

function decodeBase64(data) {
    let buff = Buffer.from(data, 'base64');
    return buff.toString('ascii');
}

const deconstructGlobalId = (globalId) => {
    const decoded = decodeBase64(globalId)
    const globalIdParts = decoded.split(":")

    return {
        node: globalIdParts[0],
        uuid: globalIdParts[1]
    }
}

const contextToAppId = (context) => `CUBEJS_APP_${context.cacct}`

const extendContext = (req) => {
  // For SQL API (context is then handled by checkSqlAuth)
    if (req.headers === undefined || req.headers.cacct === undefined) {
        return;
    }

    try {
        const deconstructed = deconstructGlobalId(req.headers.cacct);
        return {cacct: deconstructed["uuid"]};
    } catch (err) {
        console.log("Error extending context: " + err)
    }
}

const queryRewrite = (query, request) => {
    console.log("Rewriting for", JSON.stringify(request))
    const accountId = request.cacct !== undefined ? request.cacct : request.securityContext.cacct;

    if (query["dimensions"].length > 0) {
        try {
            var table_name = query["dimensions"][0].substring(0, query["dimensions"][0].indexOf("."));
            } catch {
            var table_name = query["dimensions"][0]["cubeName"]
            }   
    }
    // measures in query
    else if (query["measures"].length > 0) {
        try {
            var table_name = query["measures"][0].substring(0, query["measures"][0].indexOf("."))
            } catch {
            var table_name = query["measures"][0]["cubeName"]
            }
    }

    else if (query["timeDimensions"].length > 0) {
        try {
        var table_name = query["timeDimensions"][0]["dimension"].substring(0, query["timeDimensions"][0]["dimension"].indexOf("."))
        } catch {
            var table_name = query["measures"][0]["cubeName"]
        }
    }

    else {
        console.log("Using account id:", accountId)
        console.log("The query:", query)
    }

    console.log("Using account id:", accountId);

    query.filters.push({
        member: `${table_name}.accountUuid`,
        operator: 'equals',
        values: [accountId],
    });

    return query;
}

const checkSqlAuth = async (req, username) => {
    // Remove this section for containing sensitive information
}

const queueOptions = {
  concurrency: 2,
  executionTimeout: 600,
  orphanedTimeout: 120,
  heartBeatInterval: 120,
};

module.exports = {
    http: {
        cors: {
            origin: '*',
            methods: 'GET,HEAD,PUT,PATCH,POST,DELETE',
            preflightContinue: false,
            allowedHeaders: ['Content-Type', 'Authorization', 'cacct'],
            optionsSuccessStatus: 204,
        },
    },
    scheduledRefreshTimer: 120,
    checkSqlAuth,
    contextToAppId,
    extendContext,
    queryRewrite,
    orchestratorOptions: {
        queryCacheOptions: {
            refreshKeyRenewalThreshold: 120,
            backgroundRenew: true,
            queueOptions,
        },
        preAggregationsOptions: {queueOptions},
    },
    // Figure out ScheduledRefreshContexts later! (probably needed for QueryRewrite)
    // Placeholder to prevent the error message:
    scheduledRefreshContexts: () => [
        {
            securityContext: {
                cacct: '00000000-0000-0000-0000-000000000000'
            }
        }
    ],
};

Please if you can check if we are doing something wrong, or if we are missing something, would be really helpful

paveltiunov commented 2 months ago

@Betilopeza It seems distinctUuid is a measure however you're using it in GROUP BY. I expect to see it inside COUNT(DISTINCT distinctUuid) aggregation reference. Could you please elaborate on your use case?

Betilopeza commented 2 months ago

Hello, yes we tried in the group by, because in the measure sections the preaggregation was not working either! I send you the screenshots. Preaggregation:

image (21) image (22) image (23)

paveltiunov commented 2 months ago

@Betilopeza There're dimensions in this pre-aggregation and this pre-aggregation is non-additive. So it couldn't match a query without dimensions defined. Please see https://cube.dev/docs/product/caching/matching-pre-aggregations.

Betilopeza commented 2 months ago

We are trying to make work this preaggregation: Preaggregation:

As you can see it actually doesn´t have any dimension, so it´s happenning what I send you in the previous message. In playground works perfectly, but not in metabase. I did't get why you said that we have dimensions

paveltiunov commented 2 months ago

@Betilopeza Unfortunately we can't reproduce it on our side. If you're in Cube Cloud please prepare a branch with a description on how to reproduce and file a ticket with the support team. We'll take a look.

Betilopeza commented 2 months ago

Thanks! Unfortunately, we haven't been able to reach support since February, so I don't think that would be an option.

igorlukanin commented 1 month ago

Hi @Betilopeza, you can send the link to your branch with a reproduction and any relevant information to me via igor@cube.dev.