cube-js / cube

📊 Cube — Universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics
https://cube.dev
Other
17.96k stars 1.78k forks source link

How to configure Cube.js pre-aggregations to refresh every 12 hours or at a specific time (e.g., 4:00 AM UTC) and respect timing? #8869

Open abdullahelaaroub opened 3 weeks ago

abdullahelaaroub commented 3 weeks ago

Problem

I’m working on a Cube.js project where I need to configure pre-aggregations to refresh at regular intervals, specifically every 12 hours or at a specific time like 4:00 AM UTC. I’ve set up a cube to optimize query performance using pre-aggregations, but I'm unsure how to configure the refresh schedule to achieve this behavior.

When testing the last refresh using Postman, for example, checking every 5 minutes, Cube.js doesn’t seem to respect the scheduled refresh time , even though I’ve set it for every 5 minutes. The refresh happens inconsistently.

Here’s an example of my Orders cube where I want to apply these configurations:

cube('Orders', {
  sql: `SELECT id, user_id, total_price, added_at FROM public.orders`,

  preAggregations: {
    OrdersSummary: {
      dimensions: [Orders.id, Orders.userId, Orders.totalPrice],
      refreshKey: {
        every: '12 hour'
      }
    },
    OrdersDetails: {
      dimensions: [Orders.id, Orders.userId, Orders.addedAt],
      refreshKey: {
        every: '12 hour'
      }
    }
  },

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primaryKey: true
    },
    userId: {
      sql: `user_id`,
      type: `number`
    },
    totalPrice: {
      sql: `total_price`,
      type: `number`
    },
    addedAt: {
      sql: `added_at`,
      type: `time`
    }
  }
});
igorlukanin commented 2 weeks ago

Hi @abdullahelaaroub 👋

It is exactly correct to use refresh_key for that. You can see more in the docs here: https://cube.dev/docs/reference/data-model/pre-aggregations#every. Also, note that you can use cron expressions to specify a refresh interval: https://cube.dev/docs/reference/data-model/cube#supported-cron-formats

Cube.js doesn’t seem to respect the scheduled refresh time , even though I’ve set it for every 5 minutes. The refresh happens inconsistently.

What behavior do you observe? How often does the refresh happen?

(FYI, it is not guaranteed that the refresh will start at a particular time sharp. However, it should start fairly shortly after the expected time.)

abdullahelaaroub commented 2 weeks ago

Hi @igorlukanin 👋

I've set up my Cube as follows (this is just a exemple of cube):

cube('Orders', {
  sql: `SELECT id, user_id, total_price, added_at FROM public.orders`,
  refreshKey: {
    every: "0 1 * * *",  
    timezone: "UTC"
  },
  preAggregations: {
    OrdersSummary: {
      dimensions: [Orders.id, Orders.userId, Orders.totalPrice],
      type: 'rollup',
      refreshKey: {
        every: '6 hour',
        updateWindow: '1 day',
      },
      scheduledRefresh: true,
    },
    OrdersDetails: {
      dimensions: [Orders.id, Orders.userId, Orders.addedAt],
      type: 'rollup',
      refreshKey: {
        every: '6 hour',
        updateWindow: '1 day',
      },
      scheduledRefresh: true,
    }
  },
  dimensions: {
    id: { sql: `id`, type: `number`, primaryKey: true },
    userId: { sql: `user_id`, type: `number` },
    totalPrice: { sql: `total_price`, type: `number` },
    addedAt: { sql: `added_at`, type: `time` }
  }
});

In my docker-compose.yml, I configure the environment like this:

services:
  cube:
    image: 'cubejs/cube:v1.0.2'
    restart: unless-stopped
    ports:
      - "4000:4000"
    volumes:
      - ./config:/cube/conf
    environment:
      - CUBEJS_CUBESTORE_HOST=cubestore
      - CUBEJS_LOG_LEVEL=error
      - CUBEJS_DEV_MODE=true
      - CUBEJS_SCHEDULED_REFRESH=true
      - CUBEJS_SCHEDULED_REFRESH_TIMER=true
    env_file:
      - database.env
    links:
      - cubestore:cubestore
    depends_on:
      - cube-refresh-worker
      - cubestore
  cube-refresh-worker:
    image: 'cubejs/cube:v1.0.2'
    restart: unless-stopped
    volumes:
      - ./config:/cube/conf
    environment:
      - CUBEJS_CUBESTORE_HOST=cubestore
      - CUBEJS_LOG_LEVEL=error
      - CUBEJS_REFRESH_WORKER=true
      - CUBEJS_SCHEDULED_REFRESH=true
      - CUBEJS_SCHEDULED_REFRESH_TIMER=true
      - CUBEJS_DEV_MODE=true
      - CUBEJS_REFRESH_INTERVAL=21600000 # 6 hours in milliseconds

Even after adding scheduledRefreshTimer: 3600 and scheduledRefreshTimeZones: ['UTC'] to module.exports in my Cube.js server, the refresh behavior remains inconsistent. The data still doesn’t refresh at the specified time, nor shortly after; instead, it takes days to weeks to update.