**Describe the bug**
Using Cube Cloud, I think there might be something wrong with the pre-aggregation warm-up instances.
I have a very simple scheduled_refresh_contexts in my cube.py, which depends on the Databricks SDK.
This runs fine on my worker and API instances, but not on my pre-aggregation warm-up instances.
It's a little bit hard to debug, because the pre-aggregation warm-up instance only seems to exist for a fraction of a second, maybe because it fails immediately. I did manage to get a screenshot.
I can definitely see that, at least in my build job, databricks-sdk is installed.
**To Reproduce**
Steps to reproduce the behavior:
1. Define your requirements.txt to install databricks-sdk:

   ```
   databricks-sdk
   ```
2. Define a scheduled_refresh_contexts which depends on databricks-sdk in cube.py (the elided context-building part is sketched after these steps):

   ```python
   import os

   from cube import config
   from databricks.sdk import WorkspaceClient

   @config('scheduled_refresh_contexts')
   def scheduled_refresh_contexts() -> list[object]:
       databricks_workspace_client = WorkspaceClient(
           host=os.environ.get('DATABRICKS_HOST'),
           token=os.environ.get('CUBEJS_DB_DATABRICKS_TOKEN')
       )

       # Fetch the list of schemas within the environment's catalog
       catalog_name = os.environ.get('CUBEJS_DB_DATABRICKS_CATALOG')
       schemas = databricks_workspace_client.schemas.list(catalog_name=catalog_name)
       # ...
       return security_contexts_array
   ```
3. Enable pre-aggregation warm-up in Cube Cloud
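The elided part of the function above just turns the schema list into security contexts, one per tenant. The exact code is omitted from my cut-out; the sketch below is illustrative only and assumes company_id maps to a schema name, matching the sql_table template in the schema further down:

```python
# Illustrative sketch only: build one security context per schema so the
# refresh worker pre-aggregates every tenant (field names are assumed)
security_contexts_array = [
    {'securityContext': {'company_id': schema.name}}
    for schema in schemas
]
```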
**Expected behavior**
- Dependencies from requirements.txt get installed before any instance runs
- After the env vars are updated on Cube Cloud, all contexts defined by scheduled_refresh_contexts should compile and pre-aggregate, and any query hitting a pre-aggregation should pass
**Actual behavior**
- This runs fine on my worker and API instances, but not on my pre-aggregation warm-up instances
- It's a little bit hard to debug, because the pre-aggregation warm-up instance only seems to exist for a fraction of a second, but when I do catch it, it says that databricks-sdk is not installed (one way to surface the error is sketched after this list)
- I can definitely see that, at least in my build job, databricks-sdk is installed
- The result is that NO pre-aggregations get built unless the refresh_key triggers them, which can take time and leave the instance broken for extended periods
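One way to catch this before the instance dies is to make the failure loud: a guard like the one below in cube.py (a sketch, assuming the failure really is an ImportError) would dump the interpreter and module search path, so the warm-up environment can be compared against the API instances where the import works:

```python
import sys
import traceback

try:
    from databricks.sdk import WorkspaceClient
except ImportError:
    # Dump enough environment detail to compare the warm-up instance
    # against the API instances, where this import succeeds
    print(f"databricks-sdk import failed; python: {sys.executable}", file=sys.stderr)
    print(f"sys.path: {sys.path}", file=sys.stderr)
    traceback.print_exc()
    raise
```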
**Screenshots**
![Screenshot 2024-10-18 at 1 12 42 PM](https://github.com/user-attachments/assets/647d3515-e6a8-4775-9eac-34034051f64a)
![Screenshot 2024-10-18 at 1 14 27 PM](https://github.com/user-attachments/assets/9e984334-47c0-409b-b739-45af6399f6c1)
![Screenshot 2024-10-18 at 1 15 44 PM](https://github.com/user-attachments/assets/fc114203-b2b6-42c7-be1c-063d782b84d2)
**Minimally reproducible Cube Schema**
Adding a cut-out from my schema, but I don't think this is schema-dependent. The important parts are the requirements.txt and cube.py posted above.
```yaml
cubes:
  - name: gold_journal_lines
    sql_table: "{{ COMPILE_CONTEXT.securityContext.company_id | safe }}.gold__journal_lines"

    dimensions:
      - name: id
        sql: id
        type: string
        primary_key: true

      - name: net_amount
        sql: net_amount
        type: number

      - name: posted_on
        sql: posted_on
        type: time

    measures:
      - name: sum_net_amount
        type: sum
        sql: net_amount

    pre_aggregations:
      # Rollup Pre-aggregation with accounts and counterparties
      - name: journal_line_acc_cpt_rollup
        measures:
          - gold_journal_lines.sum_net_amount
        time_dimension: CUBE.posted_on
        granularity: month
        partition_granularity: year
```
**Version:** tried with 0.35.55, 1.0.1, and 1.1.0
Happy to provide any additional details.