cube-js / cube

📊 Cube — The Semantic Layer for Building Data Applications
https://cube.dev
Other
17.72k stars 1.75k forks source link

Orchestrator error: Table already exists | relation does not exist #8546

Open kevinleeTCA opened 1 month ago

kevinleeTCA commented 1 month ago

Problem

1. The cube pre-aggregation fails when client is requesting cube data via API.

Error:

Error: Error during create table: CREATE TABLE ... already exists

Detailed error:

Orchestrator error {"duration":343,"error":"Error: Error during create table: CREATE TABLE dev_pre_aggregations.p_m__teams_p_m_teams_rollup_mgcnktaq_xztohjkr_1jb5je3 (\"p_m__teams__legal_entity_id\" uuid, \"p_m__teams__management_ailorn\" text, \"p_m__teams__management_end_date\" date, \"p_m__teams__management_id\" text, \"p_m__teams__management_start_date\" date, \"p_m__teams__organisation_id\" uuid, \"p_m__teams__property_address\" text, \"p_m__teams__property_type\" text, \"p_m__teams__team_id\" text, \"p_m__teams__team_name\" text): relation \"p_m__teams_p_m_teams_rollup_mgcnktaq_xztohjkr_1jb5je3\" already exists"

2. Error: relation ... does not exist

This also happens sometime when client is requesting cube data via API, looks like it tries to find a cache but ends up being dropped by cube worker.

detailed error:

Orchestrator error {"duration":269,"error":"Error: relation \"dev_pre_aggregations.p_m__teams_p_m_teams_rollup_wdzwa5eo_wiy3pkps_1jb5m97\" does not exist"

We found out that this does not exist is probably related to the ENV VAR CUBEJS_DROP_PRE_AGG_WITHOUT_TOUCH when it is enabled as true, when we disable this, the does not exist issue disapprears, but it has other side effect, i.e., cube API is using deprecated cache when resolving client's API request, causing missing data issues.

Current setup:

CUBEJS_DOCKER_IMAGE_VERSION=v0.34.62
CUBEJS_DB_QUERY_TIMEOUT=60m
CUBEJS_CACHE_AND_QUEUE_DRIVER=memory
CUBEJS_DROP_PRE_AGG_WITHOUT_TOUCH=true
CUBEJS_DEV_MODE=true
CUBEJS_DB_TYPE=redshift
CUBEJS_REFRESH_WORKER=true

Current cube and preaggregation:

cube(`PM_Teams`, {
  sql: `select 
      COALESCE(t.id::text, MD5(leo.organisation_id || '-unallocated')) as team_id, 
      COALESCE(t.name, 'Unassigned') as team_name, 
      leo.organisation_id, 
      leo.legal_entity_id,
      'ailo:propertymanagement:management:' || m.id as management_ailorn,
      m.id::text as management_id,
      concat_ws(', ', ap.unit_street_number || ' ' || ap.street_name, ap.suburb) as property_address,
      m.start_date as management_start_date, 
      m.end_date as management_end_date,
      ap.primary_type as property_type
    from legal_entity_organisations leo 
    join managements m on leo.legal_entity_id = m.managing_entity_id and m.is_draft = false
    join agency_properties ap on ap.id = m.agency_property_id
    left join teams_managements tm on m.id = tm.management_id
    left join teams t on t.id = tm.team_id
    where ${SECURITY_CONTEXT.organisationId.filter('leo.organisation_id')}
    UNION
    select 
      MD5(leo.organisation_id || '-unallocated') as team_id, 
      'Unassigned' as team_name, 
      leo.organisation_id, 
      leo.legal_entity_id,
      'ailo:authz:legalentity:' || leo.legal_entity_id as management_ailorn,
      leo.legal_entity_id::text as management_id,
      null as property_address,
      null as management_start_date, 
      null as management_end_date,
      null as property_type
    from legal_entity_organisations leo 
    where ${SECURITY_CONTEXT.organisationId.filter('leo.organisation_id')}
    `,
  preAggregations: {
    pMTeamsRollup: {
      dimensions: [
        CUBE.teamId,
        CUBE.teamName,
        CUBE.organisationId,
        CUBE.legalEntityId,
        CUBE.managementAilorn,
        CUBE.propertyAddress,
        CUBE.propertyType,
        CUBE.managementId,
        CUBE.managementStartDate,
        CUBE.managementEndDate,
      ],
      indexes: {
        mgIdx: {
          columns: [CUBE.managementAilorn],
        },
      },
      refresh_key: {
        every: `1 hour`,
      }
    }
  },
  joins: {
    GeneralLedger_Income: {
      relationship: `hasMany`,
      sql: `${GeneralLedger_Income.accountOwner} = 'ailo:authz:legalentity:' || ${PM_Teams.legalEntityId} AND 
      (
        ${GeneralLedger_Income.generalLedgerManagement} = ${PM_Teams.managementAilorn}  
      )
      `,
    },
    GeneralLedger_Rent: {
      relationship: `hasMany`,
      sql: `${GeneralLedger_Rent.generalLedgerManagement} = ${PM_Teams.managementAilorn}`,
    },
    Tenancies: {
      relationship: `hasMany`,
      sql: `${Tenancies.managementId} = ${PM_Teams.managementId}`,
    },
  },
  dimensions: {
    teamId: {
      sql: `team_id`,
      type: `string`,
      primaryKey: true,
      shown: true,
    },
    teamName: {
      sql: `team_name`,
      type: `string`,
    },
    legalEntityId: {
      sql:`legal_entity_id`,
      type: `string`,
    },
    organisationId: {
      sql: `organisation_id`,
      type: `string`,
    },
    managementAilorn: {
      sql: `management_ailorn`,
      type: `string`,
    },
    managementId: {
      sql: `management_id`,
      type: `string`,
      primaryKey: true,
      shown: true,
    },
    managementStartDate: {
      sql: `management_start_date`,
      type: `time`,
    },
    managementEndDate: {
      sql: `management_end_date`,
      type: `time`,
    },
    propertyAddress: {
      sql: `property_address`,
      type: `string`,
    },
    propertyType: {
      sql: `property_type`,
      type: `string`,
    },
  },

  refresh_key: {
    every: `1 hour`,
    sql: `SELECT MAX(created_at) FROM managements`,
  },

  dataSource: `propertyManagement`,
});
kevinleeTCA commented 1 month ago

Similar issue has been reported in early cube version.

igorlukanin commented 1 month ago

Hi @kevinleeTCA 👋

Let's clarify a few bits about your setup. I see that you use pre-aggregations and, at the same time, I see that you set CUBEJS_CACHE_AND_QUEUE_DRIVER=memory. Why would you not use Cube Store? Are you trying to use something else for pre-aggregation storage?

I suggest that you set up a Cube Store cluster (router node, at least two worker nodes), use it both for cache/queue and pre-aggregations; also, please upgrade to the latest version of Cube. I believe that would resolve the issue.

kevinleeTCA commented 1 week ago

Hi @igorlukanin thank you for your reply.

We upgraded to the latest version (v0.35) a few weeks, a and this problem still happens, see below latest log:
Screenshot 2024-09-03 at 1 32 47 pm Screenshot 2024-09-03 at 1 33 30 pm

Regarding the reason that why we have to use memory for CUBEJS_CACHE_AND_QUEUE_DRIVER, as we are blocked by this, i.e., cubestore does not support timestampz (which are used frequently in postgres db).

@paveltiunov Where should I cast the value? Should I create a new dimension with a value casted to timestamp or just casting the existing time dimension will be fine (assuming it wont' affect timezone)?

igorlukanin commented 1 week ago

@kevinleeTCA How exactly are you blocked? Can you please explain?

kevinleeTCA commented 6 days ago

@igorlukanin I shared the link above, basically we encountered an issue when running pre-aggregation with timestampz (postgres primary type) as time dimension:

ERROR [cubestore::http] <pid:1> Error processing HTTP command: Custom type 'timestamptz' is not supported 

I will try your proposal mentioned here:

    dimensions:
        # It's OK to have a TIMESTAMPZ value as a time dimension, it will work just fine
      - name: created_at
        sql: created_at
        type: time

        # You have to cast it to `TIMESTAMP` if you'd like to use it in a non-`time` dimension
      - name: created_at_raw
        sql: "created_at::TIMESTAMP"
        type: string

Will update it here on how it goes