cube-js / cube

📊 Cube — The Semantic Layer for Building Data Applications
https://cube.dev
Other
17.73k stars 1.75k forks source link

Low sql query performance in cube.js #8588

Open alexeybahterwork opened 4 weeks ago

alexeybahterwork commented 4 weeks ago

Problem

Hello. Please tell me, I use an sql expression in a cube and the query to this cube takes a very long time. What could be the reason? How can I optimize the query to the database in cube.js?

I tried changing the limit to a lower value, but it didn't help.

Perhaps the sql query is too complex, so cube processes it slowly

Related Cube.js schema

cube(`statement`, {
  sql: `select
    s.id,
    s.balance_in ,
    s.balance_out ,
    s.balance_in +sl.amount_ct - sl.amount_dt as balance_end,
    sl.amount_dt,
    sl.amount_ct,
    s.operation_dt ,
    s.account_id,
    s.lines
  from test.statement s
  inner join test.account a on a.id=s.account_id
  left join (
    select
      collection_id,
      sum(case when ap then amount else 0 end) as amount_ct,
      sum(case when ap then 0 else amount end) as amount_dt
    from test.statement_line sl
    where true
      and coalesce(amount,0)<>0
      and sys_state='END'
    group by 1) sl on s.lines=sl.collection_id
  where true
    and coalesce(final_flag,0) = 0
    and sys_state='END'
    and delta_flag=false
    and ext_ref!='0'
    and business_id IN(${COMPILE_CONTEXT.businessID})
  order by account_id, operation_dt`,

  title: `statement`,

  data_source: `default`,

  joins: {
    account: {
      sql: `${CUBE}.account_id = ${account}.id`,
      relationship: `many_to_one`,
    },
  },

  dimensions: {
    id: {
      sql: `id`,
      type: `number`,
      primary_key: true,
      title: `ID`,
      shown: false,
    },

    balance_in: {
      sql: `balance_in`,
      type: `number`,
      title: `balance_in`,
    },
  },

  measures: {}
})

Related Cube.js generated SQL

SELECT
  "statement".balance_in "statement__balance_in"
FROM
  (
    select
      s.id,
      s.balance_in,
      s.balance_out,
      s.balance_in + sl.amount_ct - sl.amount_dt as balance_end,
      sl.amount_dt,
      sl.amount_ct,
      s.operation_dt,
      s.account_id,
      s.lines
    from
      test.statement s
      inner join test.account a on a.id = s.account_id
      left join (
        select
          collection_id,
          sum(
            case
              when ap then amount
              else 0
            end
          ) as amount_ct,
          sum(
            case
              when ap then 0
              else amount
            end
          ) as amount_dt
        from
          test.statement_line sl
        where
          true
          and coalesce(amount, 0) <> 0
          and sys_state = 'END'
        group by
          1
      ) sl on s.lines = sl.collection_id
    where
      true
      and coalesce(final_flag, 0) = 0
      and sys_state = 'END'
      and delta_flag = false
      and ext_ref != '0'
      and business_id IN(173375800, 1006000, 1006000)
    order by
      account_id,
      operation_dt
  ) AS "statement"
GROUP BY
  1
ORDER BY
  1 ASC
LIMIT
  1000
ifokeev commented 4 weeks ago

Hi! It could not be a query performance but your database bottleneck. Cube.js adds a very little overhead during the query, most performance issues are somewhere in between. For example, some users experience data models recompiling on every request, others have issues with refreshing queries, so you need to find where yours is.

igorlukanin commented 3 weeks ago

@alexeybahterwork I would recommend to run the query from statement's sql directly in your database and see how well it performs. Chances are that it's not performing very well on its own. Also, running EXPLAIN and EXPLAIN ANALYZE might provide more insights.

Which database do you use? How large are the datasets? When you say "takes a very long time", how long is that, exactly?

My general recommendations would be to...