cube-js / cube

📊 Cube — The Semantic Layer for Building Data Applications
https://cube.dev
Other
17.89k stars 1.77k forks source link

Cubestore: Divide by Zero Error on calculated measure #2852

Closed JoshMentzer closed 3 years ago

JoshMentzer commented 3 years ago

Describe the bug Calculated measure with a filter, which did work with external preaggregation using Postgres, now is throwing divide by zero error using Cubestore.

To Reproduce Create two measures, using count type - measureA and measureB. Setup a preaggregation on measureA and B. Create third measure, measureC which is setup like so:

measureC: {
      type: 'number',
      format: 'percent',
      sql: `(1 - CAST(${measureA} as float) / CAST(${measureB} as float)) * 100`,
      filters: [
        {
          sql: `${measureB} > 0`,
        },
      ],
      title: 'Calc percentage',
    },

This results in:

Error: Internal: Arrow error: External error: Arrow error: External error: Arrow error: Divide by zero error
   0: <cubestore::CubeError as core::convert::From<datafusion::error::DataFusionError>>::from
   1: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
   2: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
   3: tokio::runtime::task::harness::Harness<T,S>::poll
   4: tokio::runtime::blocking::pool::Inner::run
   5: std::sys_common::backtrace::__rust_begin_short_backtrace
   6: core::ops::function::FnOnce::call_once{{vtable.shim}}
   7: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/a143517d44cac50b20cbd3a0b579addab40dd399/library/alloc/src/boxed.rs:1546:9
      <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
             at /rustc/a143517d44cac50b20cbd3a0b579addab40dd399/library/alloc/src/boxed.rs:1546:9
      std::sys::unix::thread::Thread::new::thread_start
             at /rustc/a143517d44cac50b20cbd3a0b579addab40dd399/library/std/src/sys/unix/thread.rs:71:17
   8: start_thread
   9: clone

Expected behavior It should execute without error and return results for granularities where it exists or nothing if measureB is <= 0.

Version: 0.27.22

JoshMentzer commented 3 years ago

Feel like this behavior is off, but was able to workaround it for the moment by changing the calculated measure like so:

measureC: {
      type: 'number',
      format: 'percent',
      sql: `(1 - CAST(${measureA} as float) / CAST(CASE ${measureB} WHEN 0 THEN 1 ELSE ${measureB} END as float)) * 100`,
      filters: [
        {
          sql: `${measureB} > 0`,
        },
      ],
      title: 'Calc percentage',
    },

This only works for me as a workaround because in these instances, I know 100% that if measureB is 0, then measureA is 0 as measureA is a subset of measureB.

paveltiunov commented 3 years ago

@JoshMentzer Hey Josh! Thanks for posting this one! As for in many databases division by zero in Cube Store is actually an error. However, it supports NULLIF() function so the measure definition should look like:

measureC: {
  type: 'number',
  format: 'percent',
  sql: `(1 - CAST(${measureA} as float) / CAST(NULLIF(${measureB}, 0) as float)) * 100`,
  title: 'Calc percentage',
},
JoshMentzer commented 3 years ago

@paveltiunov Totally understood - wouldn't expect anything to be able to divide by zero; what I felt was odd was that the filter doesn't seem to exclude the 0's like it does when hitting the databases directly or when it was pre-agg'd in postgres instead of cubestore. Either way, it works now w/ the case statement; I'll try NULLIF again to test out; that was one of my first workaround attempts and I received an error about the function - don't recall the exact error though, so will give it another shot. Thx!

paveltiunov commented 3 years ago

@JoshMentzer Yep. It's a little bit tricky. As Cube Store is columnar storage that processes everything in batches and division is evaluated before CASE statement which is produced by the filter. It means it can't recover from an error even if the value is being filtered out. Workaround to this can be to have a measure filter inside measureB itself but best practice is to always use NULLIF for the division.