cube-js / cube

πŸ“Š Cube β€” Universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics
https://cube.dev
Other
18.01k stars 1.78k forks source link

MIN / MAX Time aggregation function represented as numeric #8153

Open pauldheinrichs opened 7 months ago

pauldheinrichs commented 7 months ago

Describe the bug I've noticed when attempting to utilize a time aggregation in a few different ways i'm getting odd behavrious in each scenario.

Scenario A)

type: min
sql: time

Okay - that's fine let's try something else

Scenario B)

type: time
sql: MIN(time)

Tested:

No difference

Expected behavior

I would anticipate in scenario 1 to be able to query the metric in scenario 2 i would expect the pre-aggregation to apply Screenshots If applicable, add screenshots to help explain your problem. scenario A) image

Scenario B) image image

Minimally reproducible Cube Schema In case your bug report is data modelling related please put your minimally reproducible Cube Schema here. You can use selects without tables in order to achieve that as follows.

cube(`test`, {
  sql: `
    SELECT 1 as userid, '2021-01-01' as signupdate
    UNION ALL
    SELECT 2 as userid, '2021-01-02' as signupdate
    UNION ALL
    SELECT 3 as userid, '2021-01-03' as signupdate
    UNION ALL
    SELECT 4 as userid, '2021-01-04' as signupdate
  `,
  description: `table`,
  // datasource: `default`,

  measures: {
    // does not work with pre-aggregations
    min_time_broken_pre_aggs: {
      sql: `MIN(signupdate)`,
      type: `time`,
    },

    // is a numeric value unqueryable
    min_time_unqueryable: {
      sql: `signupdate`,
      type: `min`,
    },
  },

  dimensions: {
    userid: {
      description: `id generated when user is created`,
      sql: `userid`,
      type: `number`,
      primary_key: true,
    },

    signupdate: {
      description: `date when the user is created`,
      sql: `signupdate`,
      type: `time`,
    },
  },

  preAggregations: {
    test_rollup: {
      type: 'rollup',
      measureReferences: [min_time_broken_pre_aggs],
      dimensionReferences: [userid],
      timeDimensionReference: signupdate,
      granularity: 'day',
      partitionGranularity: 'month',
      refreshKey: {
        every: '1 hour',
        incremental: true,
        updateWindow: '1 hour'
      }
    },

    test_rollup_2: {
      type: 'rollup',
      measureReferences: [min_time_unqueryable],
      dimensionReferences: [userid],
      timeDimensionReference: signupdate,
      granularity: 'day',
      partitionGranularity: 'month',
      refreshKey: {
        every: '1 hour',
        incremental: true,
        updateWindow: '1 hour'
      }
    }
  }
});

Version: [0.35.12]

Additional context I've noticed the same behaviour is represented by max as welll

error logs in scenario 1 where min is represented by numeric

β”‚ 2024-04-16 13:19:05,191 WARN  [cubesql::compile::engine::df::scan] Unable to parse value as f64: invalid float literal                                                                                       β”‚
β”‚ 2024-04-16 13:19:05,191 WARN  [cubesql::compile::engine::df::scan] Unable to parse value as f64: invalid float literal
igorlukanin commented 1 month ago

Hi @pauldheinrichs πŸ‘‹

Thanks for a very elaborate report!

This is actually the intended behavior, meaning this is how it's currently implemented: according to docs, min and similar functions are only intended to work with numeric values.

I see that you started a PR, thanks for that!