cube-js / cube

📊 Cube — The Semantic Layer for Building Data Applications
https://cube.dev
Other
17.93k stars 1.78k forks source link

Presto with Postgres as external pre-aggregate DB creates date column of type `text` #1046

Closed anismiles closed 1 month ago

anismiles commented 4 years ago

Problem

I am using Presto as my data-source, with Postgres as external pre-aggregate DB. Pre-aggregate tables are created with date columns as text instead of timestamp.

Related Cube.js schema

cube(`Orders`, {
  sql: `select * from views.pos_sales_enriched_view`,

  measures: {
    count: {
      type: `count`,
      sql: `trxn_id`,
      title: `- count`,
      filters: [
        { sql: `${Orders}.return_flag = 'N'` }
      ]
    },

    amount: {
      type: `sum`,
      sql: `sale_amount`,
      title: ` - amount`,
      filters: [
        { sql: `${Orders}.return_flag = 'N'` }
      ]
    },

  },

  dimensions: {
    orderDate: {
      sql: `trxn_ts`,
      type: `time`
    },
    productId: {
      type: `string`,
      sql: `product_id`,      
    }
  },

  preAggregations: {
    ordersByDay: {
      type: `rollup`,
      measureReferences: [Orders.amount],
      dimensionReferences: [Orders.productId],
      timeDimensionReference: Orders.orderDate,
      granularity: `day`,
      external: true,
    }
  }
})

JSON Query

{
  "measures": [
    "Orders.amount"
  ],
  "timeDimensions": [
    {
      "dimension": "Orders.orderDate",
      "granularity": "day",
      "dateRange": "Last quarter"
    }
  ],
  "order": {},
  "filters": []
}

Related Cube.js generated SQL

Executing SQL: a4483c90-d1e9-443f-bc08-f597d988c149-span-1 
--
  select max(CAST(date_add('minute', timezone_minute("orders".trxn_ts AT TIME ZONE 'UTC'), date_add('hour', timezone_hour("orders".trxn_ts AT TIME ZONE 'UTC'), "orders".trxn_ts AT TIME ZONE 'UTC')) AS TIMESTAMP)) from views.pos_sales_enriched_view AS "orders"
--
Performing query completed: a4483c90-d1e9-443f-bc08-f597d988c149-span-1 (12462ms)
Performing query: a4483c90-d1e9-443f-bc08-f597d988c149-span-2 
Executing Load Pre Aggregation SQL: a4483c90-d1e9-443f-bc08-f597d988c149-span-2 
--
  SELECT
      "orders".product_id "orders__product_id", date_trunc('day', CAST(date_add('minute', timezone_minute("orders".trxn_ts AT TIME ZONE 'UTC'), date_add('hour', timezone_hour("orders".trxn_ts AT TIME ZONE 'UTC'), "orders".trxn_ts AT TIME ZONE 'UTC')) AS TIMESTAMP)) "orders__order_date_day", sum(CASE WHEN ("orders".return_flag = 'N') THEN "orders".sale_amount END) "orders__amount"
    FROM
      views.pos_sales_enriched_view AS "orders"
  GROUP BY 1, 2
--
Performing query completed: a4483c90-d1e9-443f-bc08-f597d988c149-span-2 (7032ms)
Performing query: 77ec3159-536b-475a-881c-2db4d93c6a66-span-2 
Performing query completed: 77ec3159-536b-475a-881c-2db4d93c6a66-span-2 (7ms)
Performing query: 77ec3159-536b-475a-881c-2db4d93c6a66-span-2 
Executing SQL: 77ec3159-536b-475a-881c-2db4d93c6a66-span-2 
--
  SELECT "orders__order_date_day" "orders__order_date_day", sum("orders__amount") "orders__amount" FROM stb_pre_aggregations.orders_orders_by_day_t2hw54vg_f1dq4lmu_1599053452689  WHERE ("orders__order_date_day" >= ('2020-04-01T00:00:00Z'::timestamptz::timestamptz AT TIME ZONE 'UTC') AND "orders__order_date_day" <= ('2020-06-30T23:59:59Z'::timestamptz::timestamptz AT TIME ZONE 'UTC')) GROUP BY 1 ORDER BY 1 ASC LIMIT 10000
--

Pre-aggregate table definition

postgres=> \d orders_orders_by_day_t2hw54vg_f1dq4lmu_1599053452689;
Table "stb_pre_aggregations.orders_orders_by_day_t2hw54vg_f1dq4lmu_1599053452689"
         Column         |  Type   | Collation | Nullable | Default 
------------------------+---------+-----------+----------+---------
 orders__product_id     | text    |           |          | 
 orders__order_date_day | text    |           |          | 
 orders__amount         | integer |           |          | 

Note - orders__order_date_day is of type text shouldn't it be timestamp ?

Error

Error while querying: 77ec3159-536b-475a-881c-2db4d93c6a66-span-2 (5ms)
{
  "processingId": 1,
  "queueSize": 1,
  "queryKey": [
    "SELECT \"orders__order_date_day\" \"orders__order_date_day\", sum(\"orders__amount\") \"orders__amount\" FROM stb_pre_aggregations.orders_orders_by_day  WHERE (\"orders__order_date_day\" >= ($1::timestamptz::timestamptz AT TIME ZONE 'UTC') AND \"orders__order_date_day\" <= ($2::timestamptz::timestamptz AT TIME ZONE 'UTC')) GROUP BY 1 ORDER BY 1 ASC LIMIT 10000",
    [
      "2020-04-01T00:00:00Z",
      "2020-06-30T23:59:59Z"
    ],
    [
      [
        "CREATE TABLE stb_pre_aggregations.orders_orders_by_day AS SELECT\n      \"orders\".product_id \"orders__product_id\", date_trunc('day', CAST(date_add('minute', timezone_minute(\"orders\".trxn_ts AT TIME ZONE 'UTC'), date_add('hour', timezone_hour(\"orders\".trxn_ts AT TIME ZONE 'UTC'), \"orders\".trxn_ts AT TIME ZONE 'UTC')) AS TIMESTAMP)) \"orders__order_date_day\", sum(CASE WHEN (\"orders\".return_flag = 'N') THEN \"orders\".sale_amount END) \"orders__amount\"\n    FROM\n      views.pos_sales_enriched_view AS \"orders\"\n  GROUP BY 1, 2",
        []
      ]
    ]
  ],
  "queuePrefix": "SQL_QUERY_EXT_STANDALONE_default",
  "timeInQueue": 0
} 
error: operator does not exist: text >= timestamp without time zone
    at Parser.parseErrorMessage (/Users/animesh/Development/cube-dataos-presto-exp/first-iteration/node_modules/pg-protocol/dist/parser.js:278:15)
    at Parser.handlePacket (/Users/animesh/Development/cube-dataos-presto-exp/first-iteration/node_modules/pg-protocol/dist/parser.js:126:29)
    at Parser.parse (/Users/animesh/Development/cube-dataos-presto-exp/first-iteration/node_modules/pg-protocol/dist/parser.js:39:38)
    at Socket.<anonymous> (/Users/animesh/Development/cube-dataos-presto-exp/first-iteration/node_modules/pg-protocol/dist/index.js:8:42)
    at Socket.emit (events.js:203:13)
    at addChunk (_stream_readable.js:294:12)
    at readableAddChunk (_stream_readable.js:275:11)
    at Socket.Readable.push (_stream_readable.js:210:10)
    at TCP.onStreamRead (internal/stream_base_commons.js:166:17)

Observation Please note that when I manually ALTER column orders_orders_by_day_t2hw54vg_f1dq4lmu_1599053452689.orders__order_date_day for type timestamp my query works.

Below are ALTER commands -

ALTER TABLE orders_orders_by_day_t2hw54vg_f1dq4lmu_1599053452689 
ADD COLUMN create_time_holder TIMESTAMP without time zone NULL;

UPDATE orders_orders_by_day_t2hw54vg_f1dq4lmu_1599053452689 
SET create_time_holder = orders__order_date_day::TIMESTAMP;

ALTER TABLE orders_orders_by_day_t2hw54vg_f1dq4lmu_1599053452689 
ALTER COLUMN orders__order_date_day TYPE TIMESTAMP without time zone USING create_time_holder;

ALTER TABLE orders_orders_by_day_t2hw54vg_f1dq4lmu_1599053452689 
DROP COLUMN create_time_holder;
jcw- commented 4 years ago

Thanks for the detailed issue submission! I can see that you are using partitioning, which reminds me of when a very similar issue was fixed for the mssql driver. I suspect the presto driver will need a similar treatment: https://github.com/cube-js/cube.js/pull/920

Edit: Oh hold on that's not using partioning, nevermind. Looks like you found the issue though, nice!

anismiles commented 4 years ago

Hey @jcw- I found a fix, but not sure this is the right way. Basically, Presto returns timestamp data in this format - 2020-06-01 00:00:00.000, and BaseDriver doesn't have a regex pattern to match it for timestamp.

I updated the condition with an additional pattern -

  timestamp: (v) => v instanceof Date || v.toString().match(/^\d\d\d\d-\d\d-\d\dT\d\d:\d\d:\d\d/) || v.toString().match(/^\d\d\d\d-\d\d-\d\d \d\d:\d\d:\d\d.\d\d\d/),

And added readOnly function in PrestoDriver.js

  readOnly() {
    return !!this.config.readOnly;
  }

It seems to work for me! Please let me know if this is the right fix, I would be happy to do a PR.

@paveltiunov (apologies to tag you, but I think you should take a look)

jcw- commented 4 years ago

I think you're on the right track, a PR seems like a good next step to me. BTW if you make that readOnly change in the base driver, you'll also fix https://github.com/cube-js/cube.js/issues/1028 :)

anismiles commented 4 years ago

@jcw- Looks like implementation in BaseDriver covers it. I have tested it, and it works okay. I only needed to add additional regex to detect date pattern. Am I missing something here? I probably should remove

  readOnly() {
    return !!this.config.readOnly;
  }

from PrestoDriver Right?

jcw- commented 4 years ago

@anismiles right, move it to base driver :)

hassankhan commented 3 years ago

Hi @anismiles :wave:

Just wanted to follow up; we'd be delighted to review and merge a PR fixing this, if you can create one 😀

paveltiunov commented 1 month ago

Closing as storing pre-aggregations in Cube Store the only supported flow here.