Teradata / presto

Teradata Distribution of Presto -- A Distributed SQL Query Engine for Big Data
http://www.teradata.com/presto
Apache License 2.0
94 stars 21 forks source link

Timestamp time zone handling appears broken in ORC format for 0.167-t.0.2 #583

Open Downchuck opened 7 years ago

Downchuck commented 7 years ago

Teradata Presto release appears to be deserializing the timestamp column incorrectly in ORC files with time zone information. (ORC written from Hive 1.2.1)

Note the query returns correctly in standard Presto and in Hive. The table was built via Hive, transforming the first column, a string, into a timestamp column; the local time zone of Hive/Presto is America/Los_Angeles.

Teradata Presto 0.167-t.0.2:
time         |           tm            |                    _col2
---------------------+-------------------------+---------------------------------------------
 01-30-2016-00:11:02 | 2016-01-30 08:11:02.000 | 2016-01-30 07:11:02.000 America/Los_Angeles
 01-30-2016-00:39:28 | 2016-01-30 08:39:28.000 | 2016-01-30 07:39:28.000 America/Los_Angeles

Mini cluster (0.17x)
time         |           tm            |                    _col2
---------------------+-------------------------+---------------------------------------------
 01-30-2016-00:11:02 | 2016-01-30 00:11:02.000 | 2016-01-30 00:11:02.000 America/Los_Angeles
 01-30-2016-00:39:28 | 2016-01-30 00:39:28.000 | 2016-01-30 00:39:28.000 America/Los_Angeles

presto:mesoads> select time,tm, tm at time zone 'America/Los_Angeles' from orc_table 
fiedukow commented 7 years ago

Thanks for the bug report. We will look into that. In a mean while, the workaround may be setting session property as follows: set session legacy_timestamp = true; or setting server property: deprecated.legacy-timestamp = true This will most likely restore original prestodb/presto behaviour for date time types, which has problems of their own though.

Downchuck commented 7 years ago

@fiedukow Confirmed, legacy_timestamp works as a workaround.

zz22394 commented 7 years ago

@fiedukow Is this issue related with TIMESTAMP behaviour does not match sql standard #7122 ?

cawallin commented 7 years ago

As far as I know, it's only tangentially related: this ORC bug exists both for the existing timestamp implementation and the implementation in #7122.

Downchuck commented 7 years ago

Seems like we can't use any timezone but UTC -- as the hive connector needs to be set to UTC and Presto, to avoid ORC read issues for data written in UTC.

zz22394 commented 7 years ago

@cawallin Thanks. After reading #7122 and #7480 and DATE TIME SUPPORT IN PRESTO, I think I had got your point.