JuliaDatabases / JDBC.jl

Julia interface to Java database drivers
Other
38 stars 19 forks source link

Discrepancies in retrieved data of type DATETIME on Oracle and MySQL DBMS #15

Open bmharsha opened 8 years ago

bmharsha commented 8 years ago

JDBC.jl is retrieving incorrect data when it comes to following values, this issue can be reproduced using following statements

julia> executeUpdate(stmt, "create table emp(l DATETIME)")
julia> executeUpdate(stmt, "insert into emp(l) values('2003-10-27 07:55:28')")
julia> executeUpdate(stmt, "insert into emp(l) values('2003-08-21 02:50:29')")   
julia> rs = JDBC.executeQuery(stmt, "select * from emp")
JavaCall.JavaObject{symbol("java.sql.ResultSet")}(Ptr{Void} @0x000000000bd33100)

julia> JDBC_retrieved = readtable(rs)
2x1 DataFrames.DataFrame
| Row | l                   |
|-----|---------------------|
| 1   | 2003-10-27T07:25:28 |
| 2   | 2003-08-21T02:20:29 |

Please observe, the retrieved minutes is incorrect , I inserted 55 and 50 respectively , whereas when retrieved, I got 25 and 20.

Insert seems to have occurred correctly because when I cross checked in MySQL, I did found correct values.

mysql> select * from emp;
+---------------------+
| l                   |
+---------------------+
| 2003-10-27 07:55:28 |
| 2003-08-21 02:50:29 |
+---------------------+       

Configuration

DBMS = MySQL version 5.7.9 (Community Server edition) Julia version 0.4.1 OS = CentOS 7 JDBC.jl is upto date with the version on MetaData.jl

bmharsha commented 8 years ago

This issue is not specific to MySQL, this issue exists even while retrieving data from Oracle, in all cases, minutes is off by exactly 30 minutes i.e in retrieved data, minutes is exactly 30 minutes less than the expected time. Here is an output from Oracle

julia> retrieved[4][6]
2005-11-29T08:45:43

julia> inserted_datetime[6]
"2005-11-29 09:15:43"   
bmharsha commented 8 years ago

I was able to reproduce this issue on following configuration too:

OS = Windows 10, 64 bit DBMS = Oracle Database 11g Express Edition Julia version 0.4.2

aviks commented 8 years ago

What is the timezone of the machine running the queries, and the machine with the server?

bmharsha commented 8 years ago

TimeZone was IST (Indian Standard Time) in all the cases (Both Server and Client were on IST).

aviks commented 8 years ago

Can you try this on the aws server? My guess is that this won't be problem for non half hour time zones. I will fix this later today, but this may not be a problem for non-IST timezone.

aviks commented 8 years ago

Actually, unfortunately I cannot replicate this at all. Even after changing my timezone to IST.

mysql> select * from emp ;
+---------------------+
| l                   |
+---------------------+
| 2003-10-27 07:55:28 |
| 2003-08-21 02:50:29 |
+---------------------+
2 rows in set (0.00 sec)
julia> JDBC_retrieved = readtable(rs)
2x1 DataFrames.DataFrame
| Row | l                   |
|-----|---------------------|
| 1   | 2003-10-27T07:55:28 |
| 2   | 2003-08-21T02:50:29 |
bmharsha commented 8 years ago

Can you try this on the aws server?

Sure, I tried this on 64 bit Windows 2008 R2 server (AWS), DBMS was Oracle 11g Express Edition

On AWS server, I wasn't able to reproduce this error on IST, but I was able to reproduce this error on following timezones (For your convenience, timezones are listed as per Windows naming convention)

NOTE:- This is not an exhaustive list, there might be more timezones that are getting affected by this bug, these (Pacific and Mountain Time) were just the first two I came across on AWS server.

UTC-07:00 Mountain Time (US & Canada)

Please observe there is a one hour difference in the second retrieved value

julia> rs = JDBC.executeQuery(stmt, "SELECT SESSIONTIMEZONE FROM DUAL")  #Shows the timezone W.R.T Oracle DBMS
JavaCall.JavaObject{symbol("java.sql.ResultSet")}(Ptr{Void} @0x0000000011bced88)

julia> JDBC_retrieved = readtable(rs) #Shows the timezone W.R.T Oracle DBMS
1x1 DataFrames.DataFrame
| Row | SESSIONTIMEZONE  |
|-----|------------------|
| 1   | "America/Denver" |

julia> executeUpdate(stmt, "create table emp(l DATE)")
0

julia> executeUpdate(stmt, "insert into emp(l) values (TO_DATE('2005-11-29 09:15:43', 'yyyy-mm-dd hh24:mi:ss'))")
1

julia> executeUpdate(stmt, "insert into emp(l) values (TO_DATE('2003-08-21 02:50:29', 'yyyy-mm-dd hh24:mi:ss'))")
1

julia> rs = JDBC.executeQuery(stmt, "select * from emp")
JavaCall.JavaObject{symbol("java.sql.ResultSet")}(Ptr{Void} @0x0000000011bcedd8)

julia> JDBC_retrieved = readtable(rs)
2x1 DataFrames.DataFrame
| Row | L                   |
|-----|---------------------|
| 1   | 2005-11-29T09:15:43 |
| 2   | 2003-08-21T01:50:29 |

UTC-08:00 Pacific Time (US & Canada)

Please observe there is a one hour difference in the second retrieved value

julia> rs = JDBC.executeQuery(stmt, "SELECT SESSIONTIMEZONE FROM DUAL")  #Shows the timezone W.R.T Oracle DBMS
JavaCall.JavaObject{symbol("java.sql.ResultSet")}(Ptr{Void} @0x0000000009b4ed88)

julia> JDBC_retrieved = readtable(rs) #Shows the timezone W.R.T Oracle DBMS
1x1 DataFrames.DataFrame
| Row | SESSIONTIMEZONE       |
|-----|-----------------------|
| 1   | "America/Los_Angeles" |

julia> executeUpdate(stmt, "create table emp(l DATE)")
0

julia> executeUpdate(stmt, "insert into emp(l) values (TO_DATE('2005-11-29 09:15:43', 'yyyy-mm-dd hh24:mi:ss'))")
1

julia> executeUpdate(stmt, "insert into emp(l) values (TO_DATE('2003-08-21 02:50:29', 'yyyy-mm-dd hh24:mi:ss'))")
1

julia> rs = JDBC.executeQuery(stmt, "select * from emp")
JavaCall.JavaObject{symbol("java.sql.ResultSet")}(Ptr{Void} @0x0000000009b4edd8)

julia> JDBC_retrieved = readtable(rs)
2x1 DataFrames.DataFrame
| Row | L                   |
|-----|---------------------|
| 1   | 2005-11-29T09:15:43 |
| 2   | 2003-08-21T01:50:29 |
bmharsha commented 8 years ago

We found a workaround for this issue over Oracle, if we convert the DateTime column to a String (in our query) and explicitly specify the format for conversion, we do get the expected values (As a String)

julia> rs = JDBC.executeQuery(stmt, " select to_char(l, 'yyyy-mm-dd hh24:mi:ss') from emp")

JavaCall.JavaObject{symbol("java.sql.ResultSet")}(Ptr{Void} @0x00000000280adef8)

julia> JDBC_retrieved = readtable(rs)
6x1 DataFrames.DataFrame
| Row | TO_CHAR_L_YYYY_MM_DDHH24_MI_SS_ |
|-----|---------------------------------|
| 1   | "2005-11-29 09:15:43"           |
| 2   | "2003-08-21 02:50:29"           |
| 3   | "2005-11-29 09:15:43"           |
| 4   | "2003-10-27 07:55:28"           |
| 5   | "2003-08-21 02:50:29"           |
| 6   | "2003-08-21 22:50:29"           |

Whereas, if we don't specify anything while retrieval, we will have discrepancies in the retrieved value

julia> rs = JDBC.executeQuery(stmt, "select * from emp")
JavaCall.JavaObject{symbol("java.sql.ResultSet")}(Ptr{Void} @0x00000000280adea8)

julia> JDBC_retrieved = readtable(rs)
6x1 DataFrames.DataFrame
| Row | L                   |
|-----|---------------------|
| 1   | 2005-11-29T09:15:43 |
| 2   | 2003-08-21T01:50:29 |
| 3   | 2005-11-29T09:15:43 |
| 4   | 2003-10-27T07:55:28 |
| 5   | 2003-08-21T01:50:29 |
| 6   | 2003-08-21T21:50:29 |
aviks commented 8 years ago

Actually, I don't think this is a bug. (The 1/2 hour thing was a bug, but I can't replicate it at all). Alternatively, this is a more complex issue that I'd initially guessed.

This is a consequence of DST and the fact that Java turns all dates into local times. Notice that 2003-10-27 07:55:28 is a winter time, while 2003-08-21T02:50:29 is a summer time (and hence the one hour difference). Because we do not have the DST database, we cannot turn it back into the original string format.

In other words, we adjust for local time based on the current timezone offset, but not the historical timezone offset, which is depenedent on the DST database, which Java has, but Julia doesn't.

ViralBShah commented 8 years ago

What's the current resolution? Are we ok to ship what we have? Do we need to make a note of some of this stuff?

aviks commented 8 years ago

I think we are ok to ship with what we have.