Quansight / omnisci

Explorations on using MapD and Jupyter together.
4 stars 1 forks source link

[IBIS] FEAT: Need to add Ibis support for various timestamp component operations in OmniSci #56

Open xmnlab opened 5 years ago

xmnlab commented 5 years ago

issue: https://github.com/ibis-project/ibis/issues/1916

xmnlab commented 4 years ago

WIP PR: https://github.com/ibis-project/ibis/pull/2149

xmnlab commented 4 years ago

EPOCH Extraction

it seems OmniSciDB and PostgreSQL Epoch function doesn't have the same result:

================================== omniscidb ===================================
SELECT EXTRACT(EPOCH FROM "timestamp_of_birth") AS tmp
FROM salary
................................................................................
result: [  868410120  1634067208 -1895243000 ... -1189941608  -424284520
 -1515721064]
=================================== postgres ===================================
SELECT CAST(EXTRACT(epoch FROM t0.timestamp_of_birth) AS INTEGER) AS tmp 
FROM salary AS t0
................................................................................
result: [1262307661 1293843661 1325379661 ... 1546258332 1577794332 1609416732]

but it seems OmniSciDB DATEEPOCH has a similar result (I am assuming that it truncate the datetime by the date part) :

================================== omniscidb ===================================
SELECT EXTRACT(DATEEPOCH FROM "timestamp_of_birth") AS tmp
FROM salary
................................................................................
result: [1262304000 1293840000 1325376000 ... 1546214400 1577750400 1609372800]
xmnlab commented 4 years ago

WEEK Extraction

It seems OmniSciDB Extraction by WEEK is not a ISOWEEK, as used by PostgreSQL and Pandas.

I have the same result for a ISO WEEK using the follow expression:

================================== omniscidb ===================================
SELECT
  CASE
    WHEN CAST(floor(((EXTRACT(DOY FROM "timestamp_of_birth") - (EXTRACT(ISODOW FROM "timestamp_of_birth") - 1 + 1)) + 10) / 7) AS SMALLINT) < 1 THEN CASE WHEN ((mod((((EXTRACT(YEAR FROM "timestamp_of_birth") - 1) + floor((EXTRACT(YEAR FROM "timestamp_of_birth") - 1) / 4)) - floor((EXTRACT(YEAR FROM "timestamp_of_birth") - 1) / 100)) + floor((EXTRACT(YEAR FROM "timestamp_of_birth") - 1) / 400), 7)) = 4) OR ((mod(((((EXTRACT(YEAR FROM "timestamp_of_birth") - 1) - 1) + floor(((EXTRACT(YEAR FROM "timestamp_of_birth") - 1) - 1) / 4)) - floor(((EXTRACT(YEAR FROM "timestamp_of_birth") - 1) - 1) / 100)) + floor(((EXTRACT(YEAR FROM "timestamp_of_birth") - 1) - 1) / 400), 7)) = 3) THEN 1 ELSE 0 END + 52
    WHEN CAST(floor(((EXTRACT(DOY FROM "timestamp_of_birth") - (EXTRACT(ISODOW FROM "timestamp_of_birth") - 1 + 1)) + 10) / 7) AS SMALLINT) > (CASE WHEN ((mod(((EXTRACT(YEAR FROM "timestamp_of_birth") + floor(EXTRACT(YEAR FROM "timestamp_of_birth") / 4)) - floor(EXTRACT(YEAR FROM "timestamp_of_birth") / 100)) + floor(EXTRACT(YEAR FROM "timestamp_of_birth") / 400), 7)) = 4) OR ((mod((((EXTRACT(YEAR FROM "timestamp_of_birth") - 1) + floor((EXTRACT(YEAR FROM "timestamp_of_birth") - 1) / 4)) - floor((EXTRACT(YEAR FROM "timestamp_of_birth") - 1) / 100)) + floor((EXTRACT(YEAR FROM "timestamp_of_birth") - 1) / 400), 7)) = 3) THEN 1 ELSE 0 END + 52) THEN 1
    ELSE CAST(floor(((EXTRACT(DOY FROM "timestamp_of_birth") - (EXTRACT(ISODOW FROM "timestamp_of_birth") - 1 + 1)) + 10) / 7) AS SMALLINT)
  END AS tmp
FROM salary
................................................................................
result: [53 52 52 ...  1  1 53]
expected: [53 52 52 ...  1  1 53]
=================================== postgres ===================================
SELECT CAST(EXTRACT(week FROM t0.timestamp_of_birth) AS SMALLINT) AS tmp 
FROM salary AS t0
................................................................................
result: [53 52 52 ...  1  1 53]
expected: [53 52 52 ...  1  1 53]

using WEEK on omniscidb the result is different:

sql = '''
SELECT EXTRACT(WEEK FROM "timestamp_of_birth") AS tmp
FROM salary
'''
cursor2df(con['omniscidb'].con.execute(sql)).tmp.values

array([ 1, 1, 1, ..., 53, 53, 53])

another example: https://github.com/Quansight/ivan-sandbox/blob/master/ibis/poc/omniscidb-woy-pandas-comparison.ipynb

xmnlab commented 4 years ago

NANOSECOND Extraction

It seems, currently, pymapd use datetime object to return a date time data. So as datetime doesn't store nanosecond information, the result from pymapd lost the nanosecond information.

Maybe it could use np.datetime64 for that.

kcpevey commented 4 years ago

@xmnlab The PR referenced above ibis-project/ibis#2149 is closed. Can you provide an update with links to the relevant PRs or upstream issues? Thanks!

xmnlab commented 4 years ago

this issue is related to some ibis PRs:

There is also a pymapd PR related to this issue:

As OmniSciDB week of year extraction doesn't perform an ISO week extraction, its ibis implementation is blocked.

I think for closing this issue we need also to add support for nanoseconds on ibis omniscidb. And maybe if there are more operations to add support we can open a new issue.

kcpevey commented 4 years ago

For this issue, the last thing to be implemented is nanoseconds. Ibis uses pymapd==0.23, this needs pymapd==0.24.0, but this is breaking the CI due to packaging issues. This is blocked until https://github.com/ibis-project/ibis/pull/2256 is merged. Also, 0.24 changes the sql operation on omniscid.

xmnlab commented 4 years ago

it is still blocked until new pymapd is used by ibis. it depends on https://github.com/ibis-project/ibis/pull/2256 (but the PR is closed)