apache / superset

Apache Superset is a Data Visualization and Data Exploration Platform
https://superset.apache.org/
Apache License 2.0
62.71k stars 13.84k forks source link

Problem with dates in Oracle #6572

Closed ghsalem closed 5 years ago

ghsalem commented 5 years ago

Make sure these boxes are checked before submitting your issue - thank you!

Superset version

0.28.1

Expected results

When using dates in Oracle, the dbengine transforms columns into dates using the to_date function. Not sure why. I modified the db_engine file to use cast instead, and it works.

Actual results

error ORA-01830: date format picture ends before converting entire input string

Steps to reproduce

use any oracle table with a timestamp column defined as temporal. Here's the log output from superset 2018-12-27 09:17:08,120:INFO:root:SELECT "ip_orig", "timestamp", "count" FROM (SELECT ip_orig AS "ip_orig", TRUNC(TO_DATE(date_deb), 'HH') AS "timestamp", COUNT() AS "count" FROM psa_log_t WHERE date_deb >= TO_TIMESTAMP('2018-09-27T00:00:00', 'YYYY-MM-DD"T"HH24:MI:SS.ff6') AND date_deb <= TO_TIMESTAMP('2018-12-27T00:00:00', 'YYYY-MM-DD"T"HH24:MI:SS.ff6') GROUP BY ip_orig, TRUNC(TO_DATE(date_deb), 'HH') ORDER BY COUNT() DESC) WHERE ROWNUM <= 10000 2018-12-27 09:17:08,175:INFO:root:Database.get_sqla_engine(). Masked URL: oracle+cx_oracle://psalog:XXXXXXXXXX@corvlog_high 2018-12-27 09:17:08,392:ERROR:root:ORA-12801: error signaled in parallel query server P003, instance 7

ghsalem commented 5 years ago

Here's the modified part of db_engine_specs.py:

time_grain_functions = {

    #None: '{col}',
    #'PT1S': 'CAST({col} as DATE)',
    #'PT1M': "TRUNC(TO_DATE({col}), 'MI')",
    #'PT1H': "TRUNC(TO_DATE({col}), 'HH')",
    #'P1D': "TRUNC(TO_DATE({col}), 'DDD')",
    #'P1W': "TRUNC(TO_DATE({col}), 'WW')",
    #'P1M': "TRUNC(TO_DATE({col}), 'MONTH')",
    #'P0.25Y': "TRUNC(TO_DATE({col}), 'Q')",
    #'P1Y': "TRUNC(TO_DATE({col}), 'YEAR')",
#}
time_grain_functions = {
    None: '{col}',
    'PT1S': 'CAST({col} as DATE)',
    'PT1M': "TRUNC(cast({col} as date), 'MI')",
    'PT1H': "TRUNC(cast({col} as date), 'HH')",
    'P1D': "TRUNC(cast({col} as date), 'DDD')",
    'P1W': "TRUNC(cast({col} as date), 'WW')",
    'P1M': "TRUNC(cast({col} as date), 'MONTH')",
    'P0.25Y': "TRUNC(cast({col} as date), 'Q')",
    'P1Y': "TRUNC(cast({col} as date), 'YEAR')",
}

The commented part is the "current" code. The non-commented is my modification. It makes it work in my case.

villebro commented 5 years ago

Good catch @ghsalem, I can confirm that your version works both for timestamp and date types, while the other only works for date. Mind putting through a PR? (nit: PT1S has CAST and DATE in UPPERCASE while the rest are lowercase; perhaps going all UPPERCASE would be more Oracleish?)

ghsalem commented 5 years ago

@villebro, in fact, the TRUNC function in Oracle returns a DATE type, and it works for TIMESTAMP, doing an implicit CAST. It will also work for columns defined as VARCHAR2 (or CHAR) and that hold dates in string format. The catch for this later case is that the string should correspond to the NLS_DATE_FORMAT of the session (i.e. superset connection). I'm new to superset, and have no idea how this can be done (easy from any oracle client: alter session set nls_date_format='.....'). As for the PR, Can you do it please? I'm newer to github than to superset, so it would help a lot if you can. As for the UPPERCASE, that's a good idea, not because it is more Oracleish (it is not), but because it makes for more readable code.

Thanks

villebro commented 5 years ago

@ghsalem Not too familiar with Oracle syntax, but if I understood you correctly we can omit the CAST entirely, resulting in e.g. 'PT1M': "TRUNC({col}, 'MI')",? Tried it out, and seems to work fine, too. Still wondering if there might have been some reason for the TO_DATE() being there..

Btw, if you want to locally override the built-in timegrains in db_engine_spec.py, you can just add the following to superset_config.py:

TIME_GRAIN_ADDON_FUNCTIONS = {
    'oracle': {
        None: '{col}',
        'PT1S': 'CAST({col} as DATE)',
        'PT1M': "TRUNC({col}, 'MI')",
        'PT1H': "TRUNC({col}, 'HH')",
        'P1D': "TRUNC({col}, 'DDD')",
        'P1W': "TRUNC({col}, 'WW')",
        'P1M': "TRUNC({col}, 'MONTH')",
        'P0.25Y': "TRUNC({col}, 'Q')",
        'P1Y': "TRUNC({col}, 'YEAR')",
    },
}
ghsalem commented 5 years ago

@villebro As you verified, removing the CAST altogether works, but only if COL is of type DATE or TIMESTAMP. Keeping CAST will cater for cases where COL is in fact a string representing dates in some format. As I said in my previous message, the format has to be set at the session level. Thanks for the hint about superset_config.py.

villebro commented 5 years ago

@ghsalem Right you are, retaining CAST inside TRUNC does in fact support even string dates, so your original proposal seems to support DATE, TIMESTAMP and CHAR, very nice. I did some digging and wasn't able to find any real reason for going with the nested TO_DATE expression, so I'm guessing the original Oracle time grain PR #1544 didn't consider the possibility of using TIMESTAMP.

Are you sure you don't feel up to the task of putting through a PR? This page seems to have a fairly good tutorial to get you going: https://www.javahelps.com/2017/04/git-pull-request-complete-guide-for.html

ghsalem commented 5 years ago

@villebro , thanks for the pointer. Just finished the thing.