BlazingDB / blazingsql

BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.
https://blazingsql.com
Apache License 2.0
1.94k stars 184 forks source link

[BUG] All values = 1970-01-01 when CAST to TIMESTAMP #269

Open gumdropsteve opened 4 years ago

gumdropsteve commented 4 years ago

Describe the bug All values CAST to TIMESTAMP return as 1 January 1970. image

Steps/Code to reproduce bug Here's a Jupyter Notebook that reproduces the issue: https://gist.github.com/gumdropsteve/48af7ec54c9e7cf52ded504e2f2a2dfb

Expected behavior Values to return accurately.

Environment overview (please complete the following information)

Environment details

Click here to see environment details

     **git***
     Not inside a git repository

     ***OS Information***
     DISTRIB_ID=Ubuntu
     DISTRIB_RELEASE=18.04
     DISTRIB_CODENAME=bionic
     DISTRIB_DESCRIPTION="Ubuntu 18.04.3 LTS"
     NAME="Ubuntu"
     VERSION="18.04.3 LTS (Bionic Beaver)"
     ID=ubuntu
     ID_LIKE=debian
     PRETTY_NAME="Ubuntu 18.04.3 LTS"
     VERSION_ID="18.04"
     HOME_URL="https://www.ubuntu.com/"
     SUPPORT_URL="https://help.ubuntu.com/"
     BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
     PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
     VERSION_CODENAME=bionic
     UBUNTU_CODENAME=bionic
     Linux ip-172-31-61-190 4.15.0-1057-aws #59-Ubuntu SMP Wed Dec 4 10:02:00 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

     ***GPU Information***
     Wed Jan 22 02:48:24 2020
     +-----------------------------------------------------------------------------+
     | NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
     |-------------------------------+----------------------+----------------------+
     | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
     |===============================+======================+======================|
     |   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
     | N/A   47C    P0    27W /  70W |   7901MiB / 15079MiB |      0%      Default |
     +-------------------------------+----------------------+----------------------+

     +-----------------------------------------------------------------------------+
     | Processes:                                                       GPU Memory |
     |  GPU       PID   Type   Process name                             Usage      |
     |=============================================================================|
     |    0      7865      C   /opt/tljh/user/bin/python                    339MiB |
     +-----------------------------------------------------------------------------+

     ***CPU***
     Architecture:        x86_64
     CPU op-mode(s):      32-bit, 64-bit
     Byte Order:          Little Endian
     CPU(s):              4
     On-line CPU(s) list: 0-3
     Thread(s) per core:  2
     Core(s) per socket:  2
     Socket(s):           1
     NUMA node(s):        1
     Vendor ID:           GenuineIntel
     CPU family:          6
     Model:               85
     Model name:          Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz
     Stepping:            7
     CPU MHz:             3203.523
     BogoMIPS:            4999.99
     Hypervisor vendor:   KVM
     Virtualization type: full
     L1d cache:           32K
     L1i cache:           32K
     L2 cache:            1024K
     L3 cache:            36608K
     NUMA node0 CPU(s):   0-3
     Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single pti fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx avx512f avx512dq rdseed adx smap clflushopt clwb avx512cd avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves ida arat pku ospke avx512_vnni

     ***CMake***
     /usr/local/bin/cmake
Traceback (most recent call last):
  File "/usr/local/bin/cmake", line 6, in 
    from cmake import cmake
ModuleNotFoundError: No module named 'cmake'

     ***g++***
     /usr/bin/g++
     g++ (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0
     Copyright (C) 2017 Free Software Foundation, Inc.
     This is free software; see the source for copying conditions.  There is NO
     warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

     ***nvcc***

     ***Python***
     /opt/tljh/user/bin/python
     Python 3.6.7

     ***Environment Variables***
     PATH                            : /opt/tljh/user/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
     LD_LIBRARY_PATH                 :
     NUMBAPRO_NVVM                   :
     NUMBAPRO_LIBDEVICE              :
     CONDA_PREFIX                    : /opt/tljh/user/
     PYTHON_PATH                     :

     ***conda packages***
     /opt/tljh/user/bin/conda
     # packages in environment at /opt/tljh/user/:
     #
     # Name                    Version                   Build  Channel
     _libgcc_mutex             0.1                        main
     alembic                   1.2.0                    pypi_0    pypi
     arrow-cpp                 0.15.0           py36h090bef1_2    conda-forge
     asn1crypto                1.3.0                    py36_0
     async-generator           1.10                     pypi_0    pypi
     attrs                     19.1.0                   pypi_0    pypi
     backcall                  0.1.0                      py_0    conda-forge
     blazingsql                0.12.0a         cuda10.0_py36_0    blazingsql-nightly/label/cuda10.0
     bleach                    3.1.0                      py_0    conda-forge
     bokeh                     1.4.0                    py36_0    conda-forge
     boost-cpp                 1.70.0               h8e57a91_2    conda-forge
     brotli                    1.0.7             he1b5a44_1000    conda-forge
     bsql-rapids-thirdparty    0.12.0a                       0    blazingsql-nightly
     bsql-toolchain            0.12.0a                       0    blazingsql-nightly
     bsql-toolchain-aws-cpp    0.12.0a                       0    blazingsql-nightly
     bsql-toolchain-gcp-cpp    0.12.0a                       0    blazingsql-nightly
     bzip2                     1.0.8                h516909a_2    conda-forge
     c-ares                    1.15.0            h516909a_1001    conda-forge
     ca-certificates           2019.11.28           hecc5488_0    conda-forge
     certifi                   2019.11.28               py36_0    conda-forge
     certipy                   0.1.3                    pypi_0    pypi
     cffi                      1.13.2           py36h2e261b9_0
     chardet                   3.0.4                 py36_1003
     click                     7.0                        py_0    conda-forge
     cloudpickle               1.2.2                      py_1    conda-forge
     colorcet                  2.0.2                      py_0    anaconda
     conda                     4.8.1                    py36_0    anaconda
     conda-package-handling    1.6.0            py36h7b6447c_0
     cppzmq                    4.4.1                hc9558a2_0    conda-forge
     cryptography              2.8              py36h1ba5d50_0
     cudatoolkit               10.0.130                      0    nvidia
     cudf                      0.12.0b200115         py36_1311    rapidsai-nightly
     cudnn                     7.6.5                cuda10.0_0
     cuml                      0.12.0a200117   cuda10.0_py36_709    rapidsai-nightly
     cupy                      7.0.0            py36he57b8b9_2    conda-forge
     curl                      7.65.3               hf8cf82a_0    conda-forge
     cycler                    0.10.0                     py_2    conda-forge
     cyrus-sasl                2.1.27               he38ecfd_0    conda-forge
     cytoolz                   0.10.1           py36h516909a_0    conda-forge
     dask                      2.9.1                      py_0    conda-forge
     dask-core                 2.9.1                      py_0    conda-forge
     dask-cuda                 0.12.0a200115           py36_47    rapidsai-nightly
     dask-cudf                 0.12.0b200115         py36_1311    rapidsai-nightly
     datashader                0.9.0                      py_0    conda-forge
     datashape                 0.5.4                      py_1    conda-forge
     decorator                 4.4.0                    pypi_0    pypi
     defusedxml                0.6.0                      py_0    conda-forge
     distributed               2.9.1                      py_0    conda-forge
     dlpack                    0.2                  he1b5a44_1    conda-forge
     double-conversion         3.1.5                he1b5a44_2    conda-forge
     entrypoints               0.3                   py36_1000    conda-forge
     fastavro                  0.22.9           py36h516909a_0    conda-forge
     fastrlock                 0.4             py36he1b5a44_1000    conda-forge
     freetype                  2.10.0               he983fc9_1    conda-forge
     fsspec                    0.6.2                      py_0    conda-forge
     future                    0.18.2                   py36_0    conda-forge
     gettext                   0.19.8.1          hc5be6a0_1002    conda-forge
     gflags                    2.2.2             he1b5a44_1002    conda-forge
     glog                      0.4.0                he1b5a44_1    conda-forge
     grpc-cpp                  1.23.0               h18db393_0    conda-forge
     heapdict                  1.0.1                      py_0    conda-forge
     holoviews                 1.12.7                     py_0    conda-forge
     icu                       64.2                 he1b5a44_1    conda-forge
     idna                      2.8                      py36_0
     imageio                   2.6.1                    py36_0    conda-forge
     importlib_metadata        1.4.0                    py36_0    conda-forge
     ipykernel                 5.1.2                    pypi_0    pypi
     ipython                   7.8.0                    pypi_0    pypi
     ipython-genutils          0.2.0                    pypi_0    pypi
     ipython_genutils          0.2.0                      py_1    conda-forge
     ipywidgets                7.4.2                    pypi_0    pypi
     jedi                      0.15.1                   pypi_0    pypi
     jinja2                    2.10.1                   pypi_0    pypi
     joblib                    0.14.1                     py_0
     jpeg                      9c                h14c3975_1001    conda-forge
     jpype1                    0.7              py36h9de70de_0    conda-forge
     jsonschema                3.0.2                    pypi_0    pypi
     jupyter-client            5.3.3                    pypi_0    pypi
     jupyter-core              4.5.0                    pypi_0    pypi
     jupyter_client            5.3.4                    py36_1    conda-forge
     jupyter_core              4.6.1                    py36_0    conda-forge
     jupyterhub                1.0.0                    pypi_0    pypi
     jupyterlab                0.35.4                   pypi_0    pypi
     jupyterlab-server         0.2.0                    pypi_0    pypi
     kiwisolver                1.1.0            py36hc9558a2_0    conda-forge
     krb5                      1.16.3            h05b26f9_1001    conda-forge
     libblas                   3.8.0               14_openblas    conda-forge
     libcblas                  3.8.0               14_openblas    conda-forge
     libcudf                   0.12.0b200115     cuda10.0_1311    rapidsai-nightly
     libcuml                   0.12.0a200117      cuda10.0_709    rapidsai-nightly
     libcumlprims              0.12.0a200117       cuda10.0_23    rapidsai-nightly
     libcurl                   7.65.3               hda55be3_0    conda-forge
     libedit                   3.1.20181209         hc058e9b_0
     libevent                  2.1.10               h72c5cf5_0    conda-forge
     libffi                    3.2.1                hd88cf55_4
     libgcc-ng                 9.1.0                hdf63c60_0
     libgcrypt                 1.8.4             hf484d3e_1000    conda-forge
     libgfortran-ng            7.3.0                hdf63c60_0
     libgpg-error              1.36                 he1b5a44_0    conda-forge
     libgsasl                  1.8.0             h19a2143_1004    conda-forge
     libhdfs3                  2.3               h311b756_1006    conda-forge
     libhwloc                  2.1.0                h3c4fd83_0    conda-forge
     libiconv                  1.15              h516909a_1005    conda-forge
     liblapack                 3.8.0               14_openblas    conda-forge
     libllvm8                  8.0.1                hc9558a2_0    conda-forge
     libntlm                   1.4               h14c3975_1002    conda-forge
     libnvstrings              0.12.0b200115     cuda10.0_1311    rapidsai-nightly
     libopenblas               0.3.7                h6e990d7_3    conda-forge
     libpng                    1.6.37               hed695b0_0    conda-forge
     libprotobuf               3.8.0                h8b12597_0    conda-forge
     librmm                    0.12.0a200114      cuda10.0_157    rapidsai-nightly
     libsodium                 1.0.17               h516909a_0    conda-forge
     libssh2                   1.8.2                h22169c7_2    conda-forge
     libstdcxx-ng              9.1.0                hdf63c60_0
     libtiff                   4.1.0                hfc65ed5_0    conda-forge
     libuuid                   2.32.1            h14c3975_1000    conda-forge
     libxml2                   2.9.10               hee79883_0    conda-forge
     llvmlite                  0.31.0           py36h8b12597_0    conda-forge
     locket                    0.2.0                      py_2    conda-forge
     lz4-c                     1.8.3             he1b5a44_1001    conda-forge
     mako                      1.1.0                    pypi_0    pypi
     markupsafe                1.1.1            py36h516909a_0    conda-forge
     matplotlib-base           3.1.2            py36h250f245_1    conda-forge
     mistune                   0.8.4           py36h516909a_1000    conda-forge
     more-itertools            8.1.0                      py_0    conda-forge
     msgpack-python            0.6.2            py36hc9558a2_0    conda-forge
     multipledispatch          0.6.0                      py_0    conda-forge
     nbconvert                 5.6.0                    pypi_0    pypi
     nbformat                  4.4.0                    pypi_0    pypi
     nbgitpuller               0.6.1                    pypi_0    pypi
     nbresuse                  0.3.0                    pypi_0    pypi
     nccl                      2.5.6.1              hd6f8bf8_0    conda-forge
     ncurses                   6.1                  he6710b0_1
     netifaces                 0.10.9          py36h516909a_1000    conda-forge
     networkx                  2.4                        py_0    conda-forge
     notebook                  5.7.8                    pypi_0    pypi
     nteract-on-jupyter        2.0.7                    pypi_0    pypi
     numba                     0.47.0           py36hb3f55d8_0    conda-forge
     numpy                     1.17.3           py36h95a1406_0    conda-forge
     nvstrings                 0.12.0b200115         py36_1311    rapidsai-nightly
     oauthlib                  3.1.0                    pypi_0    pypi
     olefile                   0.46                       py_0    conda-forge
     openjdk                   8.0.192           h14c3975_1003    conda-forge
     openssl                   1.1.1d               h516909a_0    conda-forge
     packaging                 20.0                       py_0    conda-forge
     pamela                    1.0.0                    pypi_0    pypi
     pandas                    0.24.2           py36hb3f55d8_1    conda-forge
     pandoc                    2.9.1.1                       0    conda-forge
     pandocfilters             1.4.2                    pypi_0    pypi
     param                     1.9.2                      py_0    conda-forge
     parquet-cpp               1.5.1                         2    conda-forge
     parso                     0.5.1                    pypi_0    pypi
     partd                     1.1.0                      py_0    conda-forge
     pexpect                   4.7.0                    py36_0    conda-forge
     pickleshare               0.7.5                    pypi_0    pypi
     pillow                    6.2.1            py36hd70f55b_1    conda-forge
     pip                       19.3.1                   py36_0
     prometheus_client         0.7.1                      py_0    conda-forge
     prompt-toolkit            2.0.9                    pypi_0    pypi
     prompt_toolkit            3.0.2                      py_0    conda-forge
     psutil                    5.6.3                    pypi_0    pypi
     ptyprocess                0.6.0                   py_1001    conda-forge
     pyarrow                   0.15.0           py36h8b68381_1    conda-forge
     pycosat                   0.6.3            py36h7b6447c_0
     pycparser                 2.19                     py36_0
     pyct                      0.4.6                    py36_0    anaconda
     pygments                  2.4.2                    pypi_0    pypi
     pyhive                    0.6.1                    py36_0
     pynvml                    8.0.4                      py_0    conda-forge
     pyopenssl                 19.1.0                   py36_0
     pyparsing                 2.4.6                      py_0    conda-forge
     pyrsistent                0.15.4                   pypi_0    pypi
     pysocks                   1.7.1                    py36_0
     python                    3.6.7             h381d211_1004    conda-forge
     python-dateutil           2.8.0                    pypi_0    pypi
     python-editor             1.0.4                    pypi_0    pypi
     pytz                      2019.3                     py_0    conda-forge
     pyviz_comms               0.7.2                      py_0    conda-forge
     pywavelets                1.1.1            py36hc1659b7_0    conda-forge
     pyyaml                    5.1.2            py36h516909a_0    conda-forge
     pyzmq                     18.1.0                   pypi_0    pypi
     re2                       2020.01.01           he1b5a44_0    conda-forge
     readline                  7.0                  h7b6447c_5
     requests                  2.22.0                   py36_1
     rmm                       0.12.0a200115          py36_157    rapidsai-nightly
     ruamel_yaml               0.15.87          py36h7b6447c_0
     sasl                      0.2.1           py36he1b5a44_1001    conda-forge
     scikit-image              0.16.2           py36hb3f55d8_0    conda-forge
     scikit-learn              0.21.3           py36hcdab131_0    conda-forge
     scipy                     1.3.2            py36h921218d_0    conda-forge
     send2trash                1.5.0                      py_0    conda-forge
     setuptools                44.0.0                   py36_0
     six                       1.13.0                   py36_0
     snappy                    1.1.7             he1b5a44_1003    conda-forge
     sortedcontainers          2.1.0                      py_0    conda-forge
     sqlalchemy                1.3.8                    pypi_0    pypi
     sqlite                    3.30.1               h7b6447c_0
     tblib                     1.6.0                      py_0    conda-forge
     terminado                 0.8.2                    pypi_0    pypi
     testpath                  0.4.2                    pypi_0    pypi
     thrift                    0.11.0          py36he1b5a44_1001    conda-forge
     thrift-cpp                0.12.0            hf3afdfd_1004    conda-forge
     thrift_sasl               0.3.0           py36h516909a_1001    conda-forge
     tk                        8.6.9             hed695b0_1003    conda-forge
     toolz                     0.10.0                     py_0    conda-forge
     tornado                   5.1.1                    pypi_0    pypi
     tqdm                      4.41.1                     py_0
     traitlets                 4.3.2                    pypi_0    pypi
     ucx                       1.7.0dev+gee2252b    cuda10.0_119    rapidsai-nightly
     ucx-py                    0.12.0a200116+gd4ca5e3        py36_119    rapidsai-nightly
     uriparser                 0.9.3                he1b5a44_1    conda-forge
     urllib3                   1.25.7                   py36_0
     wcwidth                   0.1.7                    pypi_0    pypi
     webencodings              0.5.1                    pypi_0    pypi
     wheel                     0.33.6                   py36_0
     widgetsnbextension        3.4.2                    pypi_0    pypi
     xarray                    0.14.1                     py_1    anaconda
     xz                        5.2.4                h14c3975_4
     yaml                      0.1.7                had09818_2
     zeromq                    4.3.2                he1b5a44_2    conda-forge
     zict                      1.0.0                      py_0    conda-forge
     zipp                      2.0.0                      py_1    conda-forge
     zlib                      1.2.11               h7b6447c_3
     zstd                      1.4.3                h3b9ef0a_0    conda-forge

Additional context Notebook that reproduces the issue tests on columns of the following dtypes: str, int, float, varchar

gumdropsteve commented 4 years ago

Temp Fix

This issue relates to TIMESTAMP expecting float values (aka "millisecond resolution").

Where I was passing in;

| tpep_pickup_datetime
-- | --
2015-01-15 19:05:39
2015-01-10 20:33:38
2015-01-10 20:33:39

which specifies the times down to the second, if I instead pass in;

| tpep_pickup_datetime
-- | --
2015-01-15 19:05:39.0
2015-01-10 20:33:38.0
2015-01-10 20:33:39.0

which specifies the times down to the millisecond, CASTing to TIMESTAMP works as expected.

This can be achieved by having default values (in your file / table) that specify down to the millisecond via decimal or by CONCATing || decimals to your values before CASTing. Here's an example concatenating a simple decimal || '.' and a decimal with minimal value impact || '.0001' before CASTing.

image

NOTE: https://gist.github.com/gumdropsteve/48af7ec54c9e7cf52ded504e2f2a2dfb#gistcomment-3152873

wmalpica commented 4 years ago

A related issue is that if you try to cast a string such as 2020-02-18 to TIMESTAMP it will also return 1970-01-01. To get around that you can do CAST( CAST column AS DATE) AS TIMESTAMP)

wmalpica commented 4 years ago

This will be addressed once the cudf timestamp and duration datatypes are more thoroughly implemented and also when TO_DATE is implemented in SQL