BlazingDB / blazingsql

BlazingSQL is a lightweight, GPU-accelerated SQL engine for Python, built on RAPIDS cuDF.
https://blazingsql.com
Apache License 2.0

[BUG] Resulting Column Name as Alias Not Applying #120

Closed: gumdropsteve closed this issue 4 years ago

gumdropsteve commented 4 years ago

Describe the bug

Column aliases given in the query are not applied to the result columns, which instead get generic names: $f0, $f1, ..., $fn.

Context

After creating a table ("taxi"), I'm trying to:

  1. extract the hour, month, and year from each row of a datetime column (key), storing each as a new column titled hours, months, and years, respectively
  2. compute the difference between the dropoff and pickup longitude columns as a new column, longitude_distance
  3. compute the difference between the dropoff and pickup latitude columns as a new column, latitude_distance

but the new time columns (hours, months, years) come out named $f0, $f1, and $f2, and the distance columns (longitude_distance, latitude_distance) come out named $f3 and $f4.
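To check a result by hand, the per-row computations listed above (plus the passthrough passenger_count and the `year(key) - 2000` offset used in the query) can be written in plain Python. This is only a reference sketch for comparing values and column names; `expected_row` is a hypothetical helper and the sample row values are made up.

```python
from datetime import datetime

def expected_row(key, pickup_lon, pickup_lat, dropoff_lon, dropoff_lat, passenger_count):
    """Hypothetical helper: compute one output row as the SELECT list describes it."""
    return {
        "hours": key.hour,                           # hour(key) AS hours
        "months": key.month,                         # month(key) AS months
        "years": key.year - 2000,                    # year(key) - 2000 AS years
        "longitude_distance": dropoff_lon - pickup_lon,
        "latitude_distance": dropoff_lat - pickup_lat,
        "passenger_count": passenger_count,          # selected unchanged
    }

# Made-up sample values, only for illustration.
row = expected_row(datetime(2009, 6, 15, 17, 26, 21),
                   -73.844311, 40.721319, -73.84161, 40.712278, 1)
print(row)
```

The dictionary keys are the names the result columns are expected to have, in SELECT order.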

Here's the query and execution:

# define the query
query = '''
        SELECT hour(key) as hours, month(key) as months, year(key) - 2000 as years,  
        dropoff_longitude - pickup_longitude as longitude_distance, 
        dropoff_latitude - pickup_latitude as latitude_distance, 
        passenger_count FROM main.taxi
        '''

# run query on table
X_train = bc.sql(query).get()

# extract dataframe
X_train_gdf = X_train.columns

# how's that look?
X_train_gdf.head()

Here's the current output: [image]

Steps/Code to reproduce bug

Here's the notebook in Colab: https://colab.research.google.com/drive/1gEX0CrTMNLu5Y4V4JbLw5HAQm6UHgtAr

The DataFrame with the incorrect column names is displayed in the notebook, and the bug can be reproduced by downloading and running the notebook locally. The notebook currently applies a manual fix to the column names.
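The notebook's own manual fix isn't reproduced here. One way to do such a correction is to map the placeholder names back to the query's aliases by position. This is a minimal sketch, assuming the result columns come back in the same order as the SELECT list; `restore_aliases` is a hypothetical helper, and the commented `rename(columns=...)` line relies on the pandas/cuDF-style `DataFrame.rename` method.

```python
import re

# Placeholder names such as "$f0", "$f1", ... that appear when aliases are lost.
PLACEHOLDER = re.compile(r"\$f\d+")

def restore_aliases(result_columns, select_aliases):
    """Hypothetical helper: build a {placeholder: alias} mapping.

    Assumes position i in result_columns corresponds to position i in
    select_aliases (i.e. the SELECT order is preserved in the result).
    """
    if len(result_columns) != len(select_aliases):
        raise ValueError("result has a different number of columns than the SELECT list")
    return {
        got: wanted
        for got, wanted in zip(result_columns, select_aliases)
        if PLACEHOLDER.fullmatch(got)  # only rename the generic placeholders
    }

mapping = restore_aliases(
    ["$f0", "$f1", "$f2", "$f3", "$f4", "passenger_count"],
    ["hours", "months", "years", "longitude_distance", "latitude_distance", "passenger_count"],
)
print(mapping)
# Applied to the result DataFrame from the report, this would be:
# X_train_gdf = X_train_gdf.rename(columns=mapping)
```

Columns that already have a real name (here passenger_count) are left out of the mapping, so the rename only touches the placeholders.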

Expected behavior

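The result columns should carry the aliases given in the query (hours, months, years, longitude_distance, latitude_distance), alongside passenger_count: [image]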
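For comparison, a conventional SQL engine reports the alias as the result column's name. The sketch below uses Python's built-in sqlite3 module as a stand-in (not BlazingSQL) with a one-row, made-up "taxi" table. SQLite has no hour()/month()/year() functions, so strftime() is substituted for them; only the column naming is the point here.

```python
import sqlite3

# In-memory stand-in for the "taxi" table; the row values are made up.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE taxi ("key" TEXT, pickup_longitude REAL, pickup_latitude REAL,
                       dropoff_longitude REAL, dropoff_latitude REAL,
                       passenger_count INTEGER)
""")
conn.execute("""
    INSERT INTO taxi VALUES ('2009-06-15 17:26:21', -73.844311, 40.721319,
                             -73.84161, 40.712278, 1)
""")

# Same SELECT list as the report, with strftime() in place of hour/month/year.
cursor = conn.execute("""
    SELECT CAST(strftime('%H', "key") AS INTEGER) AS hours,
           CAST(strftime('%m', "key") AS INTEGER) AS months,
           CAST(strftime('%Y', "key") AS INTEGER) - 2000 AS years,
           dropoff_longitude - pickup_longitude AS longitude_distance,
           dropoff_latitude - pickup_latitude AS latitude_distance,
           passenger_count
    FROM taxi
""")
# cursor.description has one 7-tuple per result column; item 0 is the name.
column_names = [col[0] for col in cursor.description]
rows = cursor.fetchall()
conn.close()
print(column_names)
```

Here the aliased expressions come back named hours, months, years, longitude_distance, and latitude_distance, which is the behavior the report expects from BlazingSQL.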

Environment overview

Environment details

<details><summary>Click here to see environment details</summary><pre>

     ***git***
     commit 7a45cd317eb341cfa8693f5377ed1c7052a6eaee (HEAD -> feature_taxi, origin/feature_taxi)
     Author: Winston <warobson@gmail.com>
     Date:   Mon Oct 28 03:45:16 2019 -0700

     formatted and runs e2e with temp fixes (issue: column names)
     ***git submodules***

     ***OS Information***
     DISTRIB_ID=Ubuntu
     DISTRIB_RELEASE=16.04
     DISTRIB_CODENAME=xenial
     DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"
     NAME="Ubuntu"
     VERSION="16.04.6 LTS (Xenial Xerus)"
     ID=ubuntu
     ID_LIKE=debian
     PRETTY_NAME="Ubuntu 16.04.6 LTS"
     VERSION_ID="16.04"
     HOME_URL="http://www.ubuntu.com/"
     SUPPORT_URL="http://help.ubuntu.com/"
     BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
     VERSION_CODENAME=xenial
     UBUNTU_CODENAME=xenial
     Linux winston-gpu-rig 4.15.0-1047-gcp #50-Ubuntu SMP Wed Oct 2 00:50:34 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

     ***GPU Information***
     Sun Nov  3 06:40:18 2019
     +-----------------------------------------------------------------------------+
     | NVIDIA-SMI 418.87.00    Driver Version: 418.87.00    CUDA Version: 10.1     |
     |-------------------------------+----------------------+----------------------+
     | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
     | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
     |===============================+======================+======================|
     |   0  Tesla T4            Off  | 00000000:00:04.0 Off |                    0 |
     | N/A   42C    P8    10W /  70W |     98MiB / 15079MiB |      0%      Default |
     +-------------------------------+----------------------+----------------------+

     +-----------------------------------------------------------------------------+
     | Processes:                                                       GPU Memory |
     |  GPU       PID   Type   Process name                             Usage      |
     |=============================================================================|
     |    0      2014      G   /usr/lib/xorg/Xorg                            98MiB |
     +-----------------------------------------------------------------------------+

     ***CPU***
     Architecture:          x86_64
     CPU op-mode(s):        32-bit, 64-bit
     Byte Order:            Little Endian
     CPU(s):                4
     On-line CPU(s) list:   0-3
     Thread(s) per core:    2
     Core(s) per socket:    2
     Socket(s):             1
     NUMA node(s):          1
     Vendor ID:             GenuineIntel
     CPU family:            6
     Model:                 63
     Model name:            Intel(R) Xeon(R) CPU @ 2.30GHz
     Stepping:              0
     CPU MHz:               2300.000
     BogoMIPS:              4600.00
     Hypervisor vendor:     KVM
     Virtualization type:   full
     L1d cache:             32K
     L1i cache:             32K
     L2 cache:              256K
     L3 cache:              46080K
     NUMA node0 CPU(s):     0-3
     Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm invpcid_single pti ssbd ibrs ibpb stibp fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt arat md_clear arch_capabilities

     ***CMake***
     /home/winston/miniconda3/envs/bzsqlenv/bin/cmake
     cmake version 3.15.5

     CMake suite maintained and supported by Kitware (kitware.com/cmake).

     ***g++***
     /usr/bin/g++
     g++ (Ubuntu 5.4.0-6ubuntu1~16.04.11) 5.4.0 20160609
     Copyright (C) 2015 Free Software Foundation, Inc.
     This is free software; see the source for copying conditions.  There is NO
     warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

     ***nvcc***

     ***Python***
     /home/winston/miniconda3/envs/bzsqlenv/bin/python
     Python 3.7.3

     ***Environment Variables***
     PATH                            : /home/winston/bin:/home/winston/.local/bin:/home/winston/miniconda3/envs/bzsqlenv/bin:/home/winston/miniconda3/condabin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
     LD_LIBRARY_PATH                 :
     NUMBAPRO_NVVM                   :
     NUMBAPRO_LIBDEVICE              :
     CONDA_PREFIX                    : /home/winston/miniconda3/envs/bzsqlenv
     PYTHON_PATH                     :

     ***conda packages***
     /home/winston/miniconda3/condabin/conda
     # packages in environment at /home/winston/miniconda3/envs/bzsqlenv:
     #
     # Name                    Version                   Build  Channel
     _libgcc_mutex             0.1                        main
     alsa-lib                  1.1.5             h516909a_1001    conda-forge
     arrow-cpp                 0.14.1           py37h5ac5442_4    conda-forge
     attrs                     19.3.0                   pypi_0    pypi
     backcall                  0.1.0                    pypi_0    pypi
     blazingsql-calcite        0.4.5                         0    blazingsql
     blazingsql-communication  0.4.5                cuda10.0_0    blazingsql/label/cuda10.0
     blazingsql-io             0.4.4                         0    blazingsql
     blazingsql-orchestrator   0.4.5                         0    blazingsql
     blazingsql-protocol       0.4.5                    py37_0    blazingsql
     blazingsql-python         0.4.5           cuda10.0_py37_0    blazingsql/label/cuda10.0
     blazingsql-ral            0.4.5                cuda10.0_0    blazingsql/label/cuda10.0
     blazingsql-toolchain      0.4.5                         3    blazingsql
     bleach                    3.1.0                    pypi_0    pypi
     bokeh                     1.3.4                    py37_0    conda-forge
     boost                     1.70.0           py37h9de70de_1    conda-forge
     boost-cpp                 1.70.0               h8e57a91_2    conda-forge
     brotli                    1.0.7             he1b5a44_1000    conda-forge
     bzip2                     1.0.8                h516909a_1    conda-forge
     c-ares                    1.15.0            h516909a_1001    conda-forge
     ca-certificates           2019.9.11            hecc5488_0    conda-forge
     certifi                   2019.9.11                py37_0    conda-forge
     click                     7.0                        py_0    conda-forge
     cloudpickle               1.2.2                      py_0    conda-forge
     cmake                     3.15.5               hf94ab9c_0    conda-forge
     cppzmq                    4.4.1                hc9558a2_0    conda-forge
     cudatoolkit               10.0.130                      0
     cudf                      0.10.0                   py37_0    rapidsai
     curl                      7.65.3               hf8cf82a_0    conda-forge
     cython                    0.29.13          py37he1b5a44_0    conda-forge
     cytoolz                   0.10.0           py37h516909a_0    conda-forge
     dask                      2.6.0                      py_0    conda-forge
     dask-core                 2.6.0                      py_0    conda-forge
     dask-cudf                 0.10.0                   py37_0    rapidsai
     decorator                 4.4.1                    pypi_0    pypi
     defusedxml                0.6.0                    pypi_0    pypi
     distributed               2.6.0                      py_0    conda-forge
     dlpack                    0.2                  he1b5a44_1    conda-forge
     double-conversion         3.1.5                he1b5a44_2    conda-forge
     entrypoints               0.3                      pypi_0    pypi
     expat                     2.2.5             he1b5a44_1004    conda-forge
     fastavro                  0.22.5           py37h516909a_0    conda-forge
     flatbuffers               1.11                     pypi_0    pypi
     fontconfig                2.13.1            h86ecdb6_1001    conda-forge
     freetype                  2.10.0               he983fc9_1    conda-forge
     fsspec                    0.5.2                      py_0    conda-forge
     gettext                   0.19.8.1          hc5be6a0_1002    conda-forge
     gflags                    2.2.2             he1b5a44_1002    conda-forge
     giflib                    5.1.7                h516909a_1    conda-forge
     glog                      0.4.0                he1b5a44_1    conda-forge
     gmock                     1.10.0                        0    conda-forge
     grpc-cpp                  1.23.0               h18db393_0    conda-forge
     gtest                     1.10.0               hc9558a2_0    conda-forge
     heapdict                  1.0.1                      py_0    conda-forge
     icu                       64.2                 he1b5a44_1    conda-forge
     importlib-metadata        0.23                     pypi_0    pypi
     ipykernel                 5.1.3                    pypi_0    pypi
     ipython                   7.9.0                    pypi_0    pypi
     ipython-genutils          0.2.0                    pypi_0    pypi
     jedi                      0.15.1                   pypi_0    pypi
     jinja2                    2.10.3                     py_0    conda-forge
     jpeg                      9c                h14c3975_1001    conda-forge
     json5                     0.8.5                    pypi_0    pypi
     jsonschema                3.1.1                    pypi_0    pypi
     jupyter-client            5.3.4                    pypi_0    pypi
     jupyter-core              4.6.1                    pypi_0    pypi
     jupyterlab                0.34.0                   pypi_0    pypi
     jupyterlab-launcher       0.13.1                   pypi_0    pypi
     jupyterlab-server         1.0.6                    pypi_0    pypi
     krb5                      1.16.3            h05b26f9_1001    conda-forge
     lcms2                     2.9                  h2e4bb80_0    conda-forge
     libblas                   3.8.0               14_openblas    conda-forge
     libcblas                  3.8.0               14_openblas    conda-forge
     libcudf                   0.10.0               cuda10.0_0    rapidsai
     libcurl                   7.65.3               hda55be3_0    conda-forge
     libedit                   3.1.20181209         hc058e9b_0
     libevent                  2.1.10               h72c5cf5_0    conda-forge
     libffi                    3.2.1                hd88cf55_4
     libgcc-ng                 9.1.0                hdf63c60_0
     libgcrypt                 1.8.4             hf484d3e_1000    conda-forge
     libgfortran-ng            7.3.0                hdf63c60_2    conda-forge
     libgpg-error              1.36                 he1b5a44_0    conda-forge
     libgsasl                  1.8.0             h19a2143_1004    conda-forge
     libhdfs3                  2.3               h311b756_1006    conda-forge
     libiconv                  1.15              h516909a_1005    conda-forge
     liblapack                 3.8.0               14_openblas    conda-forge
     libllvm8                  8.0.1                hc9558a2_0    conda-forge
     libntlm                   1.4               h14c3975_1002    conda-forge
     libnvstrings              0.10.0               cuda10.0_0    rapidsai
     libopenblas               0.3.7                h6e990d7_2    conda-forge
     libpng                    1.6.37               hed695b0_0    conda-forge
     libprotobuf               3.8.0                h8b12597_0    conda-forge
     librmm                    0.10.0               cuda10.0_0    rapidsai
     libsodium                 1.0.17               h516909a_0    conda-forge
     libssh2                   1.8.2                h22169c7_2    conda-forge
     libstdcxx-ng              9.1.0                hdf63c60_0
     libtiff                   4.0.10            h57b8799_1003    conda-forge
     libuuid                   2.32.1            h14c3975_1000    conda-forge
     libuv                     1.33.1               h516909a_0    conda-forge
     libxcb                    1.13              h14c3975_1002    conda-forge
     libxml2                   2.9.10               hee79883_0    conda-forge
     llvmlite                  0.30.0           py37h8b12597_0    conda-forge
     locket                    0.2.0                      py_2    conda-forge
     lz4-c                     1.8.3             he1b5a44_1001    conda-forge
     markupsafe                1.1.1            py37h14c3975_0    conda-forge
     maven                     3.6.0                         0    conda-forge
     mistune                   0.8.4                    pypi_0    pypi
     more-itertools            7.2.0                    pypi_0    pypi
     msgpack-python            0.6.2            py37hc9558a2_0    conda-forge
     nbconvert                 5.6.1                    pypi_0    pypi
     nbformat                  4.4.0                    pypi_0    pypi
     ncurses                   6.1                  he6710b0_1
     notebook                  6.0.1                    pypi_0    pypi
     numba                     0.46.0           py37hb3f55d8_1    conda-forge
     numpy                     1.17.3           py37h95a1406_0    conda-forge
     nvstrings                 0.10.0                   py37_0    rapidsai
     olefile                   0.46                       py_0    conda-forge
     openjdk                   11.0.1            h46a85a0_1017    conda-forge
     openssl                   1.1.1c               h516909a_0    conda-forge
     packaging                 19.2                       py_0    conda-forge
     pandas                    0.24.2           py37hb3f55d8_0    conda-forge
     pandocfilters             1.4.2                    pypi_0    pypi
     parquet-cpp               1.5.1                         2    conda-forge
     parso                     0.5.1                    pypi_0    pypi
     partd                     1.0.0                      py_0    conda-forge
     pexpect                   4.7.0                    pypi_0    pypi
     pickleshare               0.7.5                    pypi_0    pypi
     pillow                    6.2.1            py37h6b7be26_0    conda-forge
     pip                       19.3.1                   py37_0
     prometheus-client         0.7.1                    pypi_0    pypi
     prompt-toolkit            2.0.10                   pypi_0    pypi
     psutil                    5.6.3            py37h516909a_0    conda-forge
     pthread-stubs             0.4               h14c3975_1001    conda-forge
     ptyprocess                0.6.0                    pypi_0    pypi
     pyarrow                   0.14.1           py37h8b68381_2    conda-forge
     pygments                  2.4.2                    pypi_0    pypi
     pyparsing                 2.4.2                      py_0    conda-forge
     pyrsistent                0.15.5                   pypi_0    pypi
     python                    3.7.3                h5b0a415_0    conda-forge
     python-dateutil           2.8.0                      py_0    conda-forge
     pytz                      2019.3                     py_0    conda-forge
     pyyaml                    5.1.2            py37h516909a_0    conda-forge
     pyzmq                     18.1.0                   pypi_0    pypi
     rapidjson                 1.1.0             he1b5a44_1002    conda-forge
     re2                       2019.09.01           he1b5a44_0    conda-forge
     readline                  7.0                  h7b6447c_5
     rhash                     1.3.6             h14c3975_1001    conda-forge
     rmm                       0.10.0                   py37_0    rapidsai
     send2trash                1.5.0                    pypi_0    pypi
     setuptools                41.6.0                   py37_0
     six                       1.12.0                py37_1000    conda-forge
     snappy                    1.1.7             he1b5a44_1002    conda-forge
     sortedcontainers          2.1.0                      py_0    conda-forge
     sqlite                    3.30.1               h7b6447c_0
     tblib                     1.4.0                      py_0    conda-forge
     terminado                 0.8.2                    pypi_0    pypi
     testpath                  0.4.2                    pypi_0    pypi
     thrift-cpp                0.12.0            hf3afdfd_1004    conda-forge
     tk                        8.6.9             hed695b0_1003    conda-forge
     toolz                     0.10.0                     py_0    conda-forge
     tornado                   6.0.3            py37h516909a_0    conda-forge
     traitlets                 4.3.3                    pypi_0    pypi
     uriparser                 0.9.3                he1b5a44_1    conda-forge
     wcwidth                   0.1.7                    pypi_0    pypi
     webencodings              0.5.1                    pypi_0    pypi
     wheel                     0.33.6                   py37_0
     xorg-fixesproto           5.0               h14c3975_1002    conda-forge
     xorg-inputproto           2.3.2             h14c3975_1002    conda-forge
     xorg-kbproto              1.0.7             h14c3975_1002    conda-forge
     xorg-libx11               1.6.9                h516909a_0    conda-forge
     xorg-libxau               1.0.9                h14c3975_0    conda-forge
     xorg-libxdmcp             1.1.3                h516909a_0    conda-forge
     xorg-libxext              1.3.4                h516909a_0    conda-forge
     xorg-libxfixes            5.0.3             h516909a_1004    conda-forge
     xorg-libxi                1.7.10               h516909a_0    conda-forge
     xorg-libxrender           0.9.10            h516909a_1002    conda-forge
     xorg-libxtst              1.2.3             h14c3975_1002    conda-forge
     xorg-recordproto          1.14.2            h516909a_1002    conda-forge
     xorg-renderproto          0.11.1            h14c3975_1002    conda-forge
     xorg-xextproto            7.3.0             h14c3975_1002    conda-forge
     xorg-xproto               7.0.31            h14c3975_1007    conda-forge
     xz                        5.2.4                h14c3975_4
     yaml                      0.1.7             h14c3975_1001    conda-forge
     zeromq                    4.3.2                he1b5a44_2    conda-forge
     zict                      1.0.0                      py_0    conda-forge
     zipp                      0.6.0                    pypi_0    pypi
     zlib                      1.2.11               h7b6447c_3
     zstd                      1.4.0                h3b9ef0a_0    conda-forge

</pre></details>
felipeblazing commented 4 years ago

We are currently looking into this issue. It seems to be a problem with the relational algebra that is being generated. We will update here with information about the PR for this.
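The reported output fits a simple pattern: every computed expression loses its alias and gets a positional placeholder name of the form `$f<n>`, while the plain column reference (passenger_count) keeps its own name. The toy function below models only that observed symptom; it is not Calcite's or BlazingSQL's code, and `result_column_names` is a hypothetical helper.

```python
def result_column_names(select_items):
    """Hypothetical model of the symptom, not the engine's real logic.

    select_items: (expression, alias) pairs in SELECT order. A bare column
    reference keeps the column's own name; any other expression gets a
    positional "$f<index>" placeholder, as if its alias had been dropped.
    """
    names = []
    for index, (expression, _alias) in enumerate(select_items):
        if expression.isidentifier():
            names.append(expression)      # plain column, e.g. passenger_count
        else:
            names.append(f"$f{index}")    # alias lost: fall back to position
    return names

names = result_column_names([
    ("hour(key)", "hours"),
    ("month(key)", "months"),
    ("year(key) - 2000", "years"),
    ("dropoff_longitude - pickup_longitude", "longitude_distance"),
    ("dropoff_latitude - pickup_latitude", "latitude_distance"),
    ("passenger_count", None),
])
print(names)
```

This reproduces the names in the report ($f0 through $f4, then passenger_count), which is consistent with the aliases being dropped somewhere between parsing and plan generation.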

gumdropsteve commented 4 years ago

EDIT: I just tried it without modifying key and without passing the column names to BlazingContext; both gave the same result.

Update (Additional Info)

@felipeblazing I was able to run the query seen in this tweet in the same environment: https://twitter.com/blazingsql/status/1192163166580432897

[image]

Thoughts

Might this have more to do with extracting values from the datetime column key? Or with the fact that the table in this query originates in BlazingContext rather than in cuDF? Hope this is useful. Thanks.

felipeblazing commented 4 years ago

Seems to be an issue with Calcite. We need to push a fix. Will assign someone tomorrow.

On Wed, Nov 6, 2019, 7:23 PM Winston wrote the update above.

aucahuasi commented 4 years ago

Thanks @gumdropsteve! @rommelDB is working on this one!

rommelDB commented 4 years ago

Hey @gumdropsteve, thanks for the report. This PR https://github.com/BlazingDB/blazingdb-calcite/pull/41 fixes this issue. It will be merged to develop soon! cc @aucahuasi