BlazingDB / blazingsql

BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.
Apache License 2.0

[BUG] KeyError 26 in type lookup table, reading DECIMAL columns #1525

Open lucharo opened 3 years ago

lucharo commented 3 years ago

Describe the bug

I am trying to read some Parquet files that were created with Hive (STORED AS PARQUET), and I get an error saying KeyError: 26. With pdb enabled, I found that this key is missing from cudfTypeToCsvType.

Key 26 corresponds to the Hive type DECIMAL. What can I do in this situation?


Steps/Code to reproduce bug

Expected behavior

The DECIMAL type should be handled properly, or the lookup should at least fall back to another value.
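To illustrate the expected behavior, here is a minimal sketch of a type lookup that either falls back to a supported type or raises a descriptive error instead of a bare KeyError. The table contents, the names `TYPE_ID_TO_NAME`, `UnsupportedTypeError`, and `lookup_type`, and every type ID other than 26 are assumptions made for this example; they are not BlazingSQL's actual cudfTypeToCsvType mapping.

```python
# Illustrative lookup table keyed by numeric type ID. The IDs and names
# here are assumptions for the sketch, not BlazingSQL's actual mapping.
TYPE_ID_TO_NAME = {
    3: "int32",
    4: "int64",
    9: "float32",
    10: "float64",
}

DECIMAL_TYPE_ID = 26  # the key that raised KeyError in this report


class UnsupportedTypeError(TypeError):
    """Raised when a column type has no entry in the lookup table."""


def lookup_type(type_id, fallback=None):
    """Return the mapped type name, a fallback, or a descriptive error."""
    try:
        return TYPE_ID_TO_NAME[type_id]
    except KeyError:
        if fallback is not None:
            return fallback
        raise UnsupportedTypeError(
            f"column type id {type_id} is not supported by this reader"
        ) from None


print(lookup_type(10))                                   # float64
print(lookup_type(DECIMAL_TYPE_ID, fallback="float64"))  # float64
```

Calling `lookup_type(DECIMAL_TYPE_ID)` without a fallback raises `UnsupportedTypeError` with a message that names the offending type ID, which is easier to act on than KeyError: 26.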

Environment overview

BlazingSQL version (git hash): ff4ece0366a4d76bf533baeb03dd03bdfc5232be
BlazingSQL branch name: HEAD
BlazingSQL branch tag: v0.19.0
BlazingSQL build id: 0
BlazingSQL compiler version: GNU /usr/bin/c++ 7.5.0
BlazingSQL cuda flags: -Xcompiler -Wno-parentheses -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 --expt-extended-lambda --expt-relaxed-constexpr -Werror=cross-execution-space-call -Xcompiler -Wall,-Wno-error=deprecated-declarations --default-stream=per-thread -DHT_DEFAULT_ALLOCATOR
BlazingSQL Operating system kernel: Linux-5.4.0-1038-aws
BlazingSQL Operating system architecture: x86_64
BlazingSQL Linux Operating system release: NAME=Ubuntu|VERSION=16.04.7 LTS (Xenial Xerus)|ID=ubuntu|ID_LIKE=debian|PRETTY_NAME=Ubuntu 16.04.7 LTS|VERSION_ID=16.04|HOME_URL=|SUPPORT_URL=|BUG_REPORT_URL=|VERSION_CODENAME=xenial|UBUNTU_CODENAME=xenial

conda list output

```txt
(iblazing) chavesrl@gpu-0 ~/GPURnD/BSQL
└─ $ ▶ conda list
# packages in environment at /projects/gds/chavesrl/condapv/envs/iblazing:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
abseil-cpp 20210324.0 h9c3ff4c_0 conda-forge
alsa-lib 1.2.3 h516909a_0 conda-forge
argon2-cffi 20.1.0 py37h5e8e339_2 conda-forge
arrow-cpp 1.0.1 py37h363ccdf_36_cuda conda-forge
arrow-cpp-proc 3.0.0 cuda conda-forge
async_generator 1.10 py_0 conda-forge
attrs 20.3.0 pyhd3deb0d_0 conda-forge
aws-c-cal 0.4.5 h5ca8eb3_9 conda-forge
aws-c-common 0.5.8 h7f98852_0 conda-forge
aws-c-event-stream 0.2.7 h832d4c5_2 conda-forge
aws-c-io 0.9.4 hfee9f7d_5 conda-forge
aws-checksums 0.1.11 hdc257ea_4 conda-forge
aws-sdk-cpp 1.8.151 h973185b_2 conda-forge
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge
blazingsql 0.19.0 pypi_0 pypi
bleach 3.3.0 pyh44b312d_0 conda-forge
bokeh 2.3.1 py37h89c1867_0 conda-forge
boost-cpp 1.72.0 h9d3c048_4 conda-forge
brotli 1.0.9 h9c3ff4c_4 conda-forge
brotlipy 0.7.0 py37h5e8e339_1001 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.17.1 h7f98852_1 conda-forge
ca-certificates 2020.12.5 ha878542_0 conda-forge
cachetools 4.2.2 pyhd8ed1ab_0 conda-forge
cairo 1.16.0 h6cf1ce9_1008 conda-forge
certifi 2020.12.5 py37h89c1867_1 conda-forge
cffi 1.14.5 py37hc58025e_0 conda-forge
chardet 4.0.0 py37h89c1867_1 conda-forge
click 7.1.2 pyh9f0ad1d_0 conda-forge
cloudpickle 1.6.0 py_0 conda-forge
conda 4.10.1 py37h89c1867_0 conda-forge
conda-package-handling 1.7.3 py37h5e8e339_0 conda-forge
cryptography 3.4.7 py37h5d9358c_0 conda-forge
cudatoolkit 10.1.243 h036e899_8 nvidia-remote
cudf 0.19.2 cuda_10.1_py37_gab3b3f653a_0 rapidsai-remote
cudnn 7.6.0 cuda10.1_0 nvidia-remote
cupy 8.0.0 py37h0632833_0 conda-forge
cyrus-sasl 2.1.27 h3274739_1 conda-forge
cytoolz 0.11.0 py37h5e8e339_3 conda-forge
dask 2021.4.0 pyhd8ed1ab_0 conda-forge
dask-core 2021.4.0 pyhd8ed1ab_0 conda-forge
dask-cuda 0.19.0 py37_0 rapidsai-remote
dask-cudf 0.19.2 py37_gab3b3f653a_0 rapidsai-remote
decorator 5.0.7 pyhd8ed1ab_0 conda-forge
defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge
distributed 2021.4.0 py37h89c1867_0 conda-forge
dlpack 0.3 he1b5a44_1 conda-forge
entrypoints 0.3 pyhd8ed1ab_1003 conda-forge
fastavro 1.4.0 py37h5e8e339_0 conda-forge
fastrlock 0.6 py37hcd2ae1e_0 conda-forge
fontconfig 2.13.1 hba837de_1005 conda-forge
freetype 2.10.4 h0708190_1 conda-forge
fsspec 2021.4.0 pyhd8ed1ab_0 conda-forge
future 0.18.2 py37h89c1867_3 conda-forge
gettext h0b5b191_1005 conda-forge
gflags 2.2.2 he1b5a44_1004 conda-forge
giflib 5.2.1 h36c2ea0_2 conda-forge
glog 0.4.0 h49b9bf7_3 conda-forge
google-cloud-cpp 1.25.0 hc9b7cee_2 conda-forge
graphite2 1.3.13 h58526e2_1001 conda-forge
greenlet 1.0.0 py37hcd2ae1e_0 conda-forge
grpc-cpp 1.37.1 h36de60a_0 conda-forge
harfbuzz 2.8.1 h83ec7ef_0 conda-forge
heapdict 1.0.1 py_0 conda-forge
icu 68.1 h58526e2_0 conda-forge
idna 2.10 pyh9f0ad1d_0 conda-forge
importlib-metadata 4.0.1 py37h89c1867_0 conda-forge
ipykernel 5.5.4 py37h085eea5_0 conda-forge
ipython 7.23.1 py37h085eea5_0 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
ipywidgets 7.6.3 pyhd3deb0d_0 conda-forge
jedi 0.18.0 py37h89c1867_2 conda-forge
jinja2 2.11.3 pyh44b312d_0 conda-forge
jpeg 9d h36c2ea0_0 conda-forge
jpype1 1.2.1 py37h2527ec5_0 conda-forge
jsonschema 3.2.0 pyhd8ed1ab_3 conda-forge
jupyter_client 6.1.12 pyhd8ed1ab_0 conda-forge
jupyter_core 4.7.1 py37h89c1867_0 conda-forge
jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge
jupyterlab_widgets 1.0.0 pyhd8ed1ab_1 conda-forge
krb5 1.17.2 h926e7f8_0 conda-forge
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.35.1 hea4e1c9_2 conda-forge
libarchive 3.5.1 h3f442fb_1 conda-forge
libblas 3.9.0 9_openblas conda-forge
libcblas 3.9.0 9_openblas conda-forge
libcrc32c 1.1.1 h9c3ff4c_2 conda-forge
libcudf 0.19.2 cuda10.1_gab3b3f653a_0 rapidsai-remote
libcurl 7.76.1 hc4aaa36_1 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libevent 2.1.10 hcdb4288_3 conda-forge
libffi 3.3 h58526e2_2 conda-forge
libgcc-ng 9.3.0 h2828fa1_19 conda-forge
libgfortran-ng 9.3.0 hff62375_19 conda-forge
libgfortran5 9.3.0 hff62375_19 conda-forge
libglib 2.68.1 h3e27bee_0 conda-forge
libgomp 9.3.0 h2828fa1_19 conda-forge
libhwloc 2.3.0 h5e5b7d1_1 conda-forge
libiconv 1.16 h516909a_0 conda-forge
liblapack 3.9.0 9_openblas conda-forge
libllvm10 10.0.1 he513fc3_3 conda-forge
libnghttp2 1.43.0 h812cca2_0 conda-forge
libntlm 1.4 h7f98852_1002 conda-forge
libopenblas 0.3.15 pthreads_h8fe5266_0 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libpq 13.2 hfd2b0eb_2 conda-forge
libprotobuf 3.15.8 h780b84a_0 conda-forge
librmm 0.19.0 cuda10.1_g7065af3_0 rapidsai-remote
libsodium 1.0.18 h36c2ea0_1 conda-forge
libsolv 0.7.18 h780b84a_0 conda-forge
libssh2 1.9.0 ha56f1ee_6 conda-forge
libstdcxx-ng 9.3.0 h6de172a_19 conda-forge
libthrift 0.14.1 he6d91bd_1 conda-forge
libtiff 4.2.0 hdc55705_1 conda-forge
libutf8proc 2.6.1 h7f98852_0 conda-forge
libuuid 2.32.1 h7f98852_1000 conda-forge
libwebp-base 1.2.0 h7f98852_2 conda-forge
libxcb 1.13 h7f98852_1003 conda-forge
libxml2 2.9.10 h72842e0_4 conda-forge
llvmlite 0.36.0 py37h9d7f4d0_0 conda-forge
locket 0.2.0 py_2 conda-forge
lz4-c 1.9.3 h9c3ff4c_0 conda-forge
lzo 2.10 h516909a_1000 conda-forge
mamba 0.12.2 py37h7f483ca_0 conda-forge
markupsafe 1.1.1 py37h5e8e339_3 conda-forge
matplotlib-inline 0.1.2 pyhd8ed1ab_2 conda-forge
mistune 0.8.4 py37h5e8e339_1003 conda-forge
msgpack-python 1.0.2 py37h2527ec5_1 conda-forge
nbclient 0.5.3 pyhd8ed1ab_0 conda-forge
nbconvert 6.0.7 py37h89c1867_3 conda-forge
nbformat 5.1.3 pyhd8ed1ab_0 conda-forge
nccl cuda10.1_0 nvidia-remote
ncurses 6.2 h58526e2_4 conda-forge
nest-asyncio 1.5.1 pyhd8ed1ab_0 conda-forge
netifaces 0.10.9 py37h5e8e339_1003 conda-forge
nlohmann_json 3.9.1 h9c3ff4c_1 conda-forge
notebook 6.3.0 pyha770c72_1 conda-forge
numba 0.53.1 py37h134767a_0 conda-forge
numpy 1.20.2 py37h038b26d_0 conda-forge
nvtx 0.2.3 py37h5e8e339_0 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
openjdk h5cc2fde_1 conda-forge
openjpeg 2.4.0 hf7af979_0 conda-forge
openssl 1.1.1k h7f98852_0 conda-forge
orc 1.6.7 heec2584_1 conda-forge
packaging 20.9 pyh44b312d_0 conda-forge
pandas 1.2.4 py37h219a48f_0 conda-forge
pandoc 2.12 h7f98852_0 conda-forge
pandocfilters 1.4.2 py_1 conda-forge
parquet-cpp 1.5.1 2 conda-forge
parso 0.8.2 pyhd8ed1ab_0 conda-forge
partd 1.2.0 pyhd8ed1ab_0 conda-forge
pcre 8.44 he1b5a44_0 conda-forge
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 8.1.2 py37h4600e1f_1 conda-forge
pip 21.1.1 pyhd8ed1ab_0 conda-forge
pixman 0.40.0 h36c2ea0_0 conda-forge
prometheus_client 0.10.1 pyhd8ed1ab_0 conda-forge
prompt-toolkit 3.0.18 pyha770c72_0 conda-forge
protobuf 3.15.8 py37hcd2ae1e_0 conda-forge
psutil 5.8.0 py37h5e8e339_1 conda-forge
pthread-stubs 0.4 h36c2ea0_1001 conda-forge
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pyarrow 1.0.1 py37hb63ea2f_36_cuda conda-forge
pycosat 0.6.3 py37h5e8e339_1006 conda-forge
pycparser 2.20 pyh9f0ad1d_2 conda-forge
pygments 2.9.0 pyhd8ed1ab_0 conda-forge
pyhive 0.6.3 pyhd3deb0d_0 conda-forge
pynvml 8.0.4 py_1 conda-forge
pyopenssl 20.0.1 pyhd8ed1ab_0 conda-forge
pyparsing 2.4.7 pyh9f0ad1d_0 conda-forge
pyrsistent 0.17.3 py37h5e8e339_2 conda-forge
pysocks 1.7.1 py37h89c1867_3 conda-forge
python 3.7.10 hffdb5ce_100_cpython conda-forge
python-dateutil 2.8.1 py_0 conda-forge
python_abi 3.7 1_cp37m conda-forge
pytz 2021.1 pyhd8ed1ab_0 conda-forge
pyyaml 5.4.1 py37h5e8e339_0 conda-forge
pyzmq 22.0.3 py37h336d617_1 conda-forge
re2 2021.04.01 h9c3ff4c_0 conda-forge
readline 8.1 h46c0cb4_0 conda-forge
reproc 14.2.1 h36c2ea0_0 conda-forge
reproc-cpp 14.2.1 h58526e2_0 conda-forge
requests 2.25.1 pyhd3deb0d_0 conda-forge
rmm 0.19.0 cuda_10.1_py37_g7065af3_0 rapidsai-remote
ruamel_yaml 0.15.80 py37h5e8e339_1004 conda-forge
s2n 1.0.5 h9b69904_0 conda-forge
sasl 0.2.1 py37h3340039_1002 conda-forge
send2trash 1.5.0 py_0 conda-forge
setuptools 49.6.0 py37h89c1867_3 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
snappy 1.1.8 he1b5a44_3 conda-forge
sortedcontainers 2.3.0 pyhd8ed1ab_0 conda-forge
spdlog 1.7.0 hc9558a2_2 conda-forge
sqlalchemy 1.4.13 py37h5e8e339_0 conda-forge
sqlite 3.35.5 h74cdb3f_0 conda-forge
tblib 1.7.0 pyhd8ed1ab_0 conda-forge
terminado 0.9.4 py37h89c1867_0 conda-forge
testpath 0.4.4 py_0 conda-forge
thrift 0.13.0 py37hcd2ae1e_2 conda-forge
thrift_sasl 0.4.2 py37h8f50634_0 conda-forge
tk 8.6.10 h21135ba_1 conda-forge
toolz 0.11.1 py_0 conda-forge
tornado 6.1 py37h5e8e339_1 conda-forge
tqdm 4.60.0 pyhd8ed1ab_0 conda-forge
traitlets 5.0.5 py_0 conda-forge
typing_extensions py_0 conda-forge
ucx 1.9.0+gcd9efd3 cuda10.1_0 rapidsai-remote
ucx-proc 1.0.0 gpu rapidsai-remote
ucx-py 0.19.0 py37_gcd9efd3_0 rapidsai-remote
urllib3 1.26.4 pyhd8ed1ab_0 conda-forge
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
webencodings 0.5.1 py_1 conda-forge
wheel 0.36.2 pyhd3deb0d_0 conda-forge
widgetsnbextension 3.5.1 py37h89c1867_4 conda-forge
xorg-fixesproto 5.0 h7f98852_1002 conda-forge
xorg-inputproto 2.3.2 h7f98852_1002 conda-forge
xorg-kbproto 1.0.7 h7f98852_1002 conda-forge
xorg-libice 1.0.10 h7f98852_0 conda-forge
xorg-libsm 1.2.3 hd9c2040_1000 conda-forge
xorg-libx11 1.6.12 h516909a_0 conda-forge
xorg-libxau 1.0.9 h7f98852_0 conda-forge
xorg-libxdmcp 1.1.3 h7f98852_0 conda-forge
xorg-libxext 1.3.4 h516909a_0 conda-forge
xorg-libxfixes 5.0.3 h516909a_1004 conda-forge
xorg-libxi 1.7.10 h516909a_0 conda-forge
xorg-libxrender 0.9.10 h516909a_1002 conda-forge
xorg-libxtst 1.2.3 h516909a_1002 conda-forge
xorg-recordproto 1.14.2 h516909a_1002 conda-forge
xorg-renderproto 0.11.1 h7f98852_1002 conda-forge
xorg-xextproto 7.3.0 h7f98852_1002 conda-forge
xorg-xproto 7.0.31 h7f98852_1007 conda-forge
xz 5.2.5 h516909a_1 conda-forge
yaml 0.2.5 h516909a_0 conda-forge
zeromq 4.3.4 h9c3ff4c_0 conda-forge
zict 2.0.0 py_0 conda-forge
zipp 3.4.1 pyhd8ed1ab_0 conda-forge
zlib 1.2.11 h516909a_1010 conda-forge
zstd 1.4.9 ha95c52a_0 conda-forge
```

lucharo commented 3 years ago

Managed to temporarily fix this by changing the base table to use DOUBLE instead of DECIMAL to store float values. It would be nice if the DECIMAL type were supported, or if BSQL fell back to one of the supported float types by default, or at least told the user that the Hive DECIMAL type is not supported instead of raising KeyError: 26.
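One caveat with storing values as DOUBLE instead of DECIMAL is that binary floating point cannot represent every decimal fraction exactly. The snippet below is a small standard-library illustration of that general trade-off; it does not use Hive or BlazingSQL.

```python
from decimal import Decimal

# Exact decimal arithmetic: 0.1 + 0.2 is exactly 0.3.
exact = Decimal("0.1") + Decimal("0.2")

# The same values stored as doubles pick up binary rounding error.
approx = float(Decimal("0.1")) + float(Decimal("0.2"))

print(exact == Decimal("0.3"))  # True
print(approx == 0.3)            # False
```

For many analytics workloads this rounding is acceptable, but it matters for equality comparisons and for exact monetary totals.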

wmalpica commented 3 years ago

@lucharo BSQL currently does not support DECIMAL. It's on our roadmap. Until then, we can address this issue by expanding the cudfTypeToCsvType mapping to include DECIMAL, and we would have to modify our file readers so that if they read a DECIMAL column, they cast it to a double.
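The reader-side cast described above could look roughly like the following sketch. It is not BlazingSQL's implementation: the function `cast_decimal_columns` is hypothetical, and a plain dict of Python lists stands in for the GPU table that a real file reader would produce.

```python
from decimal import Decimal
from typing import Dict, List


def cast_decimal_columns(table: Dict[str, List[object]]) -> Dict[str, List[object]]:
    """Return a copy of the table with Decimal columns converted to float.

    Hypothetical helper: a dict of lists stands in for a real columnar table.
    """
    result = {}
    for name, values in table.items():
        if any(isinstance(v, Decimal) for v in values):
            # Cast every non-null value to a double; keep nulls as None.
            result[name] = [None if v is None else float(v) for v in values]
        else:
            result[name] = list(values)
    return result


table = {
    "price": [Decimal("19.99"), None, Decimal("5.00")],
    "qty": [1, 2, 3],
}
print(cast_decimal_columns(table))
# {'price': [19.99, None, 5.0], 'qty': [1, 2, 3]}
```

Casting at read time keeps the rest of the engine unaware of DECIMAL, at the cost of the precision loss that comes with doubles.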

Christian8491 commented 3 years ago

@lucharo to let you know, PR #1550 has been merged, so you can try again. Please note the comments in that PR about what you can and can't do with the DECIMAL type.

lucharo commented 3 years ago

Thanks for this too! Was #1550 merged in bsql-0.20?

Christian8491 commented 3 years ago

@lucharo, #1550 was merged into branch-21.06