BlazingDB / blazingsql

BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.
https://blazingsql.com
Apache License 2.0
1.93k stars 183 forks

[BUG] byte_range offset with header not supported #1604

Open marberi opened 2 years ago

marberi commented 2 years ago

Describe the bug

Problem reading in a CSV file. It generates the following error:

BlazingContext ready
[19:37:44.175] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:44.175] [error] |||ERROR in graph::execute. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:44.258] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:44.426] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:44.513] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:44.718] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:44.917] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:45.117] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:45.308] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:45.508] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:45.698] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:45.868] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:46.074] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:46.232] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:46.444] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:46.655] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:46.837] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:47.028] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:47.227] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:47.417] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:47.586] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:47.642] [error] |||ERROR in task::run. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:47.662] [error] |||ERROR in graph::execute. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:47.663] [error] 573370665|1|1|In MergeAggregate kernel for MergeAggregate(group=[{}], EXPR$0=[$SUM0($0)], agg#1=[COUNT($0)]). What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:47.663] [error] |||ERROR in graph::execute. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
[19:37:47.663] [error] |||ERROR in graph::execute. What: Ral failure at: /opt/conda/envs/rapids/conda-bld/blazingsql_1633567369093/work/engine/src/execution_kernels/BatchProcessing.cpp:392: ERROR: Projection::run() first input CacheData was nullptr|||||
[19:37:47.670] [error] 573370665|||In get_execute_graph_results. What: cuDF failure at: ../src/io/csv/reader_impl.cu:212: byte_range offset with header not supported|||||
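The message suggests that the CSV is being read in byte ranges and that a range starting at a nonzero offset was still asked to parse a header row. The following is a plain-Python sketch of that idea, not cuDF or BlazingSQL code: only the range at offset 0 can contain the header, so later ranges need the column names passed in explicitly. The helper `read_byte_range` and its parameters are hypothetical, and the sketch assumes no quoted newlines inside fields.

```python
import csv


def read_byte_range(data, offset, size, header=True, names=None):
    """Parse the CSV rows whose first byte lies in data[offset:offset+size].

    Hypothetical helper for illustration only. It mimics the restriction in
    the error message by refusing to parse a header at a nonzero offset.
    """
    if offset != 0 and header:
        raise ValueError("byte_range offset with header not supported")

    end = min(offset + size, len(data))
    if offset == 0:
        pos = 0
    else:
        # Skip the partial row owned by the previous range: the first row of
        # this range starts right after the first newline at or after offset-1.
        newline = data.find(b"\n", offset - 1)
        if newline == -1:
            return names, []
        pos = newline + 1

    lines = []
    while pos < end:
        # A row that starts inside the range is read to its end,
        # even if it runs past the end of the range.
        newline = data.find(b"\n", pos)
        if newline == -1:
            newline = len(data)
        lines.append(data[pos:newline].decode("utf-8"))
        pos = newline + 1

    rows = list(csv.reader(lines))
    if header and rows:
        names, rows = rows[0], rows[1:]
    return names, rows


# Made-up sample data, split into two byte ranges.
sample = b"fare,passengers\n4.5,1\n16.9,1\n5.7,2\n"
names, first_rows = read_byte_range(sample, 0, 20, header=True)
_, later_rows = read_byte_range(sample, 20, 15, header=False, names=names)
```

In this sketch the caller reads the header once from the first range and reuses `names` for every later range, which is one way a chunked reader can avoid the rejected combination.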

Steps/Code to reproduce bug

import os
from pathlib import Path

# Just needed on my computer.
os.environ["CONDA_PREFIX"] = '/data/astro/scratch/eriksen/miniconda3/envs/blazing'

import blazingsql
from blazingsql import BlazingContext

d = Path('/data/astro/scratch/eriksen/kaggle/competitions/new-york-city-taxi-fare-prediction')
bc = BlazingContext()

bc.create_table('train', str(d / 'train.csv'))
gdf = bc.sql('select * from train limit 100')

Expected behavior

Print a number, the average taxi fare. If there is a problem with the input, it should fail gracefully.
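As a CPU-side point of comparison, here is a minimal plain-Python sketch of the expected behavior: compute the average of one numeric column, and stop with a single clear message if the input is unusable. It does not use BlazingSQL. The function name `average_column` is hypothetical, and the column name `fare_amount` in the usage note is an assumption about the dataset, not something stated in this report.

```python
import csv


def average_column(path, column):
    """Return the mean of a numeric CSV column.

    Raises ValueError with one short message when the column is missing,
    a value is not numeric, or the file has no data rows.
    """
    total = 0.0
    count = 0
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle)
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise ValueError(f"column {column!r} not found in {path}")
        # Data starts on line 2; the line number is approximate if
        # quoted fields contain newlines.
        for line_no, row in enumerate(reader, start=2):
            try:
                total += float(row[column])
            except (TypeError, ValueError):
                raise ValueError(
                    f"{path}:{line_no}: non-numeric value in {column!r}"
                ) from None
            count += 1
    if count == 0:
        raise ValueError(f"{path} has no data rows")
    return total / count
```

A call such as `average_column('train.csv', 'fare_amount')` would either return one number or raise one `ValueError`, rather than emitting a stream of repeated log lines.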

Environment overview (please complete the following information)

BlazingSQL version (git hash): 2a4a99cc83c4b8a52078cba2b8d6c80194cb3a78
BlazingSQL branch name: HEAD
BlazingSQL branch tag: v21.10.00
BlazingSQL build id: 0
BlazingSQL compiler version: GNU /usr/local/gcc9/bin/g++ 9.4.0
BlazingSQL cuda flags: -Xcompiler -Wno-parentheses -gencode=arch=compute_60,code=sm_60 -gencode=arch=compute_70,code=sm_70 -gencode=arch=compute_75,code=sm_75 -gencode=arch=compute_75,code=compute_75 --expt-extended-lambda --expt-relaxed-constexpr -Werror=cross-execution-space-call -Xcompiler -Wall,-Wno-error=deprecated-declarations --default-stream=per-thread -DHT_DEFAULT_ALLOCATOR
BlazingSQL Operating system kernel: Linux-5.8.0-1042-aws
BlazingSQL Operating system architecture: x86_64
BlazingSQL Linux Operating system release: NAME=CentOS Linux|VERSION=7 (Core)|ID=centos|ID_LIKE=rhel fedora|VERSION_ID=7|PRETTY_NAME=CentOS Linux 7 (Core)|ANSI_COLOR=031|CPE_NAME=cpe:/o:centos:centos:7|HOME_URL=https://www.centos.org/|BUG_REPORT_URL=https://bugs.centos.org/||CENTOS_MANTISBT_PROJECT=CentOS-7|CENTOS_MANTISBT_PROJECT_VERSION=7|REDHAT_SUPPORT_PRODUCT=centos|REDHAT_SUPPORT_PRODUCT_VERSION=7|
None

Environment details

Please run and paste the output of the print_env.sh script here, to gather any other relevant environment details

Not sure where I am expected to find this. I include a list of packages in the conda environment:

(blazing) [eriksen@gpu01 bin]$ conda list

# packages in environment at /data/astro/scratch/eriksen/miniconda3/envs/blazing:
#
# Name Version Build Channel
_libgcc_mutex 0.1 conda_forge conda-forge
_openmp_mutex 4.5 1_gnu conda-forge
abseil-cpp 20210324.2 h9c3ff4c_0 conda-forge
argon2-cffi 21.1.0 py38h497a2fe_2 conda-forge
arrow-cpp 5.0.0 py38h327e1ba_4_cuda conda-forge
arrow-cpp-proc 3.0.0 cuda conda-forge
async_generator 1.10 py_0 conda-forge
attrs 21.2.0 pyhd8ed1ab_0 conda-forge
aws-c-cal 0.5.11 h95a6274_0 conda-forge
aws-c-common 0.6.2 h7f98852_0 conda-forge
aws-c-event-stream 0.2.7 h3541f99_13 conda-forge
aws-c-io 0.10.5 hfb6a706_0 conda-forge
aws-checksums 0.1.11 ha31a3da_7 conda-forge
aws-sdk-cpp 1.8.186 hb4091e7_3 conda-forge
backcall 0.2.0 pyh9f0ad1d_0 conda-forge
backports 1.0 py_2 conda-forge
backports.functools_lru_cache 1.6.4 pyhd8ed1ab_0 conda-forge
blazingsql 21.10.0 pypi_0 pypi
bleach 4.1.0 pyhd8ed1ab_0 conda-forge
bokeh 2.4.2 py38h578d9bd_0 conda-forge
boost-cpp 1.72.0 h359cf19_6 conda-forge
brotlipy 0.7.0 py38h497a2fe_1003 conda-forge
bzip2 1.0.8 h7f98852_4 conda-forge
c-ares 1.18.1 h7f98852_0 conda-forge
ca-certificates 2021.10.8 ha878542_0 conda-forge
cachetools 4.2.4 pyhd8ed1ab_0 conda-forge
certifi 2021.10.8 py38h578d9bd_1 conda-forge
cffi 1.15.0 py38h3931269_0 conda-forge
charset-normalizer 2.0.9 pyhd8ed1ab_0 conda-forge
click 8.0.3 py38h578d9bd_1 conda-forge
cloudpickle 2.0.0 pyhd8ed1ab_0 conda-forge
colorama 0.4.4 pyh9f0ad1d_0 conda-forge
cryptography 36.0.0 py38h3e25421_0 conda-forge
cudatoolkit 11.4.2 h00f7ccd_9 conda-forge
cudf 21.10.01 cuda_11.4_py38_ga1d2d13a14_0 rapidsai
cupy 9.3.0 py38ha96c4f3_0 rapidsai
cytoolz 0.11.2 py38h497a2fe_1 conda-forge
dask 2021.9.1 pyhd8ed1ab_0 conda-forge
dask-core 2021.9.1 pyhd8ed1ab_0 conda-forge
dask-cuda 21.10.00 py38_0 rapidsai
dask-cudf 21.10.01 py38_ga1d2d13a14_0 rapidsai
debugpy 1.5.1 py38h709712a_0 conda-forge
decorator 5.1.0 pyhd8ed1ab_0 conda-forge
defusedxml 0.7.1 pyhd8ed1ab_0 conda-forge
distributed 2021.9.1 py38h578d9bd_0 conda-forge
dlpack 0.5 h9c3ff4c_0 conda-forge
entrypoints 0.3 pyhd8ed1ab_1003 conda-forge
fastavro 1.4.7 py38h497a2fe_1 conda-forge
fastrlock 0.8 py38h709712a_1 conda-forge
freetype 2.10.4 h0708190_1 conda-forge
fsspec 2021.11.1 pyhd8ed1ab_0 conda-forge
future 0.18.2 py38h578d9bd_4 conda-forge
gflags 2.2.2 he1b5a44_1004 conda-forge
glog 0.5.0 h48cff8f_0 conda-forge
google-cloud-cpp 1.29.0 hb967e95_1 conda-forge
greenlet 1.1.2 py38h709712a_1 conda-forge
grpc-cpp 1.39.1 h850795e_1 conda-forge
heapdict 1.0.1 py_0 conda-forge
icu 69.1 h9c3ff4c_0 conda-forge
idna 3.1 pyhd3deb0d_0 conda-forge
importlib-metadata 4.8.2 py38h578d9bd_0 conda-forge
importlib_resources 5.4.0 pyhd8ed1ab_0 conda-forge
ipykernel 6.6.0 py38he5a9106_0 conda-forge
ipython 7.30.1 py38h578d9bd_0 conda-forge
ipython_genutils 0.2.0 py_1 conda-forge
ipywidgets 7.6.5 pyhd8ed1ab_0 conda-forge
jbig 2.1 h7f98852_2003 conda-forge
jedi 0.18.1 py38h578d9bd_0 conda-forge
jinja2 3.0.3 pyhd8ed1ab_0 conda-forge
jpeg 9d h36c2ea0_0 conda-forge
jpype1 1.3.0 py38h1fd1430_2 conda-forge
jsonschema 4.2.1 pyhd8ed1ab_1 conda-forge
jupyter_client 7.1.0 pyhd8ed1ab_0 conda-forge
jupyter_core 4.9.1 py38h578d9bd_1 conda-forge
jupyterlab_pygments 0.1.2 pyh9f0ad1d_0 conda-forge
jupyterlab_widgets 1.0.2 pyhd8ed1ab_0 conda-forge
krb5 1.19.2 hcc1bbae_3 conda-forge
lcms2 2.12 hddcbb42_0 conda-forge
ld_impl_linux-64 2.36.1 hea4e1c9_2 conda-forge
lerc 3.0 h9c3ff4c_0 conda-forge
libblas 3.9.0 12_linux64_openblas conda-forge
libbrotlicommon 1.0.9 h7f98852_6 conda-forge
libbrotlidec 1.0.9 h7f98852_6 conda-forge
libbrotlienc 1.0.9 h7f98852_6 conda-forge
libcblas 3.9.0 12_linux64_openblas conda-forge
libcrc32c 1.1.2 h9c3ff4c_0 conda-forge
libcudf 21.10.01 cuda11.4_ga1d2d13a14_0 rapidsai
libcurl 7.80.0 h2574ce0_0 conda-forge
libdeflate 1.8 h7f98852_0 conda-forge
libedit 3.1.20191231 he28a2e2_2 conda-forge
libev 4.33 h516909a_1 conda-forge
libevent 2.1.10 h9b69904_4 conda-forge
libffi 3.4.2 h7f98852_5 conda-forge
libgcc-ng 11.2.0 h1d223b6_11 conda-forge
libgfortran-ng 11.2.0 h69a702a_11 conda-forge
libgfortran5 11.2.0 h5c6108e_11 conda-forge
libgomp 11.2.0 h1d223b6_11 conda-forge
libhwloc 2.3.0 h5e5b7d1_1 conda-forge
libiconv 1.16 h516909a_0 conda-forge
liblapack 3.9.0 12_linux64_openblas conda-forge
libllvm10 10.0.1 he513fc3_3 conda-forge
libnghttp2 1.43.0 h812cca2_1 conda-forge
libnsl 2.0.0 h7f98852_0 conda-forge
libopenblas 0.3.18 pthreads_h8fe5266_0 conda-forge
libpng 1.6.37 h21135ba_2 conda-forge
libpq 13.5 hd57d9b9_1 conda-forge
libprotobuf 3.16.0 h780b84a_0 conda-forge
librmm 21.10.01 cuda11.4_gc54767f_0 rapidsai
libsodium 1.0.18 h36c2ea0_1 conda-forge
libssh2 1.10.0 ha56f1ee_2 conda-forge
libstdcxx-ng 11.2.0 he4da1e4_11 conda-forge
libthrift 0.14.2 he6d91bd_1 conda-forge
libtiff 4.3.0 h6f004c6_2 conda-forge
libutf8proc 2.6.1 h7f98852_0 conda-forge
libwebp-base 1.2.1 h7f98852_0 conda-forge
libxml2 2.9.12 h885dcf4_1 conda-forge
libzlib 1.2.11 h36c2ea0_1013 conda-forge
llvmlite 0.36.0 py38h4630a5e_0 conda-forge
locket 0.2.0 py_2 conda-forge
lz4-c 1.9.3 h9c3ff4c_1 conda-forge
markupsafe 2.0.1 py38h497a2fe_1 conda-forge
matplotlib-inline 0.1.3 pyhd8ed1ab_0 conda-forge
mistune 0.8.4 py38h497a2fe_1005 conda-forge
msgpack-python 1.0.3 py38h1fd1430_0 conda-forge
nbclient 0.5.9 pyhd8ed1ab_0 conda-forge
nbconvert 6.3.0 py38h578d9bd_1 conda-forge
nbformat 5.1.3 pyhd8ed1ab_0 conda-forge
ncurses 6.2 h58526e2_4 conda-forge
nest-asyncio 1.5.4 pyhd8ed1ab_0 conda-forge
netifaces 0.10.9 py38h497a2fe_1004 conda-forge
nlohmann_json 3.9.1 h9c3ff4c_1 conda-forge
notebook 6.4.6 pyha770c72_0 conda-forge
numba 0.53.1 py38h8b71fd7_1 conda-forge
numpy 1.21.4 py38he2449b9_0 conda-forge
nvtx 0.2.3 py38h497a2fe_1 conda-forge
olefile 0.46 pyh9f0ad1d_1 conda-forge
openjdk 8.0.312 h7f98852_0 conda-forge
openjpeg 2.4.0 hb52868f_1 conda-forge
openssl 1.1.1l h7f98852_0 conda-forge
orc 1.6.10 h58a87f1_0 conda-forge
packaging 21.3 pyhd8ed1ab_0 conda-forge
pandas 1.3.4 py38h43a58ef_1 conda-forge
pandoc 2.16.2 h7f98852_0 conda-forge
pandocfilters 1.5.0 pyhd8ed1ab_0 conda-forge
parquet-cpp 1.5.1 2 conda-forge
parso 0.8.3 pyhd8ed1ab_0 conda-forge
partd 1.2.0 pyhd8ed1ab_0 conda-forge
pexpect 4.8.0 pyh9f0ad1d_2 conda-forge
pickleshare 0.7.5 py_1003 conda-forge
pillow 8.4.0 py38h8e6f84c_0 conda-forge
pip 21.3.1 pyhd8ed1ab_0 conda-forge
prometheus_client 0.12.0 pyhd8ed1ab_0 conda-forge
prompt-toolkit 3.0.24 pyha770c72_0 conda-forge
protobuf 3.16.0 py38h709712a_0 conda-forge
psutil 5.8.0 py38h497a2fe_2 conda-forge
ptyprocess 0.7.0 pyhd3deb0d_0 conda-forge
pure-sasl 0.6.2 pyhd8ed1ab_0 conda-forge
pyarrow 5.0.0 py38hed47224_4_cuda conda-forge
pycparser 2.21 pyhd8ed1ab_0 conda-forge
pygments 2.10.0 pyhd8ed1ab_0 conda-forge
pyhive 0.6.4 pyhd8ed1ab_0 conda-forge
pynvml 11.4.1 pyhd8ed1ab_0 conda-forge
pyopenssl 21.0.0 pyhd8ed1ab_0 conda-forge
pyparsing 3.0.6 pyhd8ed1ab_0 conda-forge
pyrsistent 0.18.0 py38h497a2fe_0 conda-forge
pysocks 1.7.1 py38h578d9bd_4 conda-forge
python 3.8.12 hb7a2778_2_cpython conda-forge
python-dateutil 2.8.2 pyhd8ed1ab_0 conda-forge
python_abi 3.8 2_cp38 conda-forge
pytz 2021.3 pyhd8ed1ab_0 conda-forge
pyyaml 6.0 py38h497a2fe_3 conda-forge
pyzmq 22.3.0 py38h2035c66_1 conda-forge
re2 2021.09.01 h9c3ff4c_0 conda-forge
readline 8.1 h46c0cb4_0 conda-forge
requests 2.26.0 pyhd8ed1ab_1 conda-forge
rmm 21.10.01 cuda_11.4_py38_gc54767f_0 rapidsai
s2n 1.0.10 h9b69904_0 conda-forge
send2trash 1.8.0 pyhd8ed1ab_0 conda-forge
setuptools 59.4.0 py38h578d9bd_0 conda-forge
six 1.16.0 pyh6c4a22f_0 conda-forge
snappy 1.1.8 he1b5a44_3 conda-forge
sortedcontainers 2.4.0 pyhd8ed1ab_0 conda-forge
spdlog 1.8.5 h4bd325d_0 conda-forge
sqlalchemy 1.4.28 py38h497a2fe_0 conda-forge
sqlite 3.37.0 h9cd32fc_0 conda-forge
tblib 1.7.0 pyhd8ed1ab_0 conda-forge
terminado 0.12.1 py38h578d9bd_1 conda-forge
testpath 0.5.0 pyhd8ed1ab_0 conda-forge
thrift 0.15.0 py38h709712a_1 conda-forge
thrift_sasl 0.4.3 pyhd8ed1ab_1 conda-forge
tk 8.6.11 h27826a3_1 conda-forge
toolz 0.11.2 pyhd8ed1ab_0 conda-forge
tornado 6.1 py38h497a2fe_2 conda-forge
tqdm 4.62.3 pyhd8ed1ab_0 conda-forge
traitlets 5.1.1 pyhd8ed1ab_0 conda-forge
typing_extensions 4.0.1 pyha770c72_0 conda-forge
ucx 1.11.2+gef2bbcf cuda11.2_0 rapidsai
ucx-proc 1.0.0 gpu rapidsai
ucx-py 0.22.01 py38_gef2bbcf_33 rapidsai
urllib3 1.26.7 pyhd8ed1ab_0 conda-forge
wcwidth 0.2.5 pyh9f0ad1d_2 conda-forge
webencodings 0.5.1 py_1 conda-forge
wheel 0.37.0 pyhd8ed1ab_1 conda-forge
widgetsnbextension 3.5.2 py38h578d9bd_1 conda-forge
xz 5.2.5 h516909a_1 conda-forge
yaml 0.2.5 h516909a_0 conda-forge
zeromq 4.3.4 h9c3ff4c_1 conda-forge
zict 2.0.0 py_0 conda-forge
zipp 3.6.0 pyhd8ed1ab_0 conda-forge
zlib 1.2.11 h36c2ea0_1013 conda-forge
zstd 1.5.0 ha95c52a_0 conda-forge

Additional context

Adding some information on the GPU, since this error is happening in a CUDA file.

(blazing) [eriksen@gpu01 bin]$ nvidia-smi
Fri Dec 10 19:43:08 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.05    Driver Version: 450.51.05    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  Off  | 00000000:3D:00.0 Off |                  N/A |
| 31%   32C    P8    19W / 250W |      3MiB / 11019MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

----For BlazingSQL Developers---- Suspected source of the issue Where and what are potential sources of the issue

Other design considerations What components of the engine could be affected by this?