BlazingDB / blazingsql

BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.
https://blazingsql.com
Apache License 2.0

[BUG] BlazingSQL environment works in python and ipython but does not in Jupyter #1329

Closed lucharo closed 3 years ago

lucharo commented 3 years ago

Describe the bug

I have successfully installed and tested a conda BlazingSQL environment using python and ipython. I have also successfully created a Jupyter kernel from it, but BlazingSQL behaves differently when loaded in the Jupyter notebook and throws an error on import.

Steps/Code to reproduce bug

1 - Install Blazing

$ mamba create -n iblazing \
-c rapidsai \
-c nvidia \
-c conda-forge \
-c defaults \
blazingsql python=3.7 cudatoolkit=10.1 -y 

2 - Install ipykernel with pip (conda install ipykernel crashes)

$ conda activate iblazing
(iblazing) $ pip install ipykernel 

3 - Create kernel

(iblazing) $ python -m ipykernel install --user --name iblazing

4 - Open Jupyter notebook in JupyterLab with created kernel

5 - Import blazingsql + get error

from blazingsql import BlazingContext

Observed behaviour

Run:

!source arevars.sh # JAVA_HOME, CLASSPATH, ARROW_LIBHDFS, etc
from blazingsql import BlazingContext

Output:

---------------------------------------------------------------------------
Exception                                 Traceback (most recent call last)
/projects/gds/chavesrl/condapv/envs/iblazing/lib/python3.7/site-packages/_jpype.cpython-37m-x86_64-linux-gnu.so in org.jpype.classloader.DynamicClassLoader.addFile()

Exception: Java Exception

The above exception was the direct cause of the following exception:

java.io.FileNotFoundException             Traceback (most recent call last)
<ipython-input-3-0b19b5b41f48> in <module>
----> 1 from blazingsql import BlazingContext

/projects/gds/chavesrl/condapv/envs/iblazing/lib/python3.7/site-packages/blazingsql/__init__.py in <module>
      1 from pyblazing.apiv2 import S3EncryptionType
      2 from pyblazing.apiv2 import DataType
----> 3 from pyblazing.apiv2.context import BlazingContext
      4 
      5 from cio import getProductDetailsCaller

/projects/gds/chavesrl/condapv/envs/iblazing/lib/python3.7/site-packages/pyblazing/apiv2/context.py in <module>
     55 
     56 jpype.addClassPath(
---> 57     os.path.join(os.getenv("CONDA_PREFIX"), "lib/blazingsql-algebra.jar")
     58 )
     59 jpype.addClassPath(

/projects/gds/chavesrl/condapv/envs/iblazing/lib/python3.7/site-packages/jpype/_classpath.py in addClassPath(path1)
     63                 classLoader.addFile(Paths.get(str(path)))
     64         else:
---> 65             classLoader.addFile(Paths.get(str(path1)))
     66     _CLASSPATHS.append(path1)
     67 

java.io.FileNotFoundException: java.io.FileNotFoundException: /opt/conda/lib/blazingsql-algebra.jar

Expected behavior

I would expect BlazingSQL to load the same way in python, ipython, and Jupyter sessions.

conda list output below:

```bash
# Name                    Version         Build                         Channel
_libgcc_mutex             0.1             conda_forge                   conda-forge
_openmp_mutex             4.5             1_gnu                         conda-forge
abseil-cpp                20200225.2      he1b5a44_2                    conda-forge
alsa-lib                  1.2.3           h516909a_0                    conda-forge
arrow-cpp                 1.0.1           py37h2318771_14_cuda          conda-forge
arrow-cpp-proc            3.0.0           cuda                          conda-forge
aws-c-common              0.4.59          h36c2ea0_1                    conda-forge
aws-c-event-stream        0.1.6           had2084c_6                    conda-forge
aws-checksums             0.1.10          h4e93380_0                    conda-forge
aws-sdk-cpp               1.8.63          h9b98462_0                    conda-forge
backcall                  0.2.0           pypi_0                        pypi
blazingsql                0.17.0          pypi_0                        pypi
bokeh                     2.2.3           py37hc8dfbb8_0                conda-forge
boost-cpp                 1.72.0          h9d3c048_4                    conda-forge
brotli                    1.0.9           h9c3ff4c_4                    conda-forge
brotlipy                  0.7.0           py37hb5d75c8_1001             conda-forge
bzip2                     1.0.8           h7f98852_4                    conda-forge
c-ares                    1.17.1          h36c2ea0_0                    conda-forge
ca-certificates           2021.1.19       h06a4308_0                    anaconda-main-remote
cairo                     1.16.0          h7979940_1007                 conda-forge
certifi                   2020.12.5       py37h89c1867_1                conda-forge
cffi                      1.14.4          py37hc58025e_1                conda-forge
chardet                   4.0.0           py37h89c1867_1                conda-forge
click                     7.1.2           pyh9f0ad1d_0                  conda-forge
cloudpickle               1.6.0           py_0                          conda-forge
conda                     4.9.2           py37h89c1867_0                conda-forge
conda-package-handling    1.7.2           py37hb5d75c8_0                conda-forge
cryptography              3.3.1           py37h7f0c10b_1                conda-forge
cudatoolkit               10.1.243        h6bb024c_0                    nvidia-remote
cudf                      0.17.0          cuda_10.1_py37_gf56ef850e6_0  rapidsai-remote
cudnn                     7.6.5.32        hc0a50b0_1                    conda-forge
cupy                      8.4.0           py37hb9ab7da_1                conda-forge
cutensor                  1.2.2.5         h8b44402_2                    conda-forge
cyrus-sasl                2.1.27          h3274739_1                    conda-forge
cytoolz                   0.11.0          py37h5e8e339_3                conda-forge
dask                      2021.2.0        pyhd8ed1ab_0                  conda-forge
dask-core                 2021.2.0        pyhd8ed1ab_0                  conda-forge
dask-cuda                 0.17.0          py37_0                        rapidsai-remote
dask-cudf                 0.17.0          py37_gf56ef850e6_0            rapidsai-remote
decorator                 4.4.2           pypi_0                        pypi
distributed               2021.2.0        py37h89c1867_0                conda-forge
dlpack                    0.3             he1b5a44_1                    conda-forge
fastavro                  1.3.1           py37h5e8e339_0                conda-forge
fastrlock                 0.5             py37hcd2ae1e_2                conda-forge
fontconfig                2.13.1          hba837de_1004                 conda-forge
freetype                  2.10.4          h0708190_1                    conda-forge
fsspec                    0.8.5           pyhd8ed1ab_0                  conda-forge
future                    0.18.2          py37h89c1867_3                conda-forge
gettext                   0.19.8.1        h0b5b191_1005                 conda-forge
gflags                    2.2.2           he1b5a44_1004                 conda-forge
giflib                    5.2.1           h516909a_2                    conda-forge
glog                      0.4.0           h49b9bf7_3                    conda-forge
google-cloud-cpp          1.16.0          he4a878c_2                    conda-forge
google-cloud-cpp-common   0.25.0          he83eced_7                    conda-forge
googleapis-cpp            0.10.0          h6b1abdc_4                    conda-forge
graphite2                 1.3.14          h23475e2_0                    anaconda-main-remote
grpc-cpp                  1.32.0          h7997a97_1                    conda-forge
gtest                     1.10.0          h4bd325d_7                    conda-forge
harfbuzz                  2.7.4           h5cf4720_0                    conda-forge
heapdict                  1.0.1           py_0                          conda-forge
icu                       68.1            h58526e2_0                    conda-forge
idna                      2.10            pyh9f0ad1d_0                  conda-forge
ipykernel                 5.4.3           pypi_0                        pypi
ipython                   7.20.0          pypi_0                        pypi
ipython-genutils          0.2.0           pypi_0                        pypi
jedi                      0.18.0          pypi_0                        pypi
jinja2                    2.11.3          pyh44b312d_0                  conda-forge
jpeg                      9d              h516909a_0                    conda-forge
jpype1                    1.2.1           py37h2527ec5_0                conda-forge
jupyter-client            6.1.11          pypi_0                        pypi
jupyter-core              4.7.1           pypi_0                        pypi
krb5                      1.17.2          h926e7f8_0                    conda-forge
lcms2                     2.12            hddcbb42_0                    conda-forge
ld_impl_linux-64          2.35.1          hea4e1c9_2                    conda-forge
libarchive                3.5.1           h899b81a_0                    conda-forge
libblas                   3.9.0           8_openblas                    conda-forge
libcblas                  3.9.0           8_openblas                    conda-forge
libcrc32c                 1.1.1           he1b5a44_2                    conda-forge
libcudf                   0.17.0          cuda10.1_gf56ef850e6_0        rapidsai-remote
libcurl                   7.71.1          hcdd3856_8                    conda-forge
libedit                   3.1.20191231    he28a2e2_2                    conda-forge
libev                     4.33            h516909a_1                    conda-forge
libevent                  2.1.10          hcdb4288_3                    conda-forge
libffi                    3.3             h58526e2_2                    conda-forge
libgcc-ng                 9.3.0           h2828fa1_18                   conda-forge
libgfortran-ng            7.5.0           h14aa051_18                   conda-forge
libgfortran4              7.5.0           h14aa051_18                   conda-forge
libglib                   2.66.6          h1f3bc88_3                    conda-forge
libgomp                   9.3.0           h2828fa1_18                   conda-forge
libhwloc                  2.3.0           h5e5b7d1_1                    conda-forge
libiconv                  1.16            h516909a_0                    conda-forge
liblapack                 3.9.0           8_openblas                    conda-forge
libllvm10                 10.0.1          he513fc3_3                    conda-forge
libnghttp2                1.43.0          h812cca2_0                    conda-forge
libntlm                   1.5             h7b6447c_0                    anaconda-main-remote
libopenblas               0.3.12          pthreads_hb3c22a3_1           conda-forge
libpng                    1.6.37          hed695b0_2                    conda-forge
libprotobuf               3.13.0.1        h8b12597_0                    conda-forge
librmm                    0.17.0          cuda10.1_gc4cc945_0           rapidsai-remote
libsodium                 1.0.18          h516909a_1                    conda-forge
libsolv                   0.7.17          h780b84a_0                    conda-forge
libssh2                   1.9.0           hab1572f_5                    conda-forge
libstdcxx-ng              9.3.0           h6de172a_18                   conda-forge
libthrift                 0.13.0          hbe8ec66_6                    conda-forge
libtiff                   4.2.0           hdc55705_0                    conda-forge
libutf8proc               2.6.1           h7f98852_0                    conda-forge
libuuid                   2.32.1          h14c3975_1000                 conda-forge
libwebp-base              1.2.0           h7f98852_0                    conda-forge
libxcb                    1.14            h7b6447c_0                    anaconda-main-remote
libxml2                   2.9.10          h72842e0_3                    conda-forge
llvmlite                  0.35.0          py37h9d7f4d0_1                conda-forge
locket                    0.2.1           py37h06a4308_1                anaconda-main-remote
lz4-c                     1.9.2           he1b5a44_3                    conda-forge
lzo                       2.10            h516909a_1000                 conda-forge
mamba                     0.7.12          py37h7f483ca_0                conda-forge
markupsafe                1.1.1           py37h5e8e339_3                conda-forge
msgpack-python            1.0.2           py37h2527ec5_1                conda-forge
nccl                      2.8.4.1         h8b44402_0                    conda-forge
ncurses                   6.2             h58526e2_4                    conda-forge
netifaces                 0.10.9          py37h8f50634_1003             conda-forge
numba                     0.52.0          py37hdc94413_0                conda-forge
numpy                     1.19.5          py37haa41c4c_1                conda-forge
nvtx                      0.2.3           py37h5e8e339_0                conda-forge
olefile                   0.46            pyh9f0ad1d_1                  conda-forge
openjdk                   11.0.8          hacce0ff_0                    conda-forge
openssl                   1.1.1i          h7f98852_0                    conda-forge
orc                       1.6.5           hd3605a7_0                    conda-forge
packaging                 20.9            pyh44b312d_0                  conda-forge
pandas                    1.1.5           py37hdc94413_0                conda-forge
parquet-cpp               1.5.1           1                             conda-forge
parso                     0.8.1           pypi_0                        pypi
partd                     1.1.0           py_0                          conda-forge
pcre                      8.44            he1b5a44_0                    conda-forge
pexpect                   4.8.0           pypi_0                        pypi
pickleshare               0.7.5           pypi_0                        pypi
pillow                    8.1.0           py37h4600e1f_2                conda-forge
pip                       21.0.1          pyhd8ed1ab_0                  conda-forge
pixman                    0.40.0          h36c2ea0_0                    conda-forge
prompt-toolkit            3.0.16          pypi_0                        pypi
protobuf                  3.13.0.1        py37h3340039_1                conda-forge
psutil                    5.8.0           py37h5e8e339_1                conda-forge
ptyprocess                0.7.0           pypi_0                        pypi
pyarrow                   1.0.1           py37hbeecfa9_14_cuda          conda-forge
pycosat                   0.6.3           py37h5e8e339_1006             conda-forge
pycparser                 2.20            pyh9f0ad1d_2                  conda-forge
pygments                  2.7.4           pypi_0                        pypi
pyhive                    0.6.3           pyhd3deb0d_0                  conda-forge
pynvml                    8.0.4           py_1                          conda-forge
pyopenssl                 20.0.1          pyhd8ed1ab_0                  conda-forge
pyparsing                 2.4.7           pyh9f0ad1d_0                  conda-forge
pysocks                   1.7.1           py37h89c1867_3                conda-forge
python                    3.7.9           hffdb5ce_0_cpython            conda-forge
python-dateutil           2.8.1           py_0                          conda-forge
python_abi                3.7             1_cp37m                       conda-forge
pytz                      2021.1          pyhd8ed1ab_0                  conda-forge
pyyaml                    5.4.1           py37h5e8e339_0                conda-forge
pyzmq                     22.0.3          pypi_0                        pypi
re2                       2020.10.01      he1b5a44_0                    conda-forge
readline                  8.1             h27cfd23_0                    anaconda-main-remote
reproc                    14.2.1          h36c2ea0_0                    conda-forge
reproc-cpp                14.2.1          h58526e2_0                    conda-forge
requests                  2.25.1          pyhd3deb0d_0                  conda-forge
rmm                       0.17.0          cuda_10.1_py37_gc4cc945_0     rapidsai-remote
ruamel_yaml               0.15.87         py37h7b6447c_1                anaconda-main-remote
sasl                      0.2.1           py37h3340039_1002             conda-forge
setuptools                52.0.0          py37h06a4308_0                anaconda-main-remote
six                       1.15.0          pyh9f0ad1d_0                  conda-forge
snappy                    1.1.8           he1b5a44_3                    conda-forge
sortedcontainers          2.3.0           pyhd8ed1ab_0                  conda-forge
spdlog                    1.7.0           hc9558a2_2                    conda-forge
sqlalchemy                1.3.23          py37h5e8e339_0                conda-forge
sqlite                    3.34.0          h74cdb3f_0                    conda-forge
tblib                     1.7.0           py_0                          anaconda-main-remote
thrift                    0.13.0          py37h3340039_2                conda-forge
thrift_sasl               0.4.2           py37h8f50634_0                conda-forge
tk                        8.6.10          hed695b0_1                    conda-forge
toolz                     0.11.1          py_0                          conda-forge
tornado                   6.1             py37h5e8e339_1                conda-forge
tqdm                      4.56.1          pyhd8ed1ab_0                  conda-forge
traitlets                 5.0.5           pypi_0                        pypi
typing_extensions         3.7.4.3         py_0                          conda-forge
ucx                       1.8.1+g6b29558  cuda10.1_0                    rapidsai-remote
ucx-proc                  1.0.0           gpu                           rapidsai-remote
ucx-py                    0.17.0          py37_g6b29558_0               rapidsai-remote
urllib3                   1.26.3          pyhd8ed1ab_0                  conda-forge
wcwidth                   0.2.5           pypi_0                        pypi
wheel                     0.36.2          pyhd3deb0d_0                  conda-forge
xorg-fixesproto           5.0             h14c3975_1002                 conda-forge
xorg-inputproto           2.3.2           h14c3975_1002                 conda-forge
xorg-kbproto              1.0.7           h14c3975_1002                 conda-forge
xorg-libice               1.0.10          h516909a_0                    conda-forge
xorg-libsm                1.2.3           h84519dc_1000                 conda-forge
xorg-libx11               1.6.12          h516909a_0                    conda-forge
xorg-libxext              1.3.4           h516909a_0                    conda-forge
xorg-libxfixes            5.0.3           h516909a_1004                 conda-forge
xorg-libxi                1.7.10          h516909a_0                    conda-forge
xorg-libxrender           0.9.10          h516909a_1002                 conda-forge
xorg-libxtst              1.2.3           h516909a_1002                 conda-forge
xorg-recordproto          1.14.2          h516909a_1002                 conda-forge
xorg-renderproto          0.11.1          h14c3975_1002                 conda-forge
xorg-xextproto            7.3.0           h14c3975_1002                 conda-forge
xorg-xproto               7.0.31          h14c3975_1007                 conda-forge
xz                        5.2.5           h516909a_1                    conda-forge
yaml                      0.2.5           h516909a_0                    conda-forge
zeromq                    4.3.3           h58526e2_3                    conda-forge
zict                      2.0.0           py_0                          conda-forge
zlib                      1.2.11          h516909a_1010                 conda-forge
zstd                      1.4.8           hdf46e1d_0                    conda-forge
```

aucahuasi commented 3 years ago

Hi @lc5415 thanks for reporting the issue!

I get this error when I run the first command (1 - Install Blazing)

# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<

    Traceback (most recent call last):
      File "/opt/conda/envs/j1/lib/python3.8/site-packages/conda/exceptions.py", line 1079, in __call__
        return func(*args, **kwargs)
      File "/opt/conda/envs/j1/lib/python3.8/site-packages/mamba/mamba.py", line 882, in exception_converter
        raise e
      File "/opt/conda/envs/j1/lib/python3.8/site-packages/mamba/mamba.py", line 876, in exception_converter
        exit_code = _wrapped_main(*args, **kwargs)
      File "/opt/conda/envs/j1/lib/python3.8/site-packages/mamba/mamba.py", line 835, in _wrapped_main
        result = do_call(args, p)
      File "/opt/conda/envs/j1/lib/python3.8/site-packages/mamba/mamba.py", line 720, in do_call
        exit_code = create(args, parser)
      File "/opt/conda/envs/j1/lib/python3.8/site-packages/mamba/mamba.py", line 624, in create
        return install(args, parser, "create")
      File "/opt/conda/envs/j1/lib/python3.8/site-packages/mamba/mamba.py", line 514, in install
        index = load_channels(pool, channels, repos)
      File "/opt/conda/envs/j1/lib/python3.8/site-packages/mamba/utils.py", line 93, in load_channels
        index = get_index(
      File "/opt/conda/envs/j1/lib/python3.8/site-packages/mamba/utils.py", line 74, in get_index
        is_downloaded = dlist.download(True)
    RuntimeError: Download error (6) Couldn't resolve host name [https://artifactory.trusted.visa.com/rapidsai-remote/noarch/repodata.json]

`$ /opt/conda/envs/j1/bin/mamba create -n iblazing -c https://artifactory.trusted.visa.com/rapidsai-remote -c https://artifactory.trusted.visa.com/nvidia-remote -c https://artifactory.trusted.visa.com/conda-forge -c https://artifactory.trusted.visa.com/anaconda-main-remote blazingsql python=3.7 cudatoolkit=10.1`

Have you tried to install blazingsql using these steps?

lucharo commented 3 years ago

@aucahuasi My bad, I put in my enterprise channels. I've edited the issue with the open source channels now

lucharo commented 3 years ago

@aucahuasi Also, I actually figured out the issue. I was running !source arevars.sh, which sets the environment variables as per the documentation. The problem is that the env variables weren't actually being set in the same shell as the Jupyter notebook. Hence I had to import os and define the environment variables from within the Jupyter notebook session (e.g. os.environ['CLASSPATH']=...).
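For anyone hitting the same thing, here is a minimal sketch of why the `!source` approach has no effect on the kernel. It uses a throwaway variable name (`BLAZING_DEMO_VAR`, hypothetical and not something BlazingSQL reads) and a child Python process standing in for the subshell that `!` spawns. The real variables (JAVA_HOME, CLASSPATH, ARROW_LIBHDFS, etc.) would be assigned the same way, with values taken from your own setup.

```python
import os
import subprocess
import sys

VAR = "BLAZING_DEMO_VAR"  # hypothetical name, used only for this demo
os.environ.pop(VAR, None)  # start from a clean state

# Like `!source script.sh`, this runs in a child process. The child works on a
# copy of the environment, so its change is lost as soon as it exits.
child_code = f"import os; os.environ['{VAR}'] = 'set-in-child'"
subprocess.run([sys.executable, "-c", child_code], check=True)
visible_after_child = VAR in os.environ

# Assigning through os.environ changes the current (kernel) process itself,
# so anything imported afterwards in this session can see the value.
os.environ[VAR] = "set-in-kernel"
visible_after_assignment = VAR in os.environ

print(visible_after_child, visible_after_assignment)  # False True
```

The assignments need to run in a notebook cell before `from blazingsql import BlazingContext`, since the traceback shows the environment being read at import time.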

Moreover, there was an issue with my CONDA_PREFIX env variable: I had to reset it to point at the right conda environment. I think this latter problem has to do with my specific JupyterLab instance rather than with the BlazingSQL code.
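A rough sketch of the CONDA_PREFIX reset, in case it helps someone else. It assumes the kernel's Python interpreter lives inside the intended conda environment, so that `sys.prefix` is that environment's root; if your kernel is launched differently, set the path by hand instead. The helper names `jar_path` and `reset_conda_prefix` are made up for this example; only the `os.path.join` expression is taken from the traceback above.

```python
import os
import sys


def jar_path(prefix):
    # Same join as in pyblazing/apiv2/context.py, per the traceback above.
    return os.path.join(prefix, "lib/blazingsql-algebra.jar")


def reset_conda_prefix():
    """Point CONDA_PREFIX at the running interpreter's environment.

    Assumption: sys.prefix is the root of the conda environment that
    contains blazingsql. Returns the previous value for inspection.
    """
    previous = os.environ.get("CONDA_PREFIX")
    os.environ["CONDA_PREFIX"] = sys.prefix
    return previous


previous = reset_conda_prefix()
print("CONDA_PREFIX was:", previous)
print("CONDA_PREFIX now:", os.environ["CONDA_PREFIX"])
print("jar found:", os.path.exists(jar_path(os.environ["CONDA_PREFIX"])))
```

In my case the old value was the JupyterLab server's own prefix, which is why the error mentions /opt/conda/lib/blazingsql-algebra.jar rather than a path under the iblazing environment. As with the other variables, this has to run before importing blazingsql.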