BlazingDB / blazingsql

BlazingSQL is a lightweight, GPU-accelerated SQL engine for Python, built on RAPIDS cuDF.
https://blazingsql.com
Apache License 2.0

[BUG] Querying a test parquet file fails #357

Status: Closed (randerzander closed this issue 4 years ago)

randerzander commented 4 years ago

Environment: latest 0.13 nightly conda packages

(rapids) root@dgx01:/# conda list | grep rapids
# packages in environment at /conda/envs/rapids:
bsql-rapids-thirdparty    0.13.0a                       0    blazingsql-nightly
cudf                      0.13.0a200204         py37_1242    rapidsai-nightly
cugraph                   0.13.0a200204           py37_95    rapidsai-nightly
cuml                      0.13.0a200203   cuda10.0_py37_103    rapidsai-nightly
cuspatial                 0.13.0a200204            py37_7    rapidsai-nightly
dask-cuda                 0.13.0a200204           py37_40    rapidsai-nightly
dask-cudf                 0.13.0a200204         py37_1242    rapidsai-nightly
dask-xgboost              0.2.0.dev28      cuda10.0py36_0    rapidsai-nightly
libcudf                   0.13.0a200204     cuda10.0_1242    rapidsai-nightly
libcugraph                0.13.0a200204       cuda10.0_95    rapidsai-nightly
libcuml                   0.13.0a200203      cuda10.0_103    rapidsai-nightly
libcumlprims              0.13.0a200204        cuda10.0_9    rapidsai-nightly
libcuspatial              0.13.0a200204        cuda10.0_7    rapidsai-nightly
libnvstrings              0.13.0a200204     cuda10.0_1242    rapidsai-nightly
librmm                    0.13.0a200204      cuda10.0_143    rapidsai-nightly
libxgboost                1.0.0dev.rapidsai0.12      cuda10.0_1    rapidsai-nightly
nvstrings                 0.13.0a200204         py37_1242    rapidsai-nightly
py-xgboost                1.0.0dev.rapidsai0.12  cuda10.0py37_1    rapidsai-nightly
rapids                    0.13.0          cuda10.0_py37_1    rapidsai-nightly
rapids-xgboost            0.13.0          cuda10.0_py37_1    rapidsai-nightly
rmm                       0.13.0a200204          py37_143    rapidsai-nightly
ucx                       1.7.0dev+g9d06c3a    cuda10.0_129    rapidsai-nightly
ucx-py                    0.13.0a200123+g896e60b         py37_12    rapidsai-nightly
xgboost                   1.0.0dev.rapidsai0.12  cuda10.0py37_1    rapidsai-nightly
(rapids) root@dgx01:/# conda list | grep blazing
blazingsql                0.13.0a         cuda10.0_py37_16    blazingsql-nightly/label/cuda10.0
bsql-rapids-thirdparty    0.13.0a                       0    blazingsql-nightly
bsql-toolchain            0.13.0a                       0    blazingsql-nightly
bsql-toolchain-aws-cpp    0.13.0a                       0    blazingsql-nightly
bsql-toolchain-gcp-cpp    0.13.0a                       0    blazingsql-nightly

Following the LocalCUDACluster multi-GPU setup from the docs, the script below fails on the final query:

from blazingsql import BlazingContext
from dask_cuda import LocalCUDACluster
from dask.distributed import Client
import pandas as pd

cluster = LocalCUDACluster()
client = Client(cluster)
bc = BlazingContext(dask_client = client, network_interface = 'lo')

# create a test file
df = pd.DataFrame()
df['id'] = [0, 1, 2, 2, 3]
df['val'] = [0, 1, 2, 2, 3]
df.to_parquet('test.parquet')

bc.create_table('test', 'test.parquet')

ddf = bc.sql('select * from test limit 10')

---------------------------------------------------------------------------
UnboundLocalError                         Traceback (most recent call last)
<ipython-input-15-039e1e517fba> in <module>
      5 
      6 # query table
----> 7 ddf = bc.sql(query)
      8 ddf.head()

/conda/envs/rapids/lib/python3.7/site-packages/pyblazing/apiv2/context.py in sql(self, sql, table_list, algebra)
    713             j = 0
    714             for nodeList in nodeTableList:
--> 715                 nodeList[table] = currentTableNodes[j]
    716                 j = j + 1
    717             scan_table_query = relational_algebra_steps[table]['table_scans'][0]

UnboundLocalError: local variable 'currentTableNodes' referenced before assignment

randerzander commented 4 years ago

Fixed by passing the full path to the file in bc.create_table, instead of assuming the relative path 'test.parquet' would be found in the current working directory.
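
For anyone hitting the same error, here is a minimal sketch of that workaround. It assumes the file was written to the current working directory of the Python session, as in the script above. The variable name parquet_path is only illustrative, and the BlazingSQL calls are left as comments because they need the BlazingContext (bc) from the original report.

```python
import os

# Resolve the relative file name against the current working directory,
# so the table is registered with a full path rather than a bare file name.
parquet_path = os.path.abspath('test.parquet')

# With the BlazingContext `bc` created as in the report above (not created here):
# bc.create_table('test', parquet_path)
# ddf = bc.sql('select * from test limit 10')
```

os.path.abspath only builds the path string; it does not check that the file exists, so os.path.isfile(parquet_path) can be used as an extra sanity check before creating the table.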