dask-contrib / dask-sql

Distributed SQL Engine in Python using Dask
https://dask-sql.readthedocs.io/
MIT License
386 stars 71 forks source link

[BUG] [Crash Bug] SELECT * FROM <table> WHERE <condition> brings Crash #1342

Open qwebug opened 3 months ago

qwebug commented 3 months ago

What happened:

SELECT * FROM \

WHERE \ brings crash, when using CPU execution.

What you expected to happen:

It will not bring crash, when using CPU execution.

Minimal Complete Verifiable Example:

Query by JDBC:

CREATE TABLE t0_testtable WITH ( location = '/tmp/t0.csv', format = 'csv', persist = True, gpu = FALSE );
CREATE TABLE t0_testtable_gpu WITH ( location = '/tmp/t0.csv', format = 'csv', persist = True, gpu = TRUE );
SELECT * FROM t0 WHERE (NOT t0.c0);

t0.csv:

c0,
'false',

SQL:

SELECT * FROM t0 WHERE (NOT t0.c0);

Result:

Query is gone (server restarted?)

Due to a crash caused by this query, the Dask-sql server restarted.

SQL:

SELECT * FROM t0_gpu WHERE (NOT t0_gpu.c0);

Result:

 c0 | Unnamed: 1 
----+------------
(0 rows)

Query by python:

import dask.dataframe as dd
from dask_sql import Context
import pandas as pd

c = Context()

df0 = pd.DataFrame({
    'c0':['false'],
})
t0 = dd.from_pandas(df0, npartitions=1)

c.create_table('t0', t0, persist = True, gpu=False)
c.create_table('t0_gpu', t0, persist = True, gpu=True)

print('GPU Result:')
result2= c.sql("SELECT * FROM t0_gpu WHERE (NOT t0_gpu.c0)").compute()
print(result2)

print('CPU Result:')
result1= c.sql("SELECT * FROM t0 WHERE (NOT t0.c0)").compute()
print(result1)

Result:

INFO:numba.cuda.cudadrv.driver:init
GPU Result:
Empty DataFrame
Columns: [c0]
Index: []
CPU Result:
Traceback (most recent call last):
  File "/tmp/test.py", line 19, in <module>
    result1= c.sql("SELECT * FROM t0 WHERE (NOT t0.c0)").compute()
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/base.py", line 314, in compute
    (result,) = compute(self, traverse=False, **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/base.py", line 599, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/threaded.py", line 89, in get
    results = get_async(
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/local.py", line 511, in get_async
    raise_exception(exc, tb)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/local.py", line 319, in reraise
    raise exc
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/local.py", line 224, in execute_task
    result = _execute_task(task, data)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/optimization.py", line 990, in __call__
    return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/core.py", line 149, in get
    result = _execute_task(task, cache)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/core.py", line 119, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/core.py", line 119, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/core.py", line 119, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/core.py", line 119, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/core.py", line 119, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/core.py", line 113, in _execute_task
    return [_execute_task(a, cache) for a in arg]
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/core.py", line 113, in <listcomp>
    return [_execute_task(a, cache) for a in arg]
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/core.py", line 119, in <genexpr>
    return func(*(_execute_task(a, cache) for a in args))
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/core.py", line 119, in _execute_task
    return func(*(_execute_task(a, cache) for a in args))
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/utils.py", line 73, in apply
    return func(*args, **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/dask/utils.py", line 1105, in __call__
    return getattr(__obj, self.method)(*args, **kwargs)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/pandas/core/generic.py", line 6240, in astype
    new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 448, in astype
    return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 352, in apply
    applied = getattr(b, f)(**kwargs)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/pandas/core/internals/blocks.py", line 526, in astype
    new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 299, in astype_array_safe
    new_values = astype_array(values, dtype, copy=copy)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 230, in astype_array
    values = astype_nansafe(values, dtype, copy=copy)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/pandas/core/dtypes/astype.py", line 95, in astype_nansafe
    return dtype.construct_array_type()._from_sequence(arr, dtype=dtype, copy=copy)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/pandas/core/arrays/masked.py", line 132, in _from_sequence
    values, mask = cls._coerce_to_array(scalars, dtype=dtype, copy=copy)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/pandas/core/arrays/boolean.py", line 344, in _coerce_to_array
    return coerce_to_array(value, copy=copy)
  File "/opt/conda/envs/rapids/lib/python3.10/site-packages/pandas/core/arrays/boolean.py", line 194, in coerce_to_array
    raise TypeError("Need to pass bool-like values")
TypeError: Need to pass bool-like values

Anything else we need to know?:

Environment: