BlazingDB / blazingsql

BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.
https://blazingsql.com
Apache License 2.0

Unsigned Types support #1451

Closed felipeblazing closed 3 years ago

felipeblazing commented 3 years ago

Currently we do not support unsigned integer types in BlazingSQL. Now that they are supported in cuDF, adding support for unsigned types on the compute side is easy, but there are many places in the engine where we iterate through the types:

Calcite expression parsing

Interops: the only real issue here is that we might have to treat unsigned 64-bit integers as a special case for certain operators because of how data is stored during execution. This can be handled fairly easily during the read phase of interops.

Metadata Common

Expression Utils

These are the main places where this needs to be implemented in order for it to work.
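To illustrate why unsigned 64-bit integers may need special handling, here is a minimal sketch in plain Python. It assumes, purely for illustration, that a value is held in a signed 64-bit slot during execution; the issue does not say what the actual storage type is, and the helper names below are hypothetical, not part of BlazingSQL.

```python
import struct

INT64_MAX = 2**63 - 1


def reinterpret_u64_as_i64(value: int) -> int:
    """Return the signed 64-bit integer sharing the bit pattern of `value`."""
    return struct.unpack("<q", struct.pack("<Q", value))[0]


def reinterpret_i64_as_u64(value: int) -> int:
    """Return the unsigned 64-bit integer sharing the bit pattern of `value`."""
    return struct.unpack("<Q", struct.pack("<q", value))[0]


# Sample uint64 values, already in ascending unsigned order.
values = [0, 10, INT64_MAX, 2**63, 2**64 - 1]

# ASSUMPTION: the bits are stored in a signed 64-bit slot during execution.
as_signed = [reinterpret_u64_as_i64(v) for v in values]

# Anything above INT64_MAX turns negative, so an ordering operator that
# works on the signed view sorts those rows first.
signed_order = sorted(range(len(values)), key=lambda i: as_signed[i])

# Reading the bits back as unsigned recovers the original values, which is
# the kind of fix-up a read phase could apply.
round_trip = [reinterpret_i64_as_u64(v) for v in as_signed]

print(as_signed)
print(signed_order)
print(round_trip == values)
```

Under that assumption, the narrower unsigned types (uint8, uint16, uint32) fit in a signed 64-bit slot without loss, so only uint64 values above `INT64_MAX` would need the extra step.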

drabastomek commented 3 years ago

@felipeblazing I do see these supported but I'm not sure if this is exactly what you meant.

import cudf
import numpy as np
from blazingsql import BlazingContext

bc = BlazingContext()

dtypes_vals = [
    (np.int8(np.arange(-5, 6)), 'np_int8')
    , (np.int16(np.arange(-5, 6)), 'np_int16')
    , (np.int32(np.arange(-5, 6)), 'np_int32')
    , (np.int64(np.arange(-5, 6)), 'np_int64')
    , (np.uint8(np.arange(0, 11)), 'np_uint8')
    , (np.uint16(np.arange(0, 11)), 'np_uint16')
    , (np.uint32(np.arange(0, 11)), 'np_uint32')
    , (np.uint64(np.arange(0, 11)), 'np_uint64')
    , (np.float32(np.arange(-5.5, 5.5)), 'np_float32')
    , (np.float64(np.arange(-5.5, 5.5)), 'np_float64')
]

df = cudf.DataFrame()

supported_dtypes = []

for col_val, col_name in dtypes_vals:
    df[col_name] = col_val

    try:
        bc.create_table('dtype_test', df)
        bc.sql('SELECT * FROM dtype_test')
        supported_dtypes.append(col_name)
    except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
        print(f'{col_name} not supported...')
        df.drop(columns=[col_name], inplace=True)

Running bc.sql('SELECT * FROM dtype_test').dtypes produces the following:

np_int8          int8
np_int16        int16
np_int32        int32
np_int64        int64
np_uint8        uint8
np_uint16      uint16
np_uint32      uint32
np_uint64      uint64
np_float32    float32
np_float64    float64
dtype: object

The supported_dtypes list contains the following:

['np_int8',
 'np_int16',
 'np_int32',
 'np_int64',
 'np_uint8',
 'np_uint16',
 'np_uint32',
 'np_uint64',
 'np_float32',
 'np_float64']

felipeblazing commented 3 years ago

They are not fully supported in the engine, and the places listed above are what has to change to support them fully.