BlazingDB / blazingsql

BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.
https://blazingsql.com
Apache License 2.0
1.92k stars 181 forks source link

Bug when join 2 table with [BUG] #1599

Open nic9lif3 opened 2 years ago

nic9lif3 commented 2 years ago

When I process this sql:

SELECT 4 TIME_ID
    ,STOCK_ID
    ,MAX(CASE WHEN TIME_ID BETWEEN 4 - 1597 + 1 AND 4 - 1 + 1 THEN VAR12 ELSE null END) MAX_VAR12_1M_1597M
FROM (
    SELECT *
    FROM VAR1_TMP_BOOK_INFO
    WHERE TIME_ID BETWEEN 4 - 17711 + 1
            AND 4
    ) VAR1_TMP_BOOK_INFO
JOIN (
    SELECT stock_id stock_id1
        ,time_id time_id1
        ,seconds_in_bucket seconds_in_bucket1
        ,LAG(VAR1, 1) OVER (
            PARTITION BY stock_id
            ,time_id ORDER BY seconds_in_bucket
            ) VAR1_LAG
    FROM VAR1_TMP_BOOK_INFO
    WHERE TIME_ID BETWEEN 4 - 1
            AND 4
    ) VAR1_TMP_BOOK_INFO_1 ON VAR1_TMP_BOOK_INFO.stock_id = VAR1_TMP_BOOK_INFO_1.stock_id1
    AND VAR1_TMP_BOOK_INFO.time_id = VAR1_TMP_BOOK_INFO_1.time_id1
    AND VAR1_TMP_BOOK_INFO.seconds_in_bucket = VAR1_TMP_BOOK_INFO_1.seconds_in_bucket1
GROUP BY STOCK_ID

then it raises error

RunExecuteGraphError: [RunExecuteGraph Error] Ral failure at: /opt/conda/envs/rapids/conda-bld/blazingsql_1623363481746/work/engine/src/Interpreter/interpreter_cpp.cu:330: Operations between literals is not supported

If I remove condition in agg, it works:

SELECT 4 TIME_ID
    ,STOCK_ID
    ,MAX(CASE WHEN 1=1 THEN VAR12 ELSE null END) MAX_VAR12_1M_1597M
FROM (
    SELECT *
    FROM VAR1_TMP_BOOK_INFO
    WHERE TIME_ID BETWEEN 4 - 17711 + 1
            AND 4
    ) VAR1_TMP_BOOK_INFO
JOIN (
    SELECT stock_id stock_id1
        ,time_id time_id1
        ,seconds_in_bucket seconds_in_bucket1
        ,LAG(VAR1, 1) OVER (
            PARTITION BY stock_id
            ,time_id ORDER BY seconds_in_bucket
            ) VAR1_LAG
    FROM VAR1_TMP_BOOK_INFO
    WHERE TIME_ID BETWEEN 4 - 1
            AND 4
    ) VAR1_TMP_BOOK_INFO_1 ON VAR1_TMP_BOOK_INFO.stock_id = VAR1_TMP_BOOK_INFO_1.stock_id1
    AND VAR1_TMP_BOOK_INFO.time_id = VAR1_TMP_BOOK_INFO_1.time_id1
    AND VAR1_TMP_BOOK_INFO.seconds_in_bucket = VAR1_TMP_BOOK_INFO_1.seconds_in_bucket1
GROUP BY STOCK_ID

or if I remove join table it also works:

SELECT 4 TIME_ID
    ,STOCK_ID
    ,MAX(CASE WHEN TIME_ID BETWEEN 4 - 1597 + 1 AND 4 - 1 + 1 THEN VAR12 ELSE null END) MAX_VAR12_1M_1597M
FROM (
    SELECT *
    FROM VAR1_TMP_BOOK_INFO
    WHERE TIME_ID BETWEEN 4 - 17711 + 1
            AND 4
    ) VAR1_TMP_BOOK_INFO
GROUP BY STOCK_ID

The source table likes that:

TIME_ID STOCK_ID SECONDS_IN_BUCKET VAR1 VAR12
0 4 0 1 0.000306751 -0.590466
1 4 0 5 0.00060095 -10.1625
2 4 0 0 0.000306751 -0.539158