BlazingDB / blazingsql

BlazingSQL is a lightweight, GPU accelerated, SQL engine for Python. Built on RAPIDS cuDF.
https://blazingsql.com
Apache License 2.0

Fix PostgreSQL data provider issues #1491

Open gcca opened 3 years ago

gcca commented 3 years ago

Currently, the PostgreSQL data provider follows the same standard implementation as the other SQL data providers. PostgreSQL behaves differently, however, and some e2e tests are failing.

For instance, a LIMIT clause returns an unpredictable subset of the query's rows, so our provider needs to handle that scenario.
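One possible way to handle this, sketched under assumptions: PostgreSQL only guarantees a repeatable LIMIT/OFFSET subset when the ORDER BY uniquely determines the row order, so the batch queries could append a unique column as a tie-breaker. The helper `build_batch_queries` below is hypothetical and not part of BlazingSQL; it also assumes that `c_custkey` is unique in the TPC-H `customer` table and that the table has 15000 rows, as the batch counts in the log further down suggest.

```python
def build_batch_queries(table, columns, order_by, unique_key, total_rows, batch_size):
    """Build LIMIT/OFFSET batch queries whose ORDER BY ends in a unique key.

    Hypothetical helper for illustration only, not the provider's actual code.
    """
    order_cols = list(order_by)
    # Append the unique key as a tie-breaker so the ordering is total.
    if unique_key not in order_cols:
        order_cols.append(unique_key)

    select_list = ", ".join(columns)
    order_list = ", ".join(order_cols)

    queries = []
    for offset in range(0, total_rows, batch_size):
        queries.append(
            f"SELECT {select_list} FROM {table} "
            f"ORDER BY {order_list} LIMIT {batch_size} OFFSET {offset}"
        )
    return queries


queries = build_batch_queries(
    table="customer",
    columns=["c_custkey", "c_nationkey", "c_acctbal"],
    order_by=["c_nationkey"],
    unique_key="c_custkey",  # assumed unique in TPC-H customer
    total_rows=15000,        # assumed from the batch counts in the log
    batch_size=7000,
)
for query in queries:
    print(query)
```

With a total ordering, consecutive OFFSET windows cannot overlap as long as the table does not change between batches. Keyset pagination (`WHERE c_custkey > last_seen_key`) would be an alternative that avoids OFFSET altogether.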

To reproduce the error, run the following query against the TPC-H dataset:

select c_custkey, c_nationkey, c_acctbal
    from customer where c_custkey < 150 and c_nationkey = 5
    or c_custkey = 200 or c_nationkey >= 10
    or c_acctbal <= 500 order by c_custkey limit 75

The output is:

BlazingContext ready
QUERY: SELECT c_custkey, c_nationkey, c_acctbal  FROM customer order by c_nationkey LIMIT 7000 OFFSET 0
COUNT: 7000
QUERY: SELECT c_custkey, c_nationkey, c_acctbal  FROM customer order by c_nationkey LIMIT 7000 OFFSET 7000
COUNT: 7000
QUERY: SELECT c_custkey, c_nationkey, c_acctbal  FROM customer order by c_nationkey LIMIT 7000 OFFSET 14000
COUNT: 1000
    c_custkey c_nationkey c_acctbal
0           2          13    121.65
1           6          20   7638.57
2           7          18   9561.95
3           8          17   6819.74
4          10           5   2753.54
..        ...         ...       ...
70        117          24   3950.83
71        118          18   3582.37
72        120          12    363.75
73        120          12    363.75
74        123           5   5897.83

[75 rows x 3 columns]

The error is the duplicated row, which appears at both index 72 and index 73:

72        120          12    363.75
73        120          12    363.75
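A plausible explanation, offered as an assumption rather than a confirmed diagnosis: the batches are fetched with separate LIMIT/OFFSET queries ordered by `c_nationkey`, which is not unique, so rows with the same `c_nationkey` may come back in a different relative order in each query. A row near a batch boundary can then land in two batches while another row is skipped. The pure-Python simulation below uses made-up data and no database; `run_paged_query` and `fetch_all` are illustrative names, and the shuffle stands in for the arbitrary tie order the database is allowed to choose.

```python
import random


def run_paged_query(rows, sort_key, limit, offset, rng):
    """Simulate one independent ORDER BY ... LIMIT ... OFFSET query.

    Rows are shuffled before sorting so that rows with equal sort keys
    end up in an arbitrary relative order, different on every call.
    """
    shuffled = rows[:]
    rng.shuffle(shuffled)
    ordered = sorted(shuffled, key=sort_key)  # stable sort keeps the shuffled tie order
    return ordered[offset:offset + limit]


def fetch_all(rows, sort_key, batch_size, rng):
    """Fetch the whole table batch by batch, one simulated query per batch."""
    fetched = []
    for offset in range(0, len(rows), batch_size):
        fetched.extend(run_paged_query(rows, sort_key, batch_size, offset, rng))
    return fetched


rng = random.Random(0)

# Made-up rows: (custkey, nationkey) with 25 nations and 40 customers each.
rows = [(custkey, custkey % 25) for custkey in range(1, 1001)]

# Non-unique sort key: batches may overlap, so some rows repeat and others go missing.
by_nation = fetch_all(rows, lambda r: r[1], 300, rng)

# Unique tie-breaker appended: every batch is a disjoint slice of one fixed order.
by_nation_and_key = fetch_all(rows, lambda r: (r[1], r[0]), 300, rng)

print("rows fetched:", len(by_nation), "distinct:", len(set(by_nation)))
print("rows fetched:", len(by_nation_and_key), "distinct:", len(set(by_nation_and_key)))
```

Both runs fetch 1000 rows in total, but only the run with the tie-breaker is guaranteed to return 1000 distinct rows.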