databendlabs / databend

๐——๐—ฎ๐˜๐—ฎ, ๐—”๐—ป๐—ฎ๐—น๐˜†๐˜๐—ถ๐—ฐ๐˜€ & ๐—”๐—œ. Modern alternative to Snowflake. Cost-effective and simple for massive-scale analytics. https://databend.com
https://docs.databend.com
Other
7.88k stars 753 forks source link

bug: push_down_filter causes incorrect results #16779

Open forsaken628 opened 2 weeks ago

forsaken628 commented 2 weeks ago

Search before asking

Version

9c2c3dd75ecd9e90bf950cf833d6659191996214

What's Wrong?

SELECT *
FROM
  (SELECT i_category,
          i_class,
          i_brand,
          i_product_name,
          sumsales,
          rank() OVER (PARTITION BY i_category
                       ORDER BY sumsales DESC) rk
   FROM
     (SELECT i_category,
             i_class,
             i_brand,
             i_product_name,
             d_year,
             d_qoy,
             d_moy,
             s_store_id,
             sum(coalesce(ss_sales_price*ss_quantity,0)) sumsales
      FROM store_sales,
           date_dim,
           store,
           item
      WHERE ss_sold_date_sk=d_date_sk
        AND ss_item_sk=i_item_sk
        AND ss_store_sk = s_store_sk
        AND d_month_seq BETWEEN 1200 AND 1200+11
      GROUP BY rollup(i_category, i_class, i_brand, i_product_name, d_year, d_qoy, d_moy,s_store_id))dw1) dw2
WHERE rk <= 100 and i_category = 'Books' order by i_category nulls first;

The outermost level of this query contains the predicate i_category = 'Books', but the result contains rows where i_category is null.

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
โ”‚    i_category    โ”‚      i_class     โ”‚        i_brand        โ”‚  i_product_name  โ”‚         sumsales         โ”‚   rk   โ”‚
โ”‚ Nullable(String) โ”‚ Nullable(String) โ”‚    Nullable(String)   โ”‚ Nullable(String) โ”‚ Nullable(Decimal(38, 2)) โ”‚ UInt64 โ”‚
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ผโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค
โ”‚ NULL             โ”‚ NULL             โ”‚ NULL                  โ”‚ NULL             โ”‚ 1480754.12               โ”‚      1 โ”‚
โ”‚ Books            โ”‚ NULL             โ”‚ NULL                  โ”‚ NULL             โ”‚ 1480754.12               โ”‚      1 โ”‚
โ”‚ Books            โ”‚ fiction          โ”‚ NULL                  โ”‚ NULL             โ”‚ 356542.34                โ”‚      2 โ”‚
โ”‚ Books            โ”‚ self-help        โ”‚ NULL                  โ”‚ NULL             โ”‚ 345808.39                โ”‚      3 โ”‚
โ”‚ Books            โ”‚ fiction          โ”‚ scholarunivamalg #8   โ”‚ NULL             โ”‚ 269225.19                โ”‚      4 โ”‚

How to Reproduce?

run tests/sqllogictests/scripts/prepare_tpcds_data.sh to construct the dataset

Are you willing to submit PR?

inviscid commented 1 week ago

It also appears that the ROLLUP() method cannot be combined with other GROUP BY columns to reduce the number of generated combinations. This is an example:

DROP TABLE IF EXISTS sales;
CREATE TABLE sales (
    brand VARCHAR NOT NULL,
    segment VARCHAR NOT NULL,
    quantity INT NOT NULL
);
INSERT INTO sales (brand, segment, quantity)
VALUES
    ('ABC', 'Premium', 100),
    ('ABC', 'Basic', 200),
    ('XYZ', 'Premium', 100),
    ('XYZ', 'Basic', 300);

-- This Works
SELECT
    brand,
    segment,
    SUM (quantity)
FROM
    sales
GROUP BY
    ROLLUP (brand, segment)
ORDER BY
    brand,
    segment;

-- This does not work
SELECT
    brand,
    segment,
    SUM (quantity)
FROM
    sales
GROUP BY
    brand,
    ROLLUP (segment)
ORDER BY
    brand,
    segment;

It returns the following error:

unexpected `segment`, expecting `SELECT`, `EXCEPT`, `INTERSECT`, `IGNORE_RESULT`, `LIMIT`, `OFFSET`, `FROM`, `UNION`, `ORDER`, `VALUES`, `(`, or `WITH`

The expected behavior is that the any number of GROUP BY columns can be specified prior to specifying the ROLLUP() columns in order to reduce the calculated columns.

If this is a separate issue from the one above I would be happy to create a new issue.