StarRocks / starrocks

StarRocks, a Linux Foundation project, is a next-generation sub-second MPP OLAP database for full analytics scenarios, including multi-dimensional analytics, real-time analytics, and ad-hoc queries.
https://starrocks.io
Apache License 2.0

Incorrect query results, possibly due to incorrect inlining of UUID function #31647

Closed: rcauble closed this issue 11 months ago

rcauble commented 1 year ago

Steps to reproduce the behavior (Required)

unzip [bug.zip](https://github.com/StarRocks/starrocks/files/12701386/bug.zip)
cd bug-repro
./bug.sh

Expected behavior (Required)

If you diff correct.sql and bug.sql, you can see a small change that should not affect the results. I expect the two SQL files to produce equivalent results:

Sleeping for 1 minute
Creating database
Running Correct.sql
JoinColumnResolver_Column_ValueTable_000
9
1
Running Bug.sql
9
1
Removing docker container
starrocks-test
starrocks-test

Real behavior (Required)

They produce different results: bug.sql prints no rows. My guess is that the UUID expression is being inlined somewhere, so the same CTE produces a different UUID at each reference and the join no longer matches as expected.
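The suspected mechanism can be illustrated outside StarRocks with a small model. This is a sketch under assumptions, not StarRocks code: the `cte_rows`, `self_join`, `run_with_reuse`, and `run_with_inline` helpers are hypothetical, and Python's `uuid.uuid4()` stands in for the SQL `uuid()` function. Evaluating the CTE once and reading it from both sides of a self-join keeps the ids equal; re-evaluating the CTE body at each reference generates different ids, so the join finds nothing.

```python
import uuid


def cte_rows():
    """Hypothetical CTE body: one row whose "id" column is computed by a UUID call."""
    return [{"id": str(uuid.uuid4())}]


def self_join(left, right):
    """Inner join two row lists on the "id" column and return the matched pairs."""
    return [(l, r) for l in left for r in right if l["id"] == r["id"]]


def run_with_reuse():
    # Reuse: the CTE is evaluated once and both join inputs read the same rows.
    shared = cte_rows()
    return self_join(shared, shared)


def run_with_inline():
    # Inline: each reference re-evaluates the CTE body, so uuid4() runs twice
    # and the two join inputs carry different ids.
    return self_join(cte_rows(), cte_rows())


if __name__ == "__main__":
    print("reuse :", len(run_with_reuse()), "matching row(s)")
    print("inline:", len(run_with_inline()), "matching row(s)")
```

With reuse the join returns one matching row; with inlining it returns none, which mirrors the empty output from bug.sql (a UUID collision is possible in principle but practically never happens).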

Sleeping for 1 minute
Creating database
Running Correct.sql
JoinColumnResolver_Column_ValueTable_000
9
1
Running Bug.sql
Removing docker container
starrocks-test
starrocks-test

StarRocks version (Required)

starrocks/allin1-ubuntu:3.1.0-rc01

LiShuMing commented 1 year ago

This is a fantastic discovery.

Currently, when deciding whether to reuse a CTE (Common Table Expression) or inline it, we do not check whether non-deterministic functions such as uuid() are present within its operators. Inlining such a CTE evaluates the function separately at each reference, which can produce unexpected results.

Solutions

  1. Run set cbo_cte_reuse_rate=0 to force CTE reuse, which avoids this bug.
  2. Later, I will add a rule that forces CTE reuse when non-deterministic functions are present within the operators.
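The rule proposed in item 2 can be sketched roughly as follows. This is a hypothetical Python model, not the StarRocks optimizer: the `Expr` node, the `NON_DETERMINISTIC` name set, and the `contains_non_deterministic` and `choose_cte_strategy` helpers are illustrative assumptions. The idea is to scan the CTE's expressions for non-deterministic calls and force reuse when any are found, leaving the existing cost-based choice unchanged otherwise.

```python
from dataclasses import dataclass, field

# Illustrative set of non-deterministic function names (assumption); the set
# the real optimizer checks may differ.
NON_DETERMINISTIC = {"uuid", "uuid_numeric", "rand", "random"}


@dataclass
class Expr:
    """Minimal expression node: a function call or column reference with children."""
    name: str
    children: list = field(default_factory=list)


def contains_non_deterministic(expr: Expr) -> bool:
    """Return True if any node in the expression tree is a non-deterministic call."""
    stack = [expr]
    while stack:
        node = stack.pop()
        if node.name.lower() in NON_DETERMINISTIC:
            return True
        stack.extend(node.children)
    return False


def choose_cte_strategy(cte_exprs, cost_prefers_reuse: bool) -> str:
    """Decide between reusing (evaluating once) and inlining a CTE.

    Non-deterministic expressions force reuse so every consumer sees the same
    values; otherwise the cost-based decision is returned as-is.
    """
    if any(contains_non_deterministic(e) for e in cte_exprs):
        return "reuse"
    return "reuse" if cost_prefers_reuse else "inline"


if __name__ == "__main__":
    # A CTE projecting uuid() is reused even when cost alone would inline it.
    print(choose_cte_strategy([Expr("uuid")], cost_prefers_reuse=False))  # reuse
    # A deterministic CTE keeps the cost-based choice.
    plain = [Expr("upper", [Expr("col_a")])]
    print(choose_cte_strategy(plain, cost_prefers_reuse=False))  # inline
```

Forcing reuse only in the non-deterministic case keeps the optimizer free to inline deterministic CTEs when that is cheaper, which is the narrower alternative to the global workaround in item 1.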