facebookincubator / velox

A composable and fully extensible C++ execution engine library for data management systems.
https://velox-lib.io/
Apache License 2.0
3.52k stars 1.15k forks source link

DuckDBQueryRunner has different results in VARCHAR #4539

Open Yohahaha opened 1 year ago

Yohahaha commented 1 year ago

Bug description

PR #4532 find DuckDB in some case may return error results.

This PR give a test case to reproduce bug.

Reproduce

Add this test in velox/exec/tests/QueryAssertionsTest.cpp


TEST_F(QueryAssertionsTest, test) {
  {
    std::vector<std::string> data(21);
    std::vector<RowVectorPtr> vectors = {makeRowVector({
        makeFlatVector<int32_t>(21, [](auto row) { return row % 7; }),
        makeFlatVector<StringView>(
            21, // size
            [&data](auto row) {
              data[row] = std::to_string(row);
              return StringView(data[row]);
            }, // valueAt
            [](auto row) { return row % 3 == 0; }), // nullAt
    })};
    LOG(INFO) << vectors[0]->toString(0, vectors[0]->size());
    duckDbQueryRunner_.createTable("tmp", vectors);
    {
      LOG(INFO) << "=== first agg for varchar";
      auto sql = "select c0, first(c1) from tmp group by c0";
      auto result = duckDbQueryRunner_.execute(sql);
      LOG(INFO) << result->ToString();
    }
    {
      LOG(INFO) << "=== last agg for varchar";
      auto sql = "select c0, last(c1) from tmp group by c0";
      auto result = duckDbQueryRunner_.execute(sql);
      LOG(INFO) << result->ToString();
    }
  }
  {
    std::vector<RowVectorPtr> vectors = {makeRowVector({
        makeFlatVector<int32_t>(21, [](auto row) { return row % 7; }),
        makeFlatVector<int32_t>(
            21, // size
            [](auto row) { return row; }, // valueAt
            [](auto row) { return row % 3 == 0; }), // nullAt
    })};
    LOG(INFO) << vectors[0]->toString(0, vectors[0]->size());
    duckDbQueryRunner_.createTable("tmp", vectors);
    {
      LOG(INFO) << "=== first agg for integer";
      auto sql = "select c0, first(c1) from tmp group by c0";
      auto result = duckDbQueryRunner_.execute(sql);
      LOG(INFO) << result->ToString();
    }
    {
      LOG(INFO) << "=== last agg for integer";
      auto sql = "select c0, last(c1) from tmp group by c0";
      auto result = duckDbQueryRunner_.execute(sql);
      LOG(INFO) << result->ToString();
    }
  }
}

log output:

I0406 20:56:51.172724 4148279 QueryAssertionsTest.cpp:132] 0: {0, null}
1: {1, 1}
2: {2, 2}
3: {3, null}
4: {4, 4}
5: {5, 5}
6: {6, null}
7: {0, 7}
8: {1, 8}
9: {2, null}
10: {3, 10}
11: {4, 11}
12: {5, null}
13: {6, 13}
14: {0, 14}
15: {1, null}
16: {2, 16}
17: {3, 17}
18: {4, null}
19: {5, 19}
20: {6, 20}
I0406 20:56:51.175751 4148279 QueryAssertionsTest.cpp:135] === first agg for varchar
I0406 20:56:51.177181 4148279 QueryAssertionsTest.cpp:138] c0   first(c1)   
INTEGER VARCHAR 
[ Rows: 7]
0   NULL    
1   1   
2   2   
3   NULL    
4   4   
5   5   
6   NULL    

I0406 20:56:51.177208 4148279 QueryAssertionsTest.cpp:141] === last agg for varchar
I0406 20:56:51.178416 4148279 QueryAssertionsTest.cpp:144] c0   last(c1)    
INTEGER VARCHAR 
[ Rows: 7]
0   NULL    
1   NULL    
2   NULL    
3   NULL    
4   NULL    
5   NULL    
6   NULL    

I0406 20:56:51.178485 4148279 QueryAssertionsTest.cpp:155] 0: {0, null}
1: {1, 1}
2: {2, 2}
3: {3, null}
4: {4, 4}
5: {5, 5}
6: {6, null}
7: {0, 7}
8: {1, 8}
9: {2, null}
10: {3, 10}
11: {4, 11}
12: {5, null}
13: {6, 13}
14: {0, 14}
15: {1, null}
16: {2, 16}
17: {3, 17}
18: {4, null}
19: {5, 19}
20: {6, 20}
I0406 20:56:51.181229 4148279 QueryAssertionsTest.cpp:158] === first agg for integer
I0406 20:56:51.182405 4148279 QueryAssertionsTest.cpp:161] c0   first(c1)   
INTEGER INTEGER 
[ Rows: 7]
0   NULL    
1   1   
2   2   
3   NULL    
4   4   
5   5   
6   NULL    

I0406 20:56:51.182426 4148279 QueryAssertionsTest.cpp:164] === last agg for integer
I0406 20:56:51.183584 4148279 QueryAssertionsTest.cpp:167] c0   last(c1)    
INTEGER INTEGER 
[ Rows: 7]
0   14  
1   NULL    
2   16  
3   17  
4   NULL    
5   19  
6   20  

DuckDB

But I can not reproduce it on DuckDB Shell

create table tmp(c0 INTEGER, c1 VARCHAR);
insert into tmp values(1, null), (1, 'a'), (2,'b'),(2, null);
select c0, first(c1) from tmp group by c0;
select c0, last(c1) from tmp group by c0;

image

This issue make result assertion hard.

System information

Velox System Info v0.0.2 Commit: Not in a git repo. CMake Version: 3.20.2 System: Linux-5.10.134-13.al8.x86_64 Arch: x86_64 C++ Compiler: /usr/lib64/ccache/c++ C++ Compiler Version: 10.2.1 C Compiler: /usr/lib64/ccache/cc C Compiler Version: 10.2.1 CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

No response

yingsu00 commented 1 year ago

@Yohahaha Are you using the latest DuckDB when you're using DuckDB shell?

Yohahaha commented 1 year ago

Hi @yingsu00

DuckDB Shell log

DuckDB Web Shell
Database: v0.7.2-dev1385
Package:  @duckdb/duckdb-wasm@1.24.1-dev41.0