apache / incubator-gluten

Gluten is a middle layer responsible for offloading JVM-based SQL engines' execution to native engines.
https://gluten.apache.org/
Apache License 2.0

[CH] Parquet page index reader failed with `or` logical operator #7713

Closed baibaichen closed 3 weeks ago

baibaichen commented 3 weeks ago

Backend

CH (ClickHouse)

Bug description

When `spark.gluten.sql.columnar.backend.ch.runtime_config.use_local_format=true` is set, we get the following error:

log:

Caused by: org.apache.gluten.exception.GlutenException: vector
0. std::logic_error::logic_error(char const*) @ 0x0000000013b47094
1. std::length_error::length_error[abi:v15007](char const*) @ 0x00000000064183a9
2. std::__throw_length_error[abi:v15007](char const*) @ 0x000000000641835f
3. ? @ 0x000000000be0f02d
4. local_engine::ColumnIndexFilter::calculateRowRanges(std::unordered_map<String, std::unique_ptr<local_engine::ColumnIndex, std::default_delete<local_engine::ColumnIndex>>, std::hash<String>, std::equal_to<String>, std::allocator<std::pair<String const, std::unique_ptr<local_engine::ColumnIndex, std::default_delete<local_engine::ColumnIndex>>>>> const&, unsigned long) const::$_1::operator()(local_engine::RowRanges (* const&)(local_engine::RowRanges const&, local_engine::RowRanges const&)) const @ 0x000000000be15f9b
5. local_engine::ColumnIndexFilter::calculateRowRanges(std::unordered_map<String, std::unique_ptr<local_engine::ColumnIndex, std::default_delete<local_engine::ColumnIndex>>, std::hash<String>, std::equal_to<String>, std::allocator<std::pair<String const, std::unique_ptr<local_engine::ColumnIndex, std::default_delete<local_engine::ColumnIndex>>>>> const&, unsigned long) const @ 0x000000000be14e4e
6. local_engine::ParquetFileReaderExt::getRowRanges(int) @ 0x000000000be085da
7. local_engine::PageIterator::nextChunkWithRowRange() @ 0x000000000be0509f
8. local_engine::VectorizedColumnReader::VectorizedColumnReader(parquet::arrow::SchemaField const&, local_engine::ParquetFileReaderExt*, std::vector<int, std::allocator<int>> const&) @ 0x000000000be04d02
9. local_engine::VectorizedParquetRecordReader::initialize(DB::Block const&, std::shared_ptr<arrow::io::RandomAccessFile> const&, std::shared_ptr<local_engine::ColumnIndexFilter> const&, std::shared_ptr<parquet::FileMetaData> const&) @ 0x000000000be06594
10. local_engine::VectorizedParquetBlockInputFormat::read() @ 0x000000000be0a086
11. DB::IInputFormat::generate() @ 0x0000000010663bd6
12. local_engine::NormalFileReader::pull(DB::Chunk&) @ 0x000000000bdf8981
13. local_engine::SubstraitFileSource::generate() @ 0x000000000bdf664b
14. DB::ISource::tryGenerate() @ 0x0000000010640cf7
15. DB::ISource::work() @ 0x0000000010640ac5
16. DB::ExecutionThreadContext::executeTask() @ 0x00000000106586c2
17. DB::PipelineExecutor::executeStepImpl(unsigned long, std::atomic<bool>*) @ 0x000000001064dc7f
18. DB::PipelineExecutor::executeStep(std::atomic<bool>*) @ 0x000000001064d809
19. DB::PullingPipelineExecutor::pull(DB::Chunk&) @ 0x000000001065ef34
20. DB::PullingPipelineExecutor::pull(DB::Block&) @ 0x000000001065f099
21. local_engine::LocalExecutor::hasNext() @ 0x000000000bb71c29
22. Java_org_apache_gluten_vectorized_BatchIterator_nativeHasNext @ 0x00000000064028d7

    at org.apache.gluten.vectorized.BatchIterator.nativeHasNext(Native Method) ~[gluten.jar:?]
    at org.apache.gluten.vectorized.BatchIterator.hasNextInternal(BatchIterator.java:53) ~[gluten.jar:?]
    at org.apache.gluten.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:37) ~[gluten.jar:?]
    ... 19 more

Spark version

None

Spark configurations

No response

System information

No response

Relevant logs

No response

baibaichen commented 3 weeks ago

The filters look like this:

  1. (145 like '%GTC' and 142 = true) or (not 175 in (23,16,14,100))
  2. (145 not like 'GTC%' and 154 not in (11,16,14,-99)) or 154 > 3365
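
For context, frame 4 of the stack trace shows `calculateRowRanges` applying a binary `RowRanges` combinator, which is presumably how `and` and `or` sub-filters are merged. The following is a minimal sketch of that idea, not Gluten's implementation: it assumes row ranges are sorted, non-overlapping, inclusive `(first_row, last_row)` pairs, and the names `union` and `intersection` are hypothetical. It does not reproduce the `length_error` above; it only illustrates the expected result for a filter shaped like `(A and B) or C`.

```python
from typing import List, Tuple

# Hypothetical representation: an inclusive (first_row, last_row) pair.
Range = Tuple[int, int]


def union(left: List[Range], right: List[Range]) -> List[Range]:
    """Rows selected by either side, as needed for `or`."""
    merged: List[Range] = []
    for start, end in sorted(left + right):
        if merged and start <= merged[-1][1] + 1:
            # Overlaps or touches the previous range: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def intersection(left: List[Range], right: List[Range]) -> List[Range]:
    """Rows selected by both sides, as needed for `and`.

    Both inputs must be sorted and non-overlapping.
    """
    result: List[Range] = []
    i = j = 0
    while i < len(left) and j < len(right):
        start = max(left[i][0], right[j][0])
        end = min(left[i][1], right[j][1])
        if start <= end:
            result.append((start, end))
        # Advance whichever range ends first.
        if left[i][1] < right[j][1]:
            i += 1
        else:
            j += 1
    return result


# Made-up page-level matches for three sub-filters A, B and C.
a = [(0, 99)]
b = [(50, 149)]
c = [(300, 399)]

# (A and B) or C
rows = union(intersection(a, b), c)
print(rows)  # [(50, 99), (300, 399)]
```

Note that either side of an `or` may legitimately produce an empty range list (for example, when a sub-filter matches no pages), so a combinator of this kind has to handle empty inputs without indexing into them.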