facebookincubator / velox

A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0

An unsupported nested encoding was found. #10397

Open JkSelf opened 1 month ago

JkSelf commented 1 month ago

Bug description

When we execute the SQL statements below using Gluten, the write fails with the exception 'An unsupported nested encoding was found.'

sql("create table map_table(a map<bigint, string>) using parquet")
sql("insert into table map_table select map(1, 'hello')")

System information

Velox System Info v0.0.2
Commit: 467812fb0a0220fd6b479502dd89b21973a4a3b3
CMake Version: 3.28.3
System: Linux-5.4.0-167-generic
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 9.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 9.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr/local/lib/python3.8/dist-packages/cmake/data;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

Caused by: org.apache.gluten.exception.GlutenException: java.lang.RuntimeException: Exception: VeloxRuntimeError
 Error Source: RUNTIME
 Error Code: INVALID_STATE
 Reason: An unsupported nested encoding was found.
 Retriable: False
 Expression: vec.valueVector() == nullptr || vec.wrappedVector()->isFlatEncoding()
 Context: Operator: TableWrite[2] 2
 Function: exportFlattenedVector
 File: /__w/incubator-gluten/incubator-gluten/ep/build-velox/build/velox_ep/velox/vector/arrow/Bridge.cpp
 Line: 884
 Stack trace:
 # 0  _ZN8facebook5velox7process10StackTraceC1Ei
 # 1  _ZN8facebook5velox14VeloxExceptionC1EPKcmS3_St17basic_string_viewIcSt11char_traitsIcEES7_S7_S7_bNS1_4TypeES7_
 # 2  _ZN8facebook5velox6detail14veloxCheckFailINS0_17VeloxRuntimeErrorEPKcEEvRKNS1_18VeloxCheckFailArgsET0_
 # 3  _ZN8facebook5velox12_GLOBAL__N_121exportFlattenedVectorERKNS0_10BaseVectorERKNS1_9SelectionERK12ArrowOptionsR10ArrowArrayPNS0_6memory10MemoryPoolERNS1_24VeloxToArrowBridgeHolderE
 # 4  _ZN8facebook5velox12_GLOBAL__N_117exportToArrowImplERKNS0_10BaseVectorERKNS1_9SelectionERK12ArrowOptionsR10ArrowArrayPNS0_6memory10MemoryPoolE
 # 5  _ZN8facebook5velox12_GLOBAL__N_117exportToArrowImplERKNS0_10BaseVectorERKNS1_9SelectionERK12ArrowOptionsR10ArrowArrayPNS0_6memory10MemoryPoolE
 # 6  _ZN8facebook5velox13exportToArrowERKSt10shared_ptrINS0_10BaseVectorEER10ArrowArrayPNS0_6memory10MemoryPoolERK12ArrowOptions
 # 7  _ZN8facebook5velox7parquet6Writer5writeERKSt10shared_ptrINS0_10BaseVectorEE
 # 8  _ZN8facebook5velox9connector4hive12HiveDataSink5writeEmSt10shared_ptrINS0_9RowVectorEE
 # 9  _ZN8facebook5velox9connector4hive12HiveDataSink10appendDataESt10shared_ptrINS0_9RowVectorEE
 # 10 _ZN8facebook5velox4exec11TableWriter8addInputESt10shared_ptrINS0_9RowVectorEE
 # 11 _ZN8facebook5velox4exec6Driver11runInternalERSt10shared_ptrIS2_ERS3_INS1_13BlockingStateEERS3_INS0_9RowVectorEE
 # 12 _ZN8facebook5velox4exec6Driver4nextERSt10shared_ptrINS1_13BlockingStateEE
 # 13 _ZN8facebook5velox4exec4Task4nextEPN5folly10SemiFutureINS3_4UnitEEE
 # 14 _ZN6gluten24WholeStageResultIterator4nextEv
 # 15 Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
 # 16 0x00007f8833c9cfb0

JkSelf commented 1 month ago

@mbasmanova @majetideepak @rui-mo Do you have any input? Thanks.

rui-mo commented 1 month ago

@JkSelf I ran into the same issue. Please see the discussion in https://github.com/facebookincubator/velox/issues/9821.

yingsu00 commented 1 month ago

I think we will need to support writing non-flat vectors in the Parquet writer. @majetideepak what do you think?

majetideepak commented 1 month ago

We do support writing regular dictionaries to Parquet. See https://github.com/facebookincubator/velox/pull/7025. I am curious as to why complex types have a problem. I'll look into this. We know that constant vectors are flattened until Arrow supports writing REE encoding to Parquet. There is an issue open somewhere for this.

Yohahaha commented 1 month ago

https://github.com/facebookincubator/velox/pull/9406

I have a draft PR to support flattening complex vectors.
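
To make the failing check from the log concrete (vec.valueVector() == nullptr || vec.wrappedVector()->isFlatEncoding()), here is a toy model of what "flattening" a wrapped complex vector could look like. Everything in it is hypothetical: ToyVector, ToyEncoding, innermost, passesExportCheck, flatten and makeWrappedMap are illustration-only names, not Velox classes or functions, and this is not the code from the draft PR. It also assumes that a map vector does not count as flat-encoded, which is my reading of the check rather than something confirmed in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Toy stand-ins for illustration only; these are NOT Velox classes.
enum class ToyEncoding { kFlat, kMap, kDictionary };

struct ToyVector {
  ToyEncoding encoding;
  std::vector<std::string> rows;          // payload of a base (non-wrapper) vector
  std::vector<std::size_t> indices;       // row -> position in `base` (wrappers only)
  std::shared_ptr<const ToyVector> base;  // wrapped vector; null for a base vector

  std::size_t size() const { return base ? indices.size() : rows.size(); }

  // Resolves a row through any number of wrapper layers.
  const std::string& at(std::size_t i) const {
    return base ? base->at(indices[i]) : rows[i];
  }
};

// Walks through all wrapper layers down to the base vector.
const ToyVector& innermost(const ToyVector& v) {
  return v.base ? innermost(*v.base) : v;
}

// Mirrors the shape of the check in the log: either the vector is not a
// wrapper, or the innermost vector is flat-encoded (assumption: a map is not).
bool passesExportCheck(const ToyVector& v) {
  return v.base == nullptr || innermost(v).encoding == ToyEncoding::kFlat;
}

// Materializes every row so the result no longer has a wrapper layer.
ToyVector flatten(const ToyVector& v) {
  if (!v.base) {
    return v;
  }
  ToyVector out{innermost(v).encoding, {}, {}, nullptr};
  out.rows.reserve(v.size());
  for (std::size_t i = 0; i < v.size(); ++i) {
    out.rows.push_back(v.at(i));
  }
  return out;
}

// A wrapper with three rows that all point at one map value, similar in
// spirit to a constant map column.
ToyVector makeWrappedMap() {
  auto maps = std::make_shared<ToyVector>(
      ToyVector{ToyEncoding::kMap, {"{1: hello}"}, {}, nullptr});
  return ToyVector{ToyEncoding::kDictionary, {}, {0, 0, 0}, maps};
}
```

In this model, makeWrappedMap() fails passesExportCheck because its innermost vector is a map, while flatten(makeWrappedMap()) passes because the wrapper layer is gone. A wrapper over a kFlat base passes without flattening, which matches the comment above that regular dictionaries can already be written.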