facebookincubator / velox

A C++ vectorized database acceleration library aimed at optimizing query engines and data processing systems.
https://velox-lib.io/
Apache License 2.0

Parquet reader: can't read parquet file with no column indexes #9463

Open yma11 opened 2 months ago

yma11 commented 2 months ago

Bug description

When reading the Parquet example file test-file-with-no-column-indexes-1.parquet, the following error pops up:

Job aborted due to stage failure: Task 0 in stage 14.0 failed 1 times, most recent failure: Lost task 0.0 in stage 14.0 (TID 14) (10.0.2.142 executor driver): org.apache.gluten.exception.GlutenException: java.lang.RuntimeException: Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Operator::getOutput failed for [operator: TableScan, plan node ID: 0]: vector::_M_range_check: __n (which is 1) >= this->size() (which is 1)
Retriable: False
Function: runInternal
File: ../../velox/exec/Driver.cpp
Line: 686
Stack trace:
# 0  facebook::velox::VeloxException::VeloxException(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)
# 1  void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxRuntimeError, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&>(facebook::velox::detail::VeloxCheckFailArgs const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
# 2  facebook::velox::exec::Driver::runInternal(std::shared_ptr<facebook::velox::exec::Driver>&, std::shared_ptr<facebook::velox::exec::BlockingState>&, std::shared_ptr<facebook::velox::RowVector>&) [clone .cold]
# 3  facebook::velox::exec::Driver::next(std::shared_ptr<facebook::velox::exec::BlockingState>&)
# 4  facebook::velox::exec::Task::next(folly::SemiFuture<folly::Unit>*)
# 5  gluten::WholeStageResultIterator::next()
# 6  Java_org_apache_gluten_vectorized_ColumnarBatchOutIterator_nativeHasNext
# 7  0x00007f75ad020907
# 8  0x00007f75ad0078ef
# 9  0x00007f75ad0078ef
# 10 0x00007f75adc66c6b

    at org.apache.gluten.vectorized.GeneralOutIterator.hasNext(GeneralOutIterator.java:39)
    at scala.collection.convert.Wrappers$JIteratorWrapper.hasNext(Wrappers.scala:45)
    at org.apache.gluten.utils.InvocationFlowProtection.hasNext(Iterators.scala:135)
    at org.apache.gluten.utils.IteratorCompleter.hasNext(Iterators.scala:69)
    at org.apache.gluten.utils.PayloadCloser.hasNext(Iterators.scala:35)
    at org.apache.gluten.utils.PipelineTimeAccumulator.hasNext(Iterators.scala:98)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
    at scala.collection.Iterator.isEmpty(Iterator.scala:387)
    at scala.collection.Iterator.isEmpty$(Iterator.scala:387)
    at org.apache.spark.InterruptibleIterator.isEmpty(InterruptibleIterator.scala:28)
    at org.apache.gluten.execution.VeloxColumnarToRowExec$.toRowIterator(VeloxColumnarToRowExec.scala:119)
    at 

System information

Commit: d9454d63d190da9d30cae39a4dca9ac25b0da6b7
CMake Version: 3.16.3
System: Linux-5.4.0-156-generic
Arch: x86_64
C++ Compiler: /usr/bin/c++
C++ Compiler Version: 9.4.0
C Compiler: /usr/bin/cc
C Compiler Version: 9.4.0
CMake Prefix Path: /usr/local;/usr;/;/usr;/usr/local;/usr/X11R6;/usr/pkg;/opt

Relevant logs

No response
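
Note on the Reason line: "vector::_M_range_check: __n (which is 1) >= this->size() (which is 1)" is the text libstdc++ puts into the std::out_of_range exception thrown by std::vector::at when the index is not smaller than the vector's size. So some code running under TableScan probably asked for element 1 of a one-element vector; which vector that is cannot be told from this trace. A minimal standalone sketch of how such a message arises (this is not Velox code, and rangeCheckMessage is a hypothetical helper introduced only for illustration):

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper, not part of Velox: returns the exception text raised by
// an out-of-range std::vector::at call, or an empty string if the index is valid.
std::string rangeCheckMessage(const std::vector<int>& values, std::size_t index) {
  try {
    (void)values.at(index);  // at() performs the bounds check; operator[] does not
    return "";
  } catch (const std::out_of_range& e) {
    // The exact wording of what() is implementation-defined; with libstdc++ it
    // has the "vector::_M_range_check: __n (which is ...)" form seen in the report.
    return e.what();
  }
}
```

Calling rangeCheckMessage({42}, 1) asks for index 1 of a vector of size 1, the same index and size combination the report shows.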

qqibrow commented 3 weeks ago

Thanks for sharing. I did a quick test. Here is the stack trace in Velox:

E0607 20:33:30.573619 210982 Exceptions.h:69] Line: ../../velox/dwio/parquet/reader/ParquetReader.cpp:443, Function:getParquetColumnInfo, Expression:  Unable to extract Parquet column info., Source: RUNTIME, ErrorCode: INVALID_STATE
terminate called after throwing an instance of 'facebook::velox::VeloxRuntimeError'
  what():  Exception: VeloxRuntimeError
Error Source: RUNTIME
Error Code: INVALID_STATE
Reason: Unable to extract Parquet column info.
Retriable: False
Function: getParquetColumnInfo
File: ../../velox/dwio/parquet/reader/ParquetReader.cpp
Line: 443
Stack trace:
# 0  std::shared_ptr<facebook::velox::VeloxException::State const> facebook::velox::VeloxException::State::make<facebook::velox::VeloxException::make(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)::{lambda(auto:1&)#1}>(facebook::velox::VeloxException::Type, facebook::velox::VeloxException::make(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)::{lambda(auto:1&)#1})
# 1  facebook::velox::VeloxException::VeloxException(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, facebook::velox::VeloxException::Type, std::basic_string_view<char, std::char_traits<char> >)
# 2  facebook::velox::VeloxRuntimeError::VeloxRuntimeError(char const*, unsigned long, char const*, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, std::basic_string_view<char, std::char_traits<char> >, bool, std::basic_string_view<char, std::char_traits<char> >)
# 3  void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxRuntimeError, char const*>(facebook::velox::detail::VeloxCheckFailArgs const&, char const*)
# 4  facebook::velox::parquet::ReaderBase::getParquetColumnInfo(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int&, unsigned int&) const
# 5  facebook::velox::parquet::ReaderBase::getParquetColumnInfo(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int&, unsigned int&) const
# 6  facebook::velox::parquet::ReaderBase::getParquetColumnInfo(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int&, unsigned int&) const
# 7  facebook::velox::parquet::ReaderBase::initializeSchema()
# 8  facebook::velox::parquet::ReaderBase::ReaderBase(std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&)
# 9  void __gnu_cxx::new_allocator<facebook::velox::parquet::ReaderBase>::construct<facebook::velox::parquet::ReaderBase, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(facebook::velox::parquet::ReaderBase*, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
# 10 void std::allocator_traits<std::allocator<facebook::velox::parquet::ReaderBase> >::construct<facebook::velox::parquet::ReaderBase, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::allocator<facebook::velox::parquet::ReaderBase>&, facebook::velox::parquet::ReaderBase*, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
# 11 std::_Sp_counted_ptr_inplace<facebook::velox::parquet::ReaderBase, std::allocator<facebook::velox::parquet::ReaderBase>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
# 12 std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<facebook::velox::parquet::ReaderBase, std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(facebook::velox::parquet::ReaderBase*&, std::_Sp_alloc_shared_tag<std::allocator<facebook::velox::parquet::ReaderBase> >, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
# 13 std::__shared_ptr<facebook::velox::parquet::ReaderBase, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::_Sp_alloc_shared_tag<std::allocator<facebook::velox::parquet::ReaderBase> >, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
# 14 std::shared_ptr<facebook::velox::parquet::ReaderBase>::shared_ptr<std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::_Sp_alloc_shared_tag<std::allocator<facebook::velox::parquet::ReaderBase> >, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
# 15 std::shared_ptr<facebook::velox::parquet::ReaderBase> std::allocate_shared<facebook::velox::parquet::ReaderBase, std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::allocator<facebook::velox::parquet::ReaderBase> const&, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
# 16 std::shared_ptr<facebook::velox::parquet::ReaderBase> std::make_shared<facebook::velox::parquet::ReaderBase, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
# 17 facebook::velox::parquet::ParquetReader::ParquetReader(std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&)
# 18 main
# 19 __libc_start_main
# 20 _start

*** Aborted at 1717792410 (Unix time, try 'date -d @1717792410') ***
*** Signal 6 (SIGABRT) (0x3e4700033826) received by PID 210982 (pthread TID 0x7f882c66c9c0) (linux TID 210982) (maybe from PID 210982, UID 15943) (code: -6), stack trace: ***
    @ 000000000213fce1 folly::symbolizer::(anonymous namespace)::innerSignalHandler(int, siginfo_t*, void*)
                       /home/lniu/code/velox_0221/velox/folly/_build/../folly/experimental/symbolizer/SignalHandler.cpp:449
    @ 000000000213fdc2 folly::symbolizer::(anonymous namespace)::signalHandler(int, siginfo_t*, void*)
                       /home/lniu/code/velox_0221/velox/folly/_build/../folly/experimental/symbolizer/SignalHandler.cpp:470
    @ 000000000001441f (unknown)
    @ 000000000004300b gsignal
    @ 0000000000022858 abort
    @ 000000000009e910 (unknown)
    @ 00000000000aa38b (unknown)
    @ 00000000000aa3f6 std::terminate()
    @ 00000000000aa6a8 __cxa_throw
    @ 0000000001fd3ae6 __cxa_throw
                       /home/lniu/code/velox_0221/velox/folly/_build/../folly/experimental/exception_tracer/ExceptionTracerLib.cpp:159
    @ 0000000001ed4d0b void facebook::velox::detail::veloxCheckFail<facebook::velox::VeloxRuntimeError, char const*>(facebook::velox::detail::VeloxCheckFailArgs const&, char const*)
                       /home/lniu/code/velox_new/velox/_build/debug/../.././velox/common/base/Exceptions.h:85
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/common/base/Exceptions.cpp
    @ 00000000010e63bc facebook::velox::parquet::ReaderBase::getParquetColumnInfo(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int&, unsigned int&) const
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp:443
    @ 00000000010e54c4 facebook::velox::parquet::ReaderBase::getParquetColumnInfo(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int&, unsigned int&) const
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp:269
    @ 00000000010e54c4 facebook::velox::parquet::ReaderBase::getParquetColumnInfo(unsigned int, unsigned int, unsigned int, unsigned int, unsigned int&, unsigned int&) const
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp:269
    @ 00000000010e4ff5 facebook::velox::parquet::ReaderBase::initializeSchema()
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp:219
    @ 00000000010e453a facebook::velox::parquet::ReaderBase::ReaderBase(std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&)
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp:139
    @ 00000000011093cc void __gnu_cxx::new_allocator<facebook::velox::parquet::ReaderBase>::construct<facebook::velox::parquet::ReaderBase, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(facebook::velox::parquet::ReaderBase*, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
                       /usr/include/c++/9/ext/new_allocator.h:146
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp
    @ 0000000001108751 void std::allocator_traits<std::allocator<facebook::velox::parquet::ReaderBase> >::construct<facebook::velox::parquet::ReaderBase, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::allocator<facebook::velox::parquet::ReaderBase>&, facebook::velox::parquet::ReaderBase*, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
                       /usr/include/c++/9/bits/alloc_traits.h:483
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp
    @ 0000000001107248 std::_Sp_counted_ptr_inplace<facebook::velox::parquet::ReaderBase, std::allocator<facebook::velox::parquet::ReaderBase>, (__gnu_cxx::_Lock_policy)2>::_Sp_counted_ptr_inplace<std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
                       /usr/include/c++/9/bits/shared_ptr_base.h:548
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp
    @ 000000000110465a std::__shared_count<(__gnu_cxx::_Lock_policy)2>::__shared_count<facebook::velox::parquet::ReaderBase, std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(facebook::velox::parquet::ReaderBase*&, std::_Sp_alloc_shared_tag<std::allocator<facebook::velox::parquet::ReaderBase> >, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
                       /usr/include/c++/9/bits/shared_ptr_base.h:679
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp
    @ 00000000011019d9 std::__shared_ptr<facebook::velox::parquet::ReaderBase, (__gnu_cxx::_Lock_policy)2>::__shared_ptr<std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::_Sp_alloc_shared_tag<std::allocator<facebook::velox::parquet::ReaderBase> >, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
                       /usr/include/c++/9/bits/shared_ptr_base.h:1344
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp
    @ 00000000010fddea std::shared_ptr<facebook::velox::parquet::ReaderBase>::shared_ptr<std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::_Sp_alloc_shared_tag<std::allocator<facebook::velox::parquet::ReaderBase> >, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
                       /usr/include/c++/9/bits/shared_ptr.h:359
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp
    @ 00000000010f9800 std::shared_ptr<facebook::velox::parquet::ReaderBase> std::allocate_shared<facebook::velox::parquet::ReaderBase, std::allocator<facebook::velox::parquet::ReaderBase>, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::allocator<facebook::velox::parquet::ReaderBase> const&, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
                       /usr/include/c++/9/bits/shared_ptr.h:702
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp
    @ 00000000010f45f3 std::shared_ptr<facebook::velox::parquet::ReaderBase> std::make_shared<facebook::velox::parquet::ReaderBase, std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&>(std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >&&, facebook::velox::dwio::common::ReaderOptions const&)
                       /usr/include/c++/9/bits/shared_ptr.h:718
                       -> /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp
    @ 00000000010e7c9b facebook::velox::parquet::ParquetReader::ParquetReader(std::unique_ptr<facebook::velox::dwio::common::BufferedInput, std::default_delete<facebook::velox::dwio::common::BufferedInput> >, facebook::velox::dwio::common::ReaderOptions const&)
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/reader/ParquetReader.cpp:874
    @ 00000000010d1638 main
                       /home/lniu/code/velox_new/velox/_build/debug/../../velox/dwio/parquet/tests/reader/ParquetReaderExample.cpp:80
    @ 0000000000024082 __libc_start_main
    @ 00000000010ceced _start
Aborted

majetideepak commented 3 weeks ago

I will take a look at this.

majetideepak commented 3 weeks ago

@yma11 Can you confirm that the schema/DDL used is id bigint, name varchar, location ROW(lon double, lat double), phoneNumbers MAP(bigint, varchar)?

yma11 commented 2 weeks ago

@majetideepak Thanks for looking at this issue. This file comes from the parquet-mr project, but when I checked today, parquet-tools couldn't inspect it and Spark 3.3 couldn't read it. Can you check whether Presto can read it? If not, Velox may not need to support it either. What do you think?

majetideepak commented 1 week ago

@yma11 Presto can read this file with the schema I shared. The Map encoding is from the older Parquet spec.

presto> select * from hivefile.test.complex;
 id  | name |        location         | phonenumbers 
-----+------+-------------------------+--------------
   0 | p0   | NULL                    | {0=cell}     
  15 | p15  | NULL                    | {15=cell}    
  16 | p16  | {lon=16.0, lat=32.0}    | {16=cell}    
  17 | p17  | {lon=17.0, lat=null}    | {17=cell}    
  18 | p18  | NULL                    | {18=cell}    
  19 | p19  | {lon=19.0, lat=38.0}    | {19=cell}    
......

yma11 commented 1 week ago

This should be the right content. It would be great if you could fix Velox based on it. Thanks.
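
For anyone turning this into a regression test, the schema and the rows shown in the Presto output can be written down as plain C++ types. This is only a sketch with standard-library types, not Velox API; the names Location, Person and expectedRows are hypothetical. It covers just the six rows visible in the output, which is truncated in the thread. It assumes C++17 for std::optional.

```cpp
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <vector>

// location ROW(lon double, lat double); both fields are nullable (see row 17).
struct Location {
  std::optional<double> lon;
  std::optional<double> lat;
};

// id bigint, name varchar, location ROW(...), phoneNumbers MAP(bigint, varchar).
struct Person {
  std::int64_t id;
  std::string name;
  std::optional<Location> location;  // NULL for several rows in the output
  std::map<std::int64_t, std::string> phoneNumbers;
};

// The six rows visible in the Presto output; the rest of the file is not shown.
std::vector<Person> expectedRows() {
  return {
      {0, "p0", std::nullopt, {{0, "cell"}}},
      {15, "p15", std::nullopt, {{15, "cell"}}},
      {16, "p16", Location{16.0, 32.0}, {{16, "cell"}}},
      {17, "p17", Location{17.0, std::nullopt}, {{17, "cell"}}},
      {18, "p18", std::nullopt, {{18, "cell"}}},
      {19, "p19", Location{19.0, 38.0}, {{19, "cell"}}},
  };
}
```

A reader test could compare what the Velox Parquet reader returns for these ids against expectedRows(), treating a NULL location and a NULL lat as distinct cases.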