apache / datafusion

Apache DataFusion SQL Query Engine
https://datafusion.apache.org/
Apache License 2.0

Reading Parquet file with `int96` results in overflow panic #1359

Closed andrei-ionescu closed 2 years ago

andrei-ionescu commented 2 years ago

Describe the bug

Reading a Parquet file with an `int96` column results in a panic with the following error:

thread 'tokio-runtime-worker' panicked at 'attempt to multiply with overflow',
    /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/converter.rs:179:46

To Reproduce

Steps to reproduce the behavior:

  1. Download the attached zip file that contains the parquet file: data-dimension-vehicle-20210609T222533Z-4cols-14rows.parquet.zip
  2. Unzip it and it should give you the data-dimension-vehicle-20210609T222533Z-4cols-14rows.parquet file.
  3. Create a new project with `cargo new read-parquet`, create a `data` folder in the project, and put the parquet file in that folder.
  4. Modify the Cargo.toml file to contain the following:

    [package]
    name = "read-parquet"
    version = "0.1.0"
    edition = "2021"

    [dependencies]
    tokio = "1.14"
    arrow = "6.0"
    datafusion = "6.0"

  5. Put the following code in `main.rs` to read the given parquet file:
```rust
use datafusion::prelude::*;

#[tokio::main]
async fn main() -> datafusion::error::Result<()> {
    let mut ctx = ExecutionContext::new();
    /*
     * Parquet file schema:
     *
     * message spark_schema {
     *   optional binary licence_code (UTF8);
     *   optional binary vehicle_make (UTF8);
     *   optional binary fuel_type (UTF8);
     *   optional int96 dimension_load_date;
     * }
     */
    // Register the parquet file as the table `vehicles`.
    ctx
        .register_parquet("vehicles", "./data/data-dimension-vehicle-20210609T222533Z-4cols-14rows.parquet")
        .await?;
    // Select all four columns, casting the int96 column to TIMESTAMP.
    let df = ctx
        .sql("
            SELECT
                licence_code,
                vehicle_make,
                fuel_type,
                CAST(dimension_load_date as TIMESTAMP) as dms
            FROM vehicles
            LIMIT 10
        ")
        .await?;

    // Printing the result forces execution; this is where the panic occurs.
    df
        .show()
        .await?;

    Ok(())
}
```
  6. Execute `cargo run`.
  7. Result:
    thread 'tokio-runtime-worker' panicked at 'attempt to multiply with overflow', /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/converter.rs:179:46
    stack backtrace:
    0: rust_begin_unwind
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/panicking.rs:498:5
    1: core::panicking::panic_fmt
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/panicking.rs:107:14
    2: core::panicking::panic
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/panicking.rs:48:5
    3: <parquet::arrow::converter::Int96ArrayConverter as parquet::arrow::converter::Converter<alloc::vec::Vec<core::option::Option<parquet::data_type::Int96>>,arrow::array::array_primitive::PrimitiveArray<arrow::datatypes::types::TimestampNanosecondType>>>::convert::{{closure}}::{{closure}}
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/converter.rs:179:46
    4: core::option::Option<T>::map
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/option.rs:846:29
    5: <parquet::arrow::converter::Int96ArrayConverter as parquet::arrow::converter::Converter<alloc::vec::Vec<core::option::Option<parquet::data_type::Int96>>,arrow::array::array_primitive::PrimitiveArray<arrow::datatypes::types::TimestampNanosecondType>>>::convert::{{closure}}
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/converter.rs:179:30
    6: core::iter::adapters::map::map_fold::{{closure}}
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/adapters/map.rs:84:28
    7: core::iter::traits::iterator::Iterator::fold
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/traits/iterator.rs:2171:21
    8: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::fold
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/adapters/map.rs:124:9
    9: core::iter::traits::iterator::Iterator::for_each
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/traits/iterator.rs:737:9
    10: <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/alloc/src/vec/spec_extend.rs:40:17
    11: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/alloc/src/vec/spec_from_iter_nested.rs:56:9
    12: alloc::vec::source_iter_marker::<impl alloc::vec::spec_from_iter::SpecFromIter<T,I> for alloc::vec::Vec<T>>::from_iter
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/alloc/src/vec/source_iter_marker.rs:31:20
    13: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/alloc/src/vec/mod.rs:2549:9
    14: core::iter::traits::iterator::Iterator::collect
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/traits/iterator.rs:1745:9
    15: <parquet::arrow::converter::Int96ArrayConverter as parquet::arrow::converter::Converter<alloc::vec::Vec<core::option::Option<parquet::data_type::Int96>>,arrow::array::array_primitive::PrimitiveArray<arrow::datatypes::types::TimestampNanosecondType>>>::convert
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/converter.rs:177:13
    16: <parquet::arrow::converter::ArrayRefConverter<S,A,C> as parquet::arrow::converter::Converter<S,alloc::sync::Arc<dyn arrow::array::array::Array>>>::convert
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/converter.rs:450:9
    17: <parquet::arrow::array_reader::ComplexObjectArrayReader<T,C> as parquet::arrow::array_reader::ArrayReader>::next_batch
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/array_reader.rs:545:25
    18: <parquet::arrow::array_reader::StructArrayReader as parquet::arrow::array_reader::ArrayReader>::next_batch::{{closure}}
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/array_reader.rs:1130:27
    19: core::iter::adapters::map::map_try_fold::{{closure}}
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/adapters/map.rs:91:28
    20: core::iter::traits::iterator::Iterator::try_fold
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/traits/iterator.rs:1995:21
    21: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/adapters/map.rs:117:9
    22: <parquet::arrow::array_reader::StructArrayReader as parquet::arrow::array_reader::ArrayReader>::next_batch
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/array_reader.rs:1127:30
    23: <parquet::arrow::arrow_reader::ParquetRecordBatchReader as core::iter::traits::iterator::Iterator>::next
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/parquet-6.2.0/src/arrow/arrow_reader.rs:175:15
    24: datafusion::physical_plan::file_format::parquet::read_partition
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-6.0.0/src/physical_plan/file_format/parquet.rs:424:19
    25: <datafusion::physical_plan::file_format::parquet::ParquetExec as datafusion::physical_plan::ExecutionPlan>::execute::{{closure}}::{{closure}}
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-6.0.0/src/physical_plan/file_format/parquet.rs:213:29
    26: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/blocking/task.rs:42:21
    27: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/core.rs:161:17
    28: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/loom/std/unsafe_cell.rs:14:9
    29: tokio::runtime::task::core::CoreStage<T>::poll
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/core.rs:151:13
    30: tokio::runtime::task::harness::poll_future::{{closure}}
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:461:19
    31: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/panic/unwind_safe.rs:271:9
    32: std::panicking::try::do_call
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/panicking.rs:406:40
    33: <unknown>
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-6.0.0/src/physical_plan/distinct_expressions.rs:127:15
    34: std::panicking::try
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/panicking.rs:370:19
    35: std::panic::catch_unwind
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/panic.rs:133:14
    36: tokio::runtime::task::harness::poll_future
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:449:18
    37: tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:98:27
    38: tokio::runtime::task::harness::Harness<T,S>::poll
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:53:15
    39: tokio::runtime::task::raw::poll
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/raw.rs:113:5
    40: tokio::runtime::task::raw::RawTask::poll
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/raw.rs:70:18
    41: tokio::runtime::task::UnownedTask<S>::run
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/mod.rs:379:9
    42: tokio::runtime::blocking::pool::Inner::run
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/blocking/pool.rs:264:17
    43: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
             at /Users/xxxx/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/blocking/pool.rs:244:17

I've tried the following combinations but I got the same error:

Expected behavior

To be able to read that parquet file. The parquet file can be read with the parquet-tools CLI and with Apache Spark.

Additional context

  - OS: macOS 12.0.1 (Monterey)
  - Rust: rustc 1.58.0-nightly (65c55bf93 2021-11-23)
  - Cargo: cargo 1.58.0-nightly (e1fb17631 2021-11-22)

I transformed the parquet file into CSV and everything worked as expected.

andrei-ionescu commented 2 years ago

I did two more tests by saving the dimension_load_date column in the parquet as:

I got a similar overflow panic error:

thread 'tokio-runtime-worker' panicked at 'attempt to multiply with overflow', 
    /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/ops/arith.rs:344:1

The complete error output:

thread 'tokio-runtime-worker' panicked at 'attempt to multiply with overflow', /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/ops/arith.rs:344:1
stack backtrace:
   0: rust_begin_unwind
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/panicking.rs:498:5
   1: core::panicking::panic_fmt
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/panicking.rs:107:14
   2: core::panicking::panic
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/panicking.rs:48:5
   3: <i64 as core::ops::arith::Mul>::mul
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/ops/arith.rs:337:45
   4: arrow::compute::kernels::arithmetic::multiply::{{closure}}
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-6.2.0/src/compute/kernels/arithmetic.rs:1070:40
   5: arrow::compute::kernels::arithmetic::math_op::{{closure}}
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-6.2.0/src/compute/kernels/arithmetic.rs:181:23
   6: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/ops/function.rs:280:13
   7: core::option::Option<T>::map
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/option.rs:846:29
   8: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::next
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/adapters/map.rs:103:9
   9: arrow::buffer::mutable::MutableBuffer::from_trusted_len_iter
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-6.2.0/src/buffer/mutable.rs:437:21
  10: arrow::buffer::immutable::Buffer::from_trusted_len_iter
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-6.2.0/src/buffer/immutable.rs:282:9
  11: arrow::compute::kernels::arithmetic::math_op
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-6.2.0/src/compute/kernels/arithmetic.rs:187:27
  12: arrow::compute::kernels::arithmetic::multiply
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-6.2.0/src/compute/kernels/arithmetic.rs:1070:12
  13: arrow::compute::kernels::cast::cast_with_options
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/arrow-6.2.0/src/compute/kernels/cast.rs:941:17
  14: datafusion::physical_plan::expressions::cast::cast_column
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-6.0.0/src/physical_plan/expressions/cast.rs:106:13
  15: <datafusion::physical_plan::expressions::cast::CastExpr as datafusion::physical_plan::PhysicalExpr>::evaluate
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-6.0.0/src/physical_plan/expressions/cast.rs:94:9
  16: datafusion::physical_plan::projection::ProjectionStream::batch_project::{{closure}}
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-6.0.0/src/physical_plan/projection.rs:212:25
  17: core::iter::adapters::map::map_try_fold::{{closure}}
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/adapters/map.rs:91:28
  18: core::iter::traits::iterator::Iterator::try_fold
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/traits/iterator.rs:1995:21
  19: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/adapters/map.rs:117:9
  20: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::try_fold
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/adapters/map.rs:117:9
  21: <core::iter::adapters::ResultShunt<I,E> as core::iter::traits::iterator::Iterator>::try_fold
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/adapters/mod.rs:178:9
  22: core::iter::traits::iterator::Iterator::find
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/traits/iterator.rs:2383:9
  23: <core::iter::adapters::ResultShunt<I,E> as core::iter::traits::iterator::Iterator>::next
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/adapters/mod.rs:160:9
  24: alloc::vec::Vec<T,A>::extend_desugared
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/alloc/src/vec/mod.rs:2646:35
  25: <alloc::vec::Vec<T,A> as alloc::vec::spec_extend::SpecExtend<T,I>>::spec_extend
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/alloc/src/vec/spec_extend.rs:18:9
  26: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter_nested::SpecFromIterNested<T,I>>::from_iter
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/alloc/src/vec/spec_from_iter_nested.rs:37:9
  27: <alloc::vec::Vec<T> as alloc::vec::spec_from_iter::SpecFromIter<T,I>>::from_iter
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/alloc/src/vec/spec_from_iter.rs:33:9
  28: <alloc::vec::Vec<T> as core::iter::traits::collect::FromIterator<T>>::from_iter
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/alloc/src/vec/mod.rs:2549:9
  29: core::iter::traits::iterator::Iterator::collect
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/traits/iterator.rs:1745:9
  30: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter::{{closure}}
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/result.rs:1883:53
  31: core::iter::adapters::process_results
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/adapters/mod.rs:149:17
  32: <core::result::Result<V,E> as core::iter::traits::collect::FromIterator<core::result::Result<A,E>>>::from_iter
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/result.rs:1883:9
  33: core::iter::traits::iterator::Iterator::collect
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/iter/traits/iterator.rs:1745:9
  34: datafusion::physical_plan::projection::ProjectionStream::batch_project
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-6.0.0/src/physical_plan/projection.rs:210:9
  35: <datafusion::physical_plan::projection::ProjectionStream as futures_core::stream::Stream>::poll_next::{{closure}}
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-6.0.0/src/physical_plan/projection.rs:238:37
  36: core::task::poll::Poll<T>::map
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/task/poll.rs:52:43
  37: <datafusion::physical_plan::projection::ProjectionStream as futures_core::stream::Stream>::poll_next
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-6.0.0/src/physical_plan/projection.rs:237:20
  38: <core::pin::Pin<P> as futures_core::stream::Stream>::poll_next
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-core-0.3.18/src/stream.rs:120:9
  39: futures_util::stream::stream::StreamExt::poll_next_unpin
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.18/src/stream/stream/mod.rs:1474:9
  40: <futures_util::stream::stream::next::Next<St> as core::future::future::Future>::poll
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/futures-util-0.3.18/src/stream/stream/next.rs:32:9
  41: datafusion::physical_plan::common::spawn_execution::{{closure}}
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-6.0.0/src/physical_plan/common.rs:179:32
  42: <core::future::from_generator::GenFuture<T> as core::future::future::Future>::poll
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/future/mod.rs:80:19
  43: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/core.rs:161:17
  44: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/loom/std/unsafe_cell.rs:14:9
  45: tokio::runtime::task::core::CoreStage<T>::poll
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/core.rs:151:13
  46: tokio::runtime::task::harness::poll_future::{{closure}}
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:461:19
  47: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/panic/unwind_safe.rs:271:9
  48: std::panicking::try::do_call
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/panicking.rs:406:40
  49: <unknown>
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/datafusion-6.0.0/src/physical_plan/distinct_expressions.rs:127:15
  50: std::panicking::try
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/panicking.rs:370:19
  51: std::panic::catch_unwind
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/panic.rs:133:14
  52: tokio::runtime::task::harness::poll_future
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:449:18
  53: tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:98:27
  54: tokio::runtime::task::harness::Harness<T,S>::poll
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:53:15
  55: tokio::runtime::task::raw::poll
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/raw.rs:113:5
  56: tokio::runtime::task::raw::RawTask::poll
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/raw.rs:70:18
  57: tokio::runtime::task::LocalNotified<S>::run
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/mod.rs:343:9
  58: tokio::runtime::thread_pool::worker::Context::run_task::{{closure}}
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/thread_pool/worker.rs:443:21
  59: tokio::coop::with_budget::{{closure}}
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/coop.rs:106:9
  60: std::thread::local::LocalKey<T>::try_with
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/thread/local.rs:399:16
  61: std::thread::local::LocalKey<T>::with
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/thread/local.rs:375:9
  62: tokio::coop::with_budget
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/coop.rs:99:5
  63: tokio::coop::budget
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/coop.rs:76:5
  64: tokio::runtime::thread_pool::worker::Context::run_task
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/thread_pool/worker.rs:419:9
  65: tokio::runtime::thread_pool::worker::Context::run
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/thread_pool/worker.rs:386:24
  66: tokio::runtime::thread_pool::worker::run::{{closure}}
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/thread_pool/worker.rs:371:17
  67: tokio::macros::scoped_tls::ScopedKey<T>::set
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/macros/scoped_tls.rs:61:9
  68: tokio::runtime::thread_pool::worker::run
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/thread_pool/worker.rs:368:5
  69: tokio::runtime::thread_pool::worker::Launch::launch::{{closure}}
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/thread_pool/worker.rs:347:45
  70: <tokio::runtime::blocking::task::BlockingTask<T> as core::future::future::Future>::poll
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/blocking/task.rs:42:21
  71: tokio::runtime::task::core::CoreStage<T>::poll::{{closure}}
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/core.rs:161:17
  72: tokio::loom::std::unsafe_cell::UnsafeCell<T>::with_mut
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/loom/std/unsafe_cell.rs:14:9
  73: tokio::runtime::task::core::CoreStage<T>::poll
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/core.rs:151:13
  74: tokio::runtime::task::harness::poll_future::{{closure}}
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:461:19
  75: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/core/src/panic/unwind_safe.rs:271:9
  76: std::panicking::try::do_call
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/panicking.rs:406:40
  77: <unknown>
  78: std::panicking::try
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/panicking.rs:370:19
  79: std::panic::catch_unwind
             at /rustc/65c55bf931a55e6b1e5ed14ad8623814a7386424/library/std/src/panic.rs:133:14
  80: tokio::runtime::task::harness::poll_future
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:449:18
  81: tokio::runtime::task::harness::Harness<T,S>::poll_inner
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:98:27
  82: tokio::runtime::task::harness::Harness<T,S>::poll
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/harness.rs:53:15
  83: tokio::runtime::task::raw::poll
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/raw.rs:113:5
  84: tokio::runtime::task::raw::RawTask::poll
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/raw.rs:70:18
  85: tokio::runtime::task::UnownedTask<S>::run
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/task/mod.rs:379:9
  86: tokio::runtime::blocking::pool::Inner::run
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/blocking/pool.rs:264:17
  87: tokio::runtime::blocking::pool::Spawner::spawn_thread::{{closure}}
             at /Users/aionescu/.cargo/registry/src/github.com-1ecc6299db9ec823/tokio-1.14.0/src/runtime/blocking/pool.rs:244:17

andrei-ionescu commented 2 years ago

I ran some more tests and ultimately found the issue: two of the rows in the parquet file contain far-future timestamps (9999-12-31 02:00:00 and 9998-12-31 02:00:00) in the dimension_load_date column.

This is supported by Parquet and Spark.

Here is the content of the parquet file:

+------------+------------------+------------------+-------------------+
|licence_code|vehicle_make      |fuel_type         |dimension_load_date|
+------------+------------------+------------------+-------------------+
|odc-odbl    |**Not Provided**  |**Not Provided**  |9999-12-31 02:00:00|
|odc-odbl    |**Not Applicable**|**Not Applicable**|9998-12-31 02:00:00|
|odc-odbl    |SAVIEM            |Petrol            |2021-06-09 03:02:37|
|odc-odbl    |YAMAHA            |Petrol            |2021-06-09 03:43:47|
|odc-odbl    |VAUXHALL          |Petrol            |2020-10-18 03:23:47|
|odc-odbl    |VAUXHALL          |Petrol            |2021-06-09 03:02:37|
|odc-odbl    |BMW               |Petrol            |2021-06-09 03:38:39|
|odc-odbl    |MG                |Petrol            |2020-10-18 03:23:47|
|odc-odbl    |PEUGEOT           |Diesel            |2020-10-18 03:35:16|
|odc-odbl    |FORD              |Diesel            |2020-10-18 03:23:47|
|odc-odbl    |FORD              |Petrol            |2020-10-18 03:12:55|
|odc-odbl    |SKODA             |Diesel            |2021-06-09 03:02:37|
|odc-odbl    |SHOGUN            |Diesel            |2020-10-18 03:12:55|
|odc-odbl    |MITSUBISHI        |Diesel            |2021-06-10 01:15:47|
+------------+------------------+------------------+-------------------+
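
To illustrate why those two far-future rows would trigger the panic: the backtraces show the column being converted to `TimestampNanosecondType`, which stores nanoseconds since the Unix epoch in an `i64`, and an `i64` nanosecond count runs out around the year 2262. The following is a minimal standalone sketch, not the actual parquet or arrow code. The helper names `days_from_civil` and `timestamp_nanos` are hypothetical, and the timestamps are treated as UTC for simplicity. It uses `checked_mul` to show which values from the table fit.

```rust
/// Days since 1970-01-01 for a proleptic Gregorian calendar date.
/// (Hypothetical helper; standard civil-date arithmetic.)
fn days_from_civil(year: i64, month: i64, day: i64) -> i64 {
    // Shift the year so that it starts in March, which puts the leap day last.
    let y = if month <= 2 { year - 1 } else { year };
    let era = y.div_euclid(400); // 400-year cycle index
    let yoe = y.rem_euclid(400); // year within the cycle, 0..=399
    let mp = (month + 9) % 12; // March = 0, ..., February = 11
    let doy = (153 * mp + 2) / 5 + day - 1; // day within the shifted year
    let doe = yoe * 365 + yoe / 4 - yoe / 100 + doy; // day within the cycle
    era * 146_097 + doe - 719_468
}

/// Nanoseconds since the Unix epoch, or `None` if the value does not fit in an i64.
/// (Hypothetical helper; the input is assumed to be UTC.)
fn timestamp_nanos(y: i64, mo: i64, d: i64, h: i64, mi: i64, s: i64) -> Option<i64> {
    let secs = days_from_civil(y, mo, d) * 86_400 + h * 3_600 + mi * 60 + s;
    // An unchecked `secs * 1_000_000_000` panics with
    // "attempt to multiply with overflow" in debug builds.
    secs.checked_mul(1_000_000_000)
}

fn main() {
    // A regular row from the table fits comfortably.
    println!("2021-06-09 03:02:37 -> {:?}", timestamp_nanos(2021, 6, 9, 3, 2, 37));
    // The two far-future rows do not fit, so `checked_mul` yields `None`.
    println!("9999-12-31 02:00:00 -> {:?}", timestamp_nanos(9999, 12, 31, 2, 0, 0));
    println!("9998-12-31 02:00:00 -> {:?}", timestamp_nanos(9998, 12, 31, 2, 0, 0));
}
```

With checked arithmetic the out-of-range rows could be reported as an error or a null instead of aborting the query; which behavior is appropriate is a design decision for the library.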

andrei-ionescu commented 2 years ago

Should I close this ticket, since I opened the more specific ticket #1360?

andrei-ionescu commented 2 years ago

Closing issue, I created a more specific one here: #1360.