duckdb / duckdb-rs

Ergonomic bindings to duckdb for Rust
MIT License

INTERVAL doesn't seem to be encoded properly in arrow #350

Open pshampanier opened 5 days ago

pshampanier commented 5 days ago

Running the following code:

use duckdb::arrow::datatypes::IntervalMonthDayNanoType;
use duckdb::{params, Connection};
use duckdb::arrow::array as arrow_array;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let conn = Connection::open_in_memory()?;
    let mut stmt = conn.prepare("SELECT '1 year 5 days 12 mins 13 seconds 8 microseconds'::INTERVAL")?;
    // Run the query and collect the result as Arrow record batches.
    let arrow: duckdb::Arrow = stmt.query_arrow(params![])?;
    let vec_record_batch = arrow.collect::<Vec<_>>();

    println!("{:?}", vec_record_batch);

    // Read the single INTERVAL value and split it into its components.
    let interval = vec_record_batch[0]
        .column(0)
        .as_any()
        .downcast_ref::<arrow_array::IntervalMonthDayNanoArray>()
        .unwrap()
        .value(0);
    let (months, days, nanos) = IntervalMonthDayNanoType::to_parts(interval);

    println!("interval: {:?}", interval);
    println!("months: {}, days: {}, nanos: {}", months, days, nanos);

    Ok(())
}

I'm getting this output:

[
  RecordBatch {
    schema: Schema {
      fields: [
        Field {
          name: "CAST('1 year 5 days 12 mins 13 seconds 8 microseconds' AS INTERVAL)",
          data_type: Interval(MonthDayNano),
          nullable: true,
          dict_id: 0,
          dict_is_ordered: false,
          metadata: {}
        }
      ],
      metadata: {}
    },
    columns: [
      PrimitiveArray<Interval(MonthDayNano)>
      [
        13521463553603053924225887764492,
      ]
    ],
    row_count: 1
  }
]

interval: 13521463553603053924225887764492
months: 170, days: -1439399616, nanos: 21474836492

I'm decoding the value with IntervalMonthDayNanoType::to_parts(interval) from arrow, so my guess is that the value is not properly encoded.

Note: rustc 1.75.0 (82e1608df 2023-12-21), duckdb = "0.10.2"

Mause commented 5 days ago

This was actually an issue in arrow-rs: https://github.com/duckdb/duckdb-wasm/issues/1696

It was fixed in https://github.com/apache/arrow-rs/issues/5654

It should be resolved in the next release
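
For anyone who wants to check the raw value by hand, here is a minimal sketch that needs no crates. It assumes the Arrow columnar layout for MonthDayNano (months in the low 32 bits, days in the next 32 bits, nanoseconds in the high 64 bits of the little-endian 128-bit value). The function names `decode_low_months` and `decode_high_months` are made up for illustration and are not part of arrow or duckdb. Under that assumption, the value from the output above decodes to 12 months, 5 days and 733000008000 ns (12 min 13 s 8 µs), while reading the fields in the opposite order gives exactly the numbers printed in the report, which is consistent with the problem being on the decoding side.

```rust
// Raw i128 printed for the interval in the report above.
const RAW: i128 = 13521463553603053924225887764492;

/// Hypothetical helper: months in bits 0..32, days in bits 32..64,
/// nanoseconds in bits 64..128 (the layout assumed in the lead-in).
fn decode_low_months(v: i128) -> (i32, i32, i64) {
    let months = v as i32; // truncating cast keeps the low 32 bits
    let days = (v >> 32) as i32;
    let nanos = (v >> 64) as i64;
    (months, days, nanos)
}

/// Hypothetical helper: the opposite field order, with months in the
/// top 32 bits, days in the next 32 and nanoseconds in the low 64 bits.
fn decode_high_months(v: i128) -> (i32, i32, i64) {
    let months = (v >> 96) as i32;
    let days = (v >> 64) as i32;
    let nanos = v as i64; // truncating cast keeps the low 64 bits
    (months, days, nanos)
}

fn main() {
    let (m, d, n) = decode_low_months(RAW);
    println!("low-months layout:  months: {}, days: {}, nanos: {}", m, d, n);

    let (m, d, n) = decode_high_months(RAW);
    println!("high-months layout: months: {}, days: {}, nanos: {}", m, d, n);
}
```

The second line printed matches the `months: 170, days: -1439399616, nanos: 21474836492` output from the report.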