Status: closed by timsaucer 2 weeks ago.
It's probably related to this arrow-rs issue: "Rust Interval definition is incorrect".
Here's a godbolt link demonstrating the "1 month becomes 1 nanosecond" example. (I based it on a comment in a similar thread in duckdb-wasm.)
I would suspect that if all code paths used the same implementation, datafusion-python wouldn't notice it, but perhaps that's wrong, or maybe not all code paths go through arrow-rs?
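A minimal pure-Python model of the suspected mechanism follows. It assumes the layout mismatch that the arrow-rs issue title points at: the Arrow columnar spec lays out a month-day-nano interval as a little-endian struct of int32 months, int32 days and int64 nanoseconds (months at the lowest address), while the older arrow-rs `i128` encoding is assumed here to pack months into the most significant 32 bits and nanoseconds into the low 64 bits. Reading spec-layout bytes as that `i128` turns 1 month into 1 nanosecond. The helper names are hypothetical; this is not arrow-rs or pyo3 code.

```python
import struct

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1


def to_signed(value: int, bits: int) -> int:
    """Reinterpret an unsigned `bits`-wide integer as two's-complement signed."""
    return value - (1 << bits) if value >= (1 << (bits - 1)) else value


def spec_bytes(months: int, days: int, nanos: int) -> bytes:
    """16 bytes in the Arrow spec layout: int32 months, int32 days,
    int64 nanoseconds, little-endian, months at the lowest address."""
    return struct.pack("<iiq", months, days, nanos)


def decode_as_legacy_i128(raw: bytes) -> tuple[int, int, int]:
    """Hypothetical model of the assumed pre-fix reading: treat the 16 bytes
    as one little-endian i128 with months in bits 96..127, days in bits
    64..95 and nanoseconds in bits 0..63."""
    value = int.from_bytes(raw, "little", signed=True)
    months = to_signed((value >> 96) & MASK32, 32)
    days = to_signed((value >> 64) & MASK32, 32)
    nanos = to_signed(value & MASK64, 64)
    return months, days, nanos


if __name__ == "__main__":
    raw = spec_bytes(1, 0, 0)  # spec-layout bytes for "1 month"
    print(int.from_bytes(raw, "little", signed=True))  # 1
    print(decode_as_legacy_i128(raw))  # (0, 0, 1): 1 month became 1 nanosecond
```

Under the same assumption, 1 day would be read as 4294967296 nanoseconds and 1 nanosecond as 1 day, so every field is misplaced, not only months.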
The error occurs in the pyo3 magic as we cross the Python -> Rust bridge. Notice that the Python side asserts a pyarrow.Scalar, but the Rust side then receives a datafusion::ScalarValue. (Aside: is this magic type conversion intentional?)
The error is already present before the Rust method is invoked. Adding print statements on both sides of the bridge shows:
converting: MonthDayNano(months=1, days=0, nanoseconds=0)
converting: IntervalMonthDayNano("1")
TODO: If PR https://github.com/apache/datafusion-python/pull/666 is merged before this issue is fixed, the following examples in the examples/tpch folder will need to be updated.
Describe the bug
When creating a literal interval value from a pyarrow scalar, the month, day, and nanosecond values are not correctly assigned in the resulting literal. The following minimal example reproduces it. This appears to be limited to datafusion-python and not the Rust implementation.

To Reproduce
Produces the following result:
Expected behavior
When an interval of 1 month is set in pyarrow, it should appear as 1 month in the DataFusion DataFrame, and likewise for days and nanoseconds.
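The expected behavior amounts to a field-by-field round trip across the bridge. A minimal sketch, assuming the fix is to read the 16 bytes as the spec's (int32, int32, int64) struct rather than as a packed `i128`. `MonthDayNano` here is a hypothetical plain-Python stand-in that only mirrors pyarrow's field names, and `decode_as_spec_struct` and `roundtrips` are likewise illustrative names.

```python
import struct
from typing import NamedTuple


class MonthDayNano(NamedTuple):
    """Hypothetical stand-in for an interval value (mirrors pyarrow's field names)."""
    months: int
    days: int
    nanoseconds: int


def encode_spec(v: MonthDayNano) -> bytes:
    # Arrow spec layout: little-endian int32 months, int32 days, int64 nanoseconds
    return struct.pack("<iiq", v.months, v.days, v.nanoseconds)


def decode_as_spec_struct(raw: bytes) -> MonthDayNano:
    # Read the fields back in the same order and widths they were written in
    return MonthDayNano(*struct.unpack("<iiq", raw))


def roundtrips(v: MonthDayNano) -> bool:
    """True when every field survives the byte-level crossing unchanged."""
    return decode_as_spec_struct(encode_spec(v)) == v


if __name__ == "__main__":
    cases = [
        MonthDayNano(1, 0, 0),  # 1 month should stay 1 month
        MonthDayNano(0, 1, 0),  # 1 day should stay 1 day
        MonthDayNano(0, 0, 1),  # 1 nanosecond should stay 1 nanosecond
        MonthDayNano(-2, 15, -3_000_000_000),  # mixed signs
    ]
    for case in cases:
        print(case, roundtrips(case))
```

Checking each field in isolation, as above, distinguishes a genuine fix from one that only happens to work for the "1 month" case.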