apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0
2.57k stars 779 forks source link

support cast for decimal data type #1043

Closed liukun4515 closed 1 year ago

liukun4515 commented 2 years ago

Is your feature request related to a problem or challenge? Please describe what you are trying to do. In the https://github.com/apache/arrow-datafusion/issues/122#issuecomment-975036335, we need to support decimal for cast and try_cast

In the first step, we just support signed numeric data type to decimal, and other casting rules related to decimal will be supported in the feature.

future

enhancement

Describe the solution you'd like A clear and concise description of what you want to happen.

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Additional context Add any other context or screenshots about the feature request here.

liukun4515 commented 2 years ago

please assign this issue to me. Thanks

alamb commented 2 years ago

unsigned numeric to decimal @alamb do we need to support this? My opinion is that unsigned data type is not in the standard SQL data type, and we no need to support this.

@liukun4515 I personally think it would be valuable to support unsigned numeric (aka PrimitiveArray<T>) to / from the decimal type. Arrow is used for more than just SQL engines. However, I think it is fine to leave it as a follow on PR / issue (doesn't have to be in the first implementation).

I suspect once you implement PrimitiveArray<T> to/from DecimalArray for signed variants the incremental cost to supporting the unsigned numeric variants will be quite low.

liukun4515 commented 2 years ago

Ok, I will implement signed numeric first, and leave other as the follow-up PR.

chadbrewbaker commented 2 years ago

On commit ab48e69099f2d7c4178 building with nightly-2021-12-05-aarch64-apple-darwin I am getting:

---- record::api::tests::test_convert_double_to_string stdout ----
thread 'record::api::tests::test_convert_double_to_string' panicked at 'assertion failed: `(left == right)`
  left: `"1e-15"`,
 right: `"0.000000000000001"`', parquet/src/record/api.rs:1097:9

---- record::api::tests::test_convert_float_to_string stdout ----
thread 'record::api::tests::test_convert_float_to_string' panicked at 'assertion failed: `(left == right)`
  left: `"1e-15"`,
 right: `"0.000000000000001"`', parquet/src/record/api.rs:1084:9

The string formatter is:

https://github.com/apache/arrow-rs/blob/ab48e69099f2d7c4178a75f7e1cf8b223ecf8d76/parquet/src/record/api.rs#L716-L728

Where are the requirements for this?

alamb commented 2 years ago

Where are the requirements for this?

I do not know @chadbrewbaker

Some code archeology in https://github.com/apache/arrow-rs/blame/ab48e69099f2d7c4178a75f7e1cf8b223ecf8d76/parquet/src/record/api.rs#L717 suggests that there has been a special case for checking that value for quite a while

Given that 0.000000000000001 is right on the boundary, perhaps we need to adjust the code to reflect differences in floating point on aarch64?

chadbrewbaker commented 2 years ago

Same errors on x86.

This seems to be the last commit that changed the rounding.

https://github.com/apache/arrow-rs/commit/21cb780808407fc9fca3d028106181c5c9ae54ee

alamb commented 2 years ago

@chadbrewbaker On a mac, the tests pass for me using cargo test -p parquet (using stable)

$ cargo --version
cargo 1.57.0 (b2e52d7ca 2021-10-21)

I also get the diff using nightly:

(arrow_dev) alamb@MacBook-Pro-2:~/Software/arrow-rs$ cargo +nightly --version
cargo 1.58.0-nightly (294967c53 2021-11-29)

If you want to propose a patch to allow the tests to pass on both nightly and stable that would be 👍

chadbrewbaker commented 2 years ago

Bah - my stupidity not running stable. I'll diff what they changed in nightly. Seems to be cross-platform.

chadbrewbaker commented 2 years ago

I'll try to upstream our test for the "1e-15" boundary. Doesn't seem to be one now.

https://github.com/rust-lang/rust/blob/c5ecc157043ba413568b09292001a4a74b541a4e/library/alloc/tests/fmt.rs#L137-L155

chadbrewbaker commented 2 years ago

TIL

cargo bisect-rustc --start 2021-10-01  -- cargo test test_convert_double_to_string
chadbrewbaker commented 2 years ago

I bisected the 1e-15 conversion regression down to this changeset between nightlies.

https://github.com/rust-lang/rust/compare/1af55d19c...efd048394

Urgau commented 2 years ago

I bisected the 1e-15 conversion regression down to this change-set between nightlies.

rust-lang/rust@1af55d1...efd0483

From your range of commit you can see that there is a commit named Automatic exponential formatting in Debug that added an internal logic that switch to the exponential representation in some cases when using the Debug formatter. Rust make no promises about Debug formatting and can change it's output anytime for any reason but this is not the case for the Display formatter which has guaranties for how it display thing.

I would suggest changing your formatting to the Display (ie write!(f, "{:?}", value) -> write!(f, "{}", value)) formatter.

liukun4515 commented 2 years ago

@alamb we need to speed up reviewing these two pull requests https://github.com/apache/arrow-rs/pull/1044 https://github.com/apache/arrow-rs/pull/1073, and they can be merged into the release of arrow-rs 6.5. They will be used in the datafusion for comparison operation and mathematics operation for decimal type.

alamb commented 2 years ago

@liukun4515 I will do so today

liukun4515 commented 1 year ago

There are bugs for cast decimal to decimal.

If we cast decimal(1256,12,2) to the type of decimal(12,1), the result should be decimal(126,12,1) not the decimal(125,12,1)

alamb commented 1 year ago

👍 thanks @liukun4515 -- maybe it is worth its own ticket to talk about the rounding vs truncation behavior of decimals while casting

chadbrewbaker commented 1 year ago

Nice! I will look at the commit. For ML workloads, also going to need a version that stochastically flips the last bit. https://twitter.com/id_aa_carmack/status/1296295284994183168

alamb commented 1 year ago

Possibly related https://github.com/apache/arrow-rs/pull/3281