Closed liukun4515 closed 1 year ago
please assign this issue to me. Thanks
unsigned numeric to decimal @alamb do we need to support this? My opinion is that unsigned data type is not in the standard SQL data type, and we no need to support this.
@liukun4515 I personally think it would be valuable to support unsigned
numeric (aka PrimitiveArray<T>
) to / from the decimal type. Arrow is used for more than just SQL engines. However, I think it is fine to leave it as a follow on PR / issue (doesn't have to be in the first implementation).
I suspect once you implement PrimitiveArray<T>
to/from DecimalArray
for signed variants the incremental cost to supporting the unsigned numeric variants will be quite low.
Ok, I will implement signed numeric first, and leave other as the follow-up PR.
On commit ab48e69099f2d7c4178 building with nightly-2021-12-05-aarch64-apple-darwin I am getting:
---- record::api::tests::test_convert_double_to_string stdout ----
thread 'record::api::tests::test_convert_double_to_string' panicked at 'assertion failed: `(left == right)`
left: `"1e-15"`,
right: `"0.000000000000001"`', parquet/src/record/api.rs:1097:9
---- record::api::tests::test_convert_float_to_string stdout ----
thread 'record::api::tests::test_convert_float_to_string' panicked at 'assertion failed: `(left == right)`
left: `"1e-15"`,
right: `"0.000000000000001"`', parquet/src/record/api.rs:1084:9
The string formatter is:
Where are the requirements for this?
Where are the requirements for this?
I do not know @chadbrewbaker
Some code archeology in https://github.com/apache/arrow-rs/blame/ab48e69099f2d7c4178a75f7e1cf8b223ecf8d76/parquet/src/record/api.rs#L717 suggests that there has been a special case for checking that value for quite a while
Given that 0.000000000000001
is right on the boundary, perhaps we need to adjust the code to reflect differences in floating point on aarch64
?
Same errors on x86.
This seems to be the last commit that changed the rounding.
https://github.com/apache/arrow-rs/commit/21cb780808407fc9fca3d028106181c5c9ae54ee
@chadbrewbaker On a mac, the tests pass for me using cargo test -p parquet
(using stable)
$ cargo --version
cargo 1.57.0 (b2e52d7ca 2021-10-21)
I also get the diff using nightly:
(arrow_dev) alamb@MacBook-Pro-2:~/Software/arrow-rs$ cargo +nightly --version
cargo 1.58.0-nightly (294967c53 2021-11-29)
If you want to propose a patch to allow the tests to pass on both nightly and stable that would be 👍
Bah - my stupidity not running stable. I'll diff what they changed in nightly. Seems to be cross-platform.
I'll try to upstream our test for the "1e-15" boundary. Doesn't seem to be one now.
TIL
cargo bisect-rustc --start 2021-10-01 -- cargo test test_convert_double_to_string
I bisected the 1e-15 conversion regression down to this changeset between nightlies.
https://github.com/rust-lang/rust/compare/1af55d19c...efd048394
I bisected the 1e-15 conversion regression down to this change-set between nightlies.
From your range of commit you can see that there is a commit named Automatic exponential formatting in Debug that added an internal logic that switch to the exponential representation in some cases when using the Debug
formatter.
Rust make no promises about Debug
formatting and can change it's output anytime for any reason but this is not the case for the Display
formatter which has guaranties for how it display thing.
I would suggest changing your formatting to the Display
(ie write!(f, "{:?}", value)
-> write!(f, "{}", value)
) formatter.
@alamb we need to speed up reviewing these two pull requests https://github.com/apache/arrow-rs/pull/1044 https://github.com/apache/arrow-rs/pull/1073, and they can be merged into the release of arrow-rs 6.5. They will be used in the datafusion for comparison operation and mathematics operation for decimal type.
@liukun4515 I will do so today
There are bugs for cast decimal to decimal
.
If we cast decimal(1256,12,2) to the type of decimal(12,1), the result should be decimal(126,12,1) not the decimal(125,12,1)
👍 thanks @liukun4515 -- maybe it is worth its own ticket to talk about the rounding vs truncation behavior of decimals while casting
Nice! I will look at the commit. For ML workloads, also going to need a version that stochastically flips the last bit. https://twitter.com/id_aa_carmack/status/1296295284994183168
Possibly related https://github.com/apache/arrow-rs/pull/3281
Is your feature request related to a problem or challenge? Please describe what you are trying to do. In the https://github.com/apache/arrow-datafusion/issues/122#issuecomment-975036335, we need to support decimal for
cast
andtry_cast
In the first step, we just support signed numeric data type to decimal, and other casting rules related to decimal will be supported in the feature.
future
enhancement
[x] optimize iter the decimal array and use the from iter to construct Int、Float Array https://github.com/apache/arrow-rs/issues/1083
[x] add benchmark result for cast decimal data type
Describe the solution you'd like A clear and concise description of what you want to happen.
Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.
Additional context Add any other context or screenshots about the feature request here.