Closed declanvk closed 8 months ago
This is much needed, thanks. I think started that at some point, but I can't find the commits 😬 .
Like the other traits, I'd like to break this out into an impl_fmt.rs
file. So I'll do that in trunk and rebase your branch onto that to keep accurate attribution.
I'll break your function-scope format_full_scale
and format_exponential
out into module-scope, so they may be individually, without dependency on EXPONENTIAL_FORMAT_THRESHOLD
.
Regarding EXPONENTIAL_FORMAT_THRESHOLD
, I think I'll make that a compile-time configurable; like RUST_BIGDECIMAL_DEFAULT_PRECISION
.
Most people wont care, but someone might find it useful.
It looks like Python's default decimal formatter only keeps 5 leading zeros before going to exponential form:
>>> Decimal("1e-6")
Decimal('0.000001')
>>> Decimal("1e-7")
Decimal('1E-7')
>>> Decimal("11e-7")
Decimal('0.0000011')
>>> Decimal("11e-8")
Decimal('1.1E-7')
And trailing zeros are precision-dependent, which makes sense as those could indicate accuracy:
>>> Decimal("1e0")
Decimal('1')
>>> Decimal("1e1")
Decimal('1E+1')
>>> Decimal("1e3")
Decimal('1E+3')
>>> Decimal("10e1")
Decimal('1.0E+2')
>>> Decimal("10000e3")
Decimal('1.0000E+7')
__str__
follows same rules:
>>> str(Decimal("1e-6"))
'0.000001'
>>> str(Decimal("1e-7"))
'1E-7'
>>> str(Decimal("100e7"))
'1.00E+9'
I found my feature/impl-fmt branch. I rebased that onto trunk and I'll get your changes in there somewhere.
I applied your changes to the split file instead of src/lib.rs
.
Only significant change was moving the format_exponential
out of the fmt method, and leaving the other code in the function as-is (i.e. jump to format_exponential if length condition holds, otherwise use the old code)
I may change this back to your two-function approach, and move the whole thing into yet another function that parameterizes the threshold-value, so we can write nicer tests.
One thing to note: I set EXPONENTIAL_FORMAT_THRESHOLD = 2
(small value) and the exponential form is chosen when given a value like 7000:
println!("{}", BigDecimal::from(7000)); -> "7.000E3"
Is that expected behavior?
Regarding EXPONENTIAL_FORMAT_THRESHOLD, I think I'll make that a compile-time configurable; like RUST_BIGDECIMAL_DEFAULT_PRECISION.
Sounds good to me! That is a cool technique you're using in the build.rs
Is that expected behavior?
This makes sense, but I did not anticipate it ahead of time. Sinceabs_int = "7000"
and scale = 0
, then we get "7000".len() - 0 > 2
which evaluates to true and goes to the exponential path.
I think this points out a weakness in the condition, I previously assumed that the trailing zeroes would not have this affect.
What is your preference on this? Should I make it count the abs_int.len() - num_trailing_zeroes
?
The special case scale == 0
always means the BigDecimal is an integer, and the BigInt may be printed verbatim. So that's one.
Comparing to other libraries:
// Java
import java.math.*;
class BigDecimalTest {
public static void main(String[] args) {
String[] strs = new String[]{
"0.123",
"0.000001",
"0.0000001",
"100",
"100e3",
"1000000000",
"1000000000.00000",
"1000000000e1",
};
for (String s : strs) {
BigDecimal n = new BigDecimal(s);
System.out.println(s + " -> " + n);
}
}
}
outputs
0.123 -> 0.123
0.000001 -> 0.000001
0.0000001 -> 1E-7
100 -> 100
100e3 -> 1.00E+5
1000000000 -> 1000000000
1000000000.00000 -> 1000000000.00000
1000000000e1 -> 1.000000000E+10
# Python
from decimal import Decimal
strs = [
"0.123",
"0.000001",
"0.0000001",
"100",
"100e3",
"1000000000",
"1000000000.00000",
"1000000000e1",
]
for s in strs:
n = Decimal(s)
print(f"{s} -> {n}")
0.123 -> 0.123
0.000001 -> 0.000001
0.0000001 -> 1E-7
100 -> 100
100e3 -> 1.00E+5
1000000000 -> 1000000000
1000000000.00000 -> 1000000000.00000
1000000000e1 -> 1.000000000E+10
(good, it's the same)
Ruby:
require 'bigdecimal'
strs = [
"0.123",
"0.000001",
"0.0000001",
"100",
"100e3",
"1000000000",
"1000000000.00000",
"1000000000e1",
]
for s in strs do
n = BigDecimal s
print(s, " -> ", n, "\n")
end
0.123 -> 0.123e0
0.000001 -> 0.1e-5
0.0000001 -> 0.1e-6
100 -> 0.1e3
100e3 -> 0.1e6
1000000000 -> 0.1e10
1000000000.00000 -> 0.1e10
1000000000e1 -> 0.1e11
...... well I regret trying that one. They ALL start with 0.
!?
0.1 -> 0.1
0.10 -> 0.10
0.100 -> 0.100
0.1000 -> 0.1000
...
0.100000000 -> 0.100000000
...
0.100000000000000000000000000 -> 0.100000000000000000000000000
So trailing zeros don't matter.
I think the rule is
// compare scale to zero
match self.scale.cmp(&0) {
// simply print integer if scale is zero
Equal => print(abs_int),
// decimal point is "to the right" of the end of the integer
// always print exponential form
// (example 100e3 => 100xxx. )
Less => print_exponential_form(...),
// greater means decimal point is to the left of the end of the integer
// we should do exponential form if the decimal point is past the left of
// the start (significant) digit
// 123e-2 => 1.23 (scale - len = 2 - 3 = -1)
// 123e-5 => 0.00123 (scale - len = 5 - 3 = 2)
// 123e-9 => 0.000000123 (scale - len = 9 - 3 = 6 > threshold => 1.23e-7)
_ => if self.scale - abs_int.len() as i64 > THRESHOLD {
print_exponential_form(...)
} else {
print_full_form(...)
}
}
Ok: We have decimal with digits
dddddddddd
Only thing that determines exponential is placement of the decimal point. Value or number of digits do not matter:
dddddddddd0…00. <- exponential
dddddddddd. <- full (no decimal point to be printed)
ddddd.dddd <- full
.dddddddddd <- full
.0000dddddddddd <- full
.0000…0000dddddddddd <- exponential if n > threshold
|-- n --|
So I'm going with
// check if scale is outside of "boundary"
if scale < 0 || (scale as usize > abs_int.len() + threshold) {
exponential
} else {
full
}
These tests are passing for threshold = 2 👏
impl_case!(case_0d123: "0.123" => "0.123");
impl_case!(case_0d0123: "0.0123" => "0.0123");
impl_case!(case_0d00123: "0.00123" => "0.00123");
impl_case!(case_0d000123: "0.000123" => "1.23E-4");
impl_case!(case_123d: "123." => "123");
impl_case!(case_123de1: "123.e1" => "1.23E3");
test impl_fmt::test_display::case_0d000123 ... ok
test impl_fmt::test_display::case_123d ... ok
test impl_fmt::test_display::case_0d00123 ... ok
test impl_fmt::test_display::case_123de1 ... ok
test impl_fmt::test_display::case_0d123 ... ok
test impl_fmt::test_display::case_0d0123 ... ok
the last one there is almost consistent with Python. Do we force the +
?
>>> Decimal("123.e1")
Decimal('1.23E+3')
Nice work on adjusting the rule! I like the latest iteration, focusing in on the placement of the digits really brings it into focus
the last one there is almost consistent with Python. Do we force the +?
Yeah I think go for it! Looking at the python and java examples, it seems like they always include the sign on the exponent
I left some nonblocking comments (on my own PR hahaha), let me know what you think! I can go ahead and implement anything, I feel bad making you do all this work!
Thanks again for looking at this
I didn't mean to force push onto your branch, but if you like the changes I'm glad I did. Thanks for your comments, they look good. I'll rebase again to correct the last old-rust compatibility issue.
I removed the updated threshold check, I'm working on redoing that commit to also update the tests.
I'll look at all this again tonight.
I didn't mean to force push onto your branch, but if you like the changes I'm glad I did.
I'm glad you did! The changes you made look good
I'm about ready to call this done. I wrote a little code-writing python script to generate unit-test mods so the Rust bigdecimal formatting of {:XYZ}
aligns with the same format in Python.
This was merged and released last night in 0.4.3.
I forgot to use the GitHub ui, oops. To make it look merged I'm going to merge it now into trunk and then force push updates away.
Thanks for all your work on this @akubera !
'twas a long time coming. Let me know right away if there's any problems!
This PR is related to some of the comments in #108, but there is not a specific issue created separately for this. Please let me know if you want me to create an issue first for some discussion.
The first commit of the PR is mostly just adding tests to capture the current behavior.
The second commit is the more important, it adds a different formatting code path to the
fmt::Display
impl, so that numbers that have a large (absolute) scale relative to their integer representation length will be displayed in exponential format. The second commit message has more detail.I chose the
(abs_int.len() as i64 - self.scale - 1)
as the criteria for the cutoff because I didn't want it to trigger only based on thescale
. I imagined a scenario where the scale is large (either negative or positive), but it is inflated because theint_val
has a large number of digits.I'm specifically targeting cases where someone could input a small string like
1E100000
and then crash a program by making it do a huge allocation.Thanks for making and maintaining the
bigdecimal
crate! We find it very useful