elixir-explorer / explorer

Series (one-dimensional) and dataframes (two-dimensional) for fast and elegant data exploration in Elixir
https://hexdocs.pm/explorer
MIT License
1.12k stars 123 forks source link

Default precision of 38 on Decimal dtype may be wrong #1016

Closed billylanchantin closed 1 week ago

billylanchantin commented 1 week ago

We can't currently represent decimals on the Rust-side where the Elixir-side's decimal.coef is outside the u64 range. This isn't really an issue in itself. But it makes the default precision of 38 over powered since the precision is the number of digits in base 10.

iex(1)> max_u64 = 2 ** 64 - 1
18446744073709551615

iex(2)> :math.log10(max_u64)
19.265919722494797

I think the confusion came because:

iex(3)> max_u128 = 2 ** 128 - 1
340282366920938463463374607431768211455

iex(4)> :math.log10(max_u128)
38.53183944498959

So if Rust were using an unsigned 128-bit integer, we could push the precision up to 38/39. But because it's using a signed 128-bit integer, we can really only go up to 19/20.

I think we can drop the default down to 20.

josevalim commented 1 week ago

I am not sure I get it. The bits allocated to the scale handles the negative parts, no? So we should get max 38 precision from it?

billylanchantin commented 1 week ago

I don't think so. The scale is just a number that represents how many places from the right you want to put the decimal point on the value.

Quick sanity check:

iex(1) max_u64 = 2 ** 64 - 1
18446744073709551615

iex(2)> biggest = Decimal.new(max_u64)
Decimal.new("18446744073709551615")

iex(3)> one_too_big = Decimal.new(max_u64 + 1)
Decimal.new("18446744073709551616")

iex(4)> Explorer.Series.from_list([biggest], dtype: {:decimal, 20, 0})
#Explorer.Series<
  Polars[1]
  decimal[20, 0] [18446744073709551615]
>

iex(5)> Explorer.Series.from_list([one_too_big], dtype: {:decimal, 20, 0})
** (RuntimeError) Generic Error: cannot decode a valid decimal from term; check that `coef` fits into an `i128`. error: throw(<term>)
    (explorer 0.11.0-dev) lib/explorer/polars_backend/shared.ex:18: Explorer.PolarsBackend.Shared.apply/2
    (explorer 0.11.0-dev) lib/explorer/polars_backend/series.ex:24: Explorer.PolarsBackend.Series.from_list/2
    iex:5: (file)
billylanchantin commented 1 week ago

Ok, I see what's going on here.

First, I'm wrong. The code in the above comment had me turned around. The signed vs. unsigned thing only loses 1 bit of precision, not half the bits 🤦🏻.

What's actually going on is our ExDecimal struct looks like this:

pub struct ExDecimal {
    pub sign: i8,
    pub coef: u64,
    pub exp: i64,
}

coef there should probably be i128.