aldanor / fast-float-rust

Super-fast float parser in Rust (now part of Rust core)
https://docs.rs/fast-float
Apache License 2.0
275 stars 20 forks source link

Discussion: How is POWERS_OF_FIVE_128 generated? #12

Closed Protowalker closed 3 years ago

Protowalker commented 3 years ago

How is this table generated? My first thought was that these are 5^x, but log5(0xhilo) returns a non-whole number.

aldanor commented 3 years ago

https://github.com/fastfloat/fast_float/blob/main/include/fast_float/fast_table.h#L7

Maybe I could add an inline #[test] to verify its correctness.

lemire commented 3 years ago

A manuscript is going to be posted online in the near future (within a few days) which explains how they are generated in details.

Related https://github.com/lemire/fast_double_parser/issues/19

Protowalker commented 3 years ago

My thought was to write a const fn that will generate this table. My reasoning is that by doing so you could apply the same bitmask trick from PR #7 without having to manually fill the table with 650 (0,0) pairs, but it would also just improve the readability of the code since it would no longer take up 700 lines.

aldanor commented 3 years ago

I think in theory we could use lazy_static or once_cell or the such, but then that would induce dependencies (once_cell is considered for adding it to std, but who knows when that happens).

Yet another option would be to generate the table in the build script but then that brings in build scripts. Maybe it's ok though, idk.

Note that in std they also have a 6KB table: https://github.com/rust-lang/rust/blob/master/library/core/src/num/dec2flt/table.rs (and a Python script alongside it that generates it - that's another way to do it I guess).

Protowalker commented 3 years ago

Would there be any issue with just making a const fn that generates the table?

aldanor commented 3 years ago

I'll check out if it works.

Protowalker commented 3 years ago

After a quick look, it seems that the only thing that might be a limiter is that i64::pow and u64::pow are not currently stable for use in const fn.

aldanor commented 3 years ago

Not only (that you can do easily yourself via loops?); you need bigints there, I think (e.g. 5 ^ 342) - so you'll have to implement your own const bigint multiplication and division.

aldanor commented 3 years ago

@lemire I've written a test (using Rust's num-bigint) for the table based on the Python code you've shared above and noticed that some numbers don't match – specifically low components for exponents from -27 to 18:

-27: 9e74d1b791e07e48 775ea264cf55347d (expected: 9e74d1b791e07e48, 775ea264cf55347e)
-26: c612062576589dda 95364afe032a819d (expected: c612062576589dda, 95364afe032a819e)
-25: f79687aed3eec551 3a83ddbd83f52204 (expected: f79687aed3eec551, 3a83ddbd83f52205)
-24: 9abe14cd44753b52 c4926a9672793542 (expected: 9abe14cd44753b52, c4926a9672793543)
-23: c16d9a0095928a27 75b7053c0f178293 (expected: c16d9a0095928a27, 75b7053c0f178294)
-22: f1c90080baf72cb1 5324c68b12dd6338 (expected: f1c90080baf72cb1, 5324c68b12dd6339)
-21: 971da05074da7bee d3f6fc16ebca5e03 (expected: 971da05074da7bee, d3f6fc16ebca5e04)
-20: bce5086492111aea 88f4bb1ca6bcf584 (expected: bce5086492111aea, 88f4bb1ca6bcf585)
-19: ec1e4a7db69561a5 2b31e9e3d06c32e5 (expected: ec1e4a7db69561a5, 2b31e9e3d06c32e6)
-18: 9392ee8e921d5d07 3aff322e62439fcf (expected: 9392ee8e921d5d07, 3aff322e62439fd0)

The expected values are the ones hardcoded in the C++ header that I've copied straight to Rust.

So then I've tried running your Python code for those exponents and got:

0x9e74d1b791e07e48,0x775ea264cf55347d,
0xc612062576589dda,0x95364afe032a819d,
0xf79687aed3eec551,0x3a83ddbd83f52204,
0x9abe14cd44753b52,0xc4926a9672793542,
0xc16d9a0095928a27,0x75b7053c0f178293,
0xf1c90080baf72cb1,0x5324c68b12dd6338,
0x971da05074da7bee,0xd3f6fc16ebca5e03,
0xbce5086492111aea,0x88f4bb1ca6bcf584,
0xec1e4a7db69561a5,0x2b31e9e3d06c32e5,
0x9392ee8e921d5d07,0x3aff322e62439fcf,

Which, again, matches my test but doesn't match what's in the headers.

Where's the truth? :)

lemire commented 3 years ago

I did not share scripts above, I linked to a related issue in another repository.

aldanor commented 3 years ago

I mean this one: https://github.com/lemire/fast_double_parser/blob/master/script/table_generation.py

lemire commented 3 years ago

Yes, fast_double_parser uses a different algorithm.

lemire commented 3 years ago

It is closely related, but not identical.

lemire commented 3 years ago

In particular, fast_double_parser does not have to handle ties to even.

aldanor commented 3 years ago

Oh, ok then - apologies, that's my misunderstanding. I'll check this vs the generation script included in the appendix in the pdf.

aldanor commented 3 years ago

Confirmed, all checks pass when using the algorithm in the latest pdf as a reference (will leave them in the crate for documenting purpose).

lemire commented 3 years ago

Note that the scripts are available at https://github.com/fastfloat/fast_float/blob/main/script/table_generation.py

aldanor commented 3 years ago

👍

For reference: https://github.com/aldanor/fast-float-rust/blob/9ee8496565393a191fd5762ae73882405be18170/src/table.rs#L11-L47