Closed nopria closed 11 months ago
fastnumbers
aims to return the same results that python returns, and this is the same result as python:
>>> int(3.453e78)
3453000000000000090829207640410620202184021047720495530415184879687805108748288
It would be nice adding an optional behaviour for fastnumbers
to allow overcoming such Python limitations.
Otherwise, and in the meantime, I'll have to use a workaround like the following one, which I guess is not very fast:
>>> from decimal import Decimal
>>> int(Decimal(str(3.453e78)))
3453000000000000000000000000000000000000000000000000000000000000000000000000000
But consider this:
>>> decimal.Decimal(3.453e78)
Decimal('3453000000000000090829207640410620202184021047720495530415184879687805108748288')
When you do e.g. a = 3.453e78
, that number is already 3453000000000000090829207640410620202184021047720495530415184879687805108748288.0
. So, when you do int(a)
, Python is accurately and correctly returning 3453000000000000090829207640410620202184021047720495530415184879687805108748288
.
The reason that Decimal(str(3.453e78))
gives what you expect is that Python has a heuristic for writing out string representations of floats that only gives what it anticipates is the user's desired human-readable output, and so the Decimal
class only sees the 3 digits you gave.
>>> str(3.453e78)
'3.453e+78'
>>> format(3.453e78, ".0f")
'3453000000000000090829207640410620202184021047720495530415184879687805108748288'
So, when you do fast_real(3.453e78)
, fastnumbers
is actually seeing 3453000000000000090829207640410620202184021047720495530415184879687805108748288.0
. For it to return 3453000000000000000000000000000000000000000000000000000000000000000000000000000
instead would mean that it has to guess that you only want to keep the first 4 digits. But, from fastnumbers
's perspective, fast_real(3.453e78)
is no different from fast_real(3453000000000000090829207640410620202184021047720495530415184879687805108748288.0)
, and in the latter case it seems like all those digits are important.
I simply don't see how this request can be satisfied as given.
Now, if the request were to make it so that fast_real("3.453e78")
returned 3453000000000000000000000000000000000000000000000000000000000000000000000000000
, that is possible and could be done.
Yes, making fast_real("3.453e78")
return
3453000000000000000000000000000000000000000000000000000000000000000000000000000
it's the first request and I think would be a nice improvement.
About the rest, I understand your explanation and I'm aware that fastnumbers
sees the numeric literal 3.453e78
as
3453000000000000090829207640410620202184021047720495530415184879687805108748288
but at the same time I guess (correct me if I'm wrong) that fastnumbers
is aware that the least significant bit of the numeric literal 3.453e78
is 4.113761393303015e+62
(given by math.ulp(3.453e78)
), so it could be aware that all the last 62 digits of
34530000000000000 90829207640410620202184021047720495530415184879687805108748288
which are precisely those after the added space, are arbitrary and therefore could be set to zero.
If this is the case, I think that an option to let fastnumbers
silence arbitrary digits (those beyond the least significant bit) would be a valuable feature.
I'll take a look, but I'm not really sure how one "silences" digits, especially in the python long type. I assume you have an idea?
After some tentatives I came up with the following pure numerical way (in Python), which handles rounding and negative numbers and delivers the same results as decimal.Decimal(str(n))
for the tests I made on big floats:
import math
n=3.453e78
n_sign = (n>0)-(n<0)
digits = 1+int(math.log10(math.ulp(abs(n))))
int_n = int(abs(n))
roundup = 1 if int_n // 10**digits % 10 == 9 else 0 # 1 if least decimal significant digit is 9, else 0
fixed_n = (roundup + int_n // (10 ** digits)) * (10 ** digits) * n_sign
gives
>>> n = 3.453e78
>>> int(n)
3453000000000000090829207640410620202184021047720495530415184879687805108748288
>>> fixed_n
3453000000000000000000000000000000000000000000000000000000000000000000000000000
>>> n = -6.853808e38
>>> int(n)
-685380799999999969504234500079333933056
>>> fixed_n
-685380800000000000000000000000000000000
I've been working on this off and on for the past few months. I have come to the conclusion that this is possible, but it won't be very fast (at least not compared to all the other operations this library does).
The problem is that these "arbitrary digits" only occur for very large numbers, much larger than will fit into a 64-bit integer. That means that under-the-hood, they must be represented in Python's arbitrary-sized integer representation, and as a result raw C++ arithmetic cannot be performed on them - all math must be through Python's C API. Doing this is pretty much as slow as Python.
I tried a hybrid approach where I used C++ types and arithmetic to calculate the number of digits that were "arbitrary", then use Python's round
to set them to zero... this was both very slow sometimes does not return the same result as Decimal
due to rounding of the internal representation of double
that is unrecoverable. So, "post-processing" the result after conversion to integer to remove the digits is not the correct method.
I am going to investigate a method to convert from a string representation to get the results you request. I am imagining an algorithm something like this:
This should result in an integer that gives identically the same result as Decimal
is giving, and should be pretty fast - at least faster that what I have tried so far.
In order to support floats (instead of strings), I will have to first convert the float to a string using something analogous to str(float)
, then use the above algorithm. This of course is not ideal because it requires a memory allocation and lots of extra work, so will not be overly fast, but it will be more consistent with the spirit of the request.
I'll do some bench-marking when implemented, and think about default behaviors, etc.
try_real
and try_forceint
now have the denoise
options to enable this. See the README for more details on the behavior and caveats, and the profiling data to see how it compares speedwise.
Nice job Seth. To make easier for readers to know about the denoise
option, in the documentation at "See README for more details" it would be convenient to have a link to the relevant README section (https://github.com/SethMMorton/fastnumbers/blob/main/README.rst#about-the-denoise-option).
Many people might be interested in this new option, which in Python is unique, I guess.
Describe the bug
returns
instead of
Environment (please complete the following information):
Observations
I know that the current behaviour is due to unavoidable numeric representation error, however the result is somewhat "disturbing". IMHO the ideal behaviour should be to round the result to the digit before the least significant bit. It's not a surprise that
gives
which tells us that the last 62 digits of the integer output of
fast_real(3.453e78)
are, so to say, arbitrary and therefore misleading.