JuliaLang / julia

The Julia Programming Language
https://julialang.org/
MIT License

BigInt and BigFloat could work more alike #45760

Open JeffreySarnoff opened 2 years ago

JeffreySarnoff commented 2 years ago

It seems asymmetric that

abigint = 123456789_123456789_123456789_123456789_123456789
typeof(abigint) == BigInt

abigfloat = 123456789_123456789_123456789_123456789.123456789_12345678
typeof(abigfloat) == Float64

BigInt("") does not do anything; however, to write a BigFloat constant into a source file, both the type and a string are needed.

There is surely a much more efficient way to remove or to ignore the '_'s -- it's not something I have had to do.

removeunderscore(s::AbstractString) = join(split(s, '_'))

BigInt(s::AbstractString) = parse(BigInt,  removeunderscore(s))
BigFloat(s::AbstractString) = parse(BigFloat, removeunderscore(s))
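
One possible alternative to the `join(split(...))` approach, as a minimal sketch: `replace` drops the underscores in a single pass without building an intermediate array of substrings. The helper names `strip_underscores`, `parse_bigint`, and `parse_bigfloat` are hypothetical and chosen for illustration; they deliberately do not add methods to `BigInt` or `BigFloat`.

```julia
# Hypothetical helpers (not part of Base): strip '_' separators, then parse.
strip_underscores(s::AbstractString) = replace(s, "_" => "")

parse_bigint(s::AbstractString) = parse(BigInt, strip_underscores(s))
parse_bigfloat(s::AbstractString) = parse(BigFloat, strip_underscores(s))

x = parse_bigint("123456789_123456789_123456789")
y = parse_bigfloat("123_456.789_012")

println(typeof(x))  # BigInt
println(typeof(y))  # BigFloat
```

Note that this sketch removes every underscore, wherever it appears, so it accepts strings (leading, trailing, or doubled underscores) that a numeric literal would reject.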

KristofferC commented 2 years ago

What does _ have to do with anything here? Int literals with large values parse as BigInt but floats do not.

julia> 123456789_123456789_123456789_123456789.123456789_12345678 |> typeof
Float64

julia> 123456789123456789123456789123456789.12345678912345678 |> typeof
Float64

Seelengrab commented 2 years ago

There's also the big"" string macro, which does handle both of these:

julia> big"123456789_123456789_123456789_123456789_123456789"
123456789123456789123456789123456789123456789

julia> ans |> typeof
BigInt

julia> big"123456789_123456789_123456789_123456789.123456789_12345678"
1.234567891234567891234567891234567891234567891234567799999999999999999999999999e+35

julia> ans |> typeof
BigFloat

JeffreySarnoff commented 2 years ago

"Int literals with many digits parse as BigInt but float literals with many digits do not."

It would be simpler for users if both int and float literals with many digits parsed accordingly. I still do not see the advantage of letting BigFloat(str) yet not letting BigInt(str) function as a constructor.

Early in the design of Julia, support for interposing neutral '_'s within digit sequences, to make long numbers easier to read and simpler to compare, was discussed, decided, adopted and provided.

Today, this capability exists throughout most of the ways of using Julia. The example shows an uncovered situation.
Wherever an extended-precision number with interspersed underscores becomes unworkable, that is a place that ought to be made workable. That there are alternatives, e.g. big"_", seems more a band-aid than an elegant solution to people who use Julia and regularly work with values specified by 100 digits or more.

@Seelengrab yes -- I had forgotten that big also is a string macro

KristofferC commented 2 years ago

Today, this capability exists throughout most of the ways of using Julia. The example shows an uncovered situation.

I still don't understand. What is the "uncovered situation"? Your example is:

abigint = 123456789_123456789_123456789_123456789_123456789
typeof(abigint) == BigInt

abigfloat = 123456789_123456789_123456789_123456789.123456789_12345678
typeof(abigfloat) == Float64

but this is true even if you remove the _. So what difference does _ make to anything? Please use as simple language as you can, because someone like me who is a non-native speaker might have difficulty understanding otherwise :)

sostock commented 2 years ago

What is the "uncovered situation"?

If I understand correctly, the "uncovered situation" regarding underscores is that they are only allowed in literals, not in strings that are parsed to a number. However, this is not specific to BigFloats:

julia> parse(BigFloat, "123.456_789")
ERROR: ArgumentError: cannot parse "123.456_789" as BigFloat
[...]

julia> parse(Float64, "123.456_789")
ERROR: ArgumentError: cannot parse "123.456_789" as Float64
[...]

julia> parse(Int, "123_456")
ERROR: ArgumentError: invalid base 10 digit '_' in "123_456"
[...]
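
The same contrast can be checked without triggering errors. This is a small sketch, assuming only `tryparse` from Base, which returns `nothing` instead of throwing when a string cannot be parsed:

```julia
# Underscores are accepted in numeric literals ...
int_literal = 123_456
float_literal = 123.456_789

# ... but not in strings handed to parse/tryparse, for any of these types.
r_int = tryparse(Int, "123_456")
r_f64 = tryparse(Float64, "123.456_789")
r_big = tryparse(BigFloat, "123.456_789")

println(int_literal == 123456)         # true
println(float_literal == 123.456789)   # true
println((r_int, r_f64, r_big))         # (nothing, nothing, nothing)
```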

The other part of the issue (the "asymmetry") is that all floating-point literals create Float64s, even if they have a lot of digits, whereas integer literals with lots of digits create BigInts. However, I don’t think it makes much sense to parse floating-point literals with lots of digits as BigFloats. The situations for integers and floats are simply different:

Creating a BigFloat if a floating-point literal contains more than a certain number of digits seems arbitrary. At what number of digits do you switch to BigFloat? As the number 0.1 demonstrates, even for a literal with just one decimal place, a BigFloat will be closer to the exact value than a Float64. Should the literal 0.1 therefore create a BigFloat? I don’t think so.
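
To illustrate the 0.1 point, here is a short sketch comparing the value parsed directly at BigFloat precision with the Float64 literal widened to BigFloat afterwards. The variable names are illustrative only.

```julia
from_string = big"0.1"        # decimal text parsed at BigFloat precision
from_float64 = BigFloat(0.1)  # the literal is rounded to Float64 first, then widened exactly

println(typeof(0.1))                   # Float64
println(from_float64 == 0.1)           # true: widening does not change the value
println(from_string == from_float64)   # false: the two roundings of 0.1 differ
println(from_string)
println(from_float64)
```

Printing both values shows where the Float64 rounding error becomes visible, which is why a digit-count threshold for switching to BigFloat would be hard to justify.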

JeffreySarnoff commented 2 years ago

@sostock Thank you for explaining and then detailing my notes. @KristofferC You communicate so well in English that I forgot.

There is some more to this -- I need to write code :)

KristofferC commented 2 years ago

So if it is as @sostock says, that the issue is with _ in parse vs. literals, then I don't understand the title. Being able to use _ in literals but not in parse is true for all integer types and is the same for BigInt and BigFloat, so how does the title, claiming that they are "not symmetric", make sense?

elextr commented 2 years ago

@KristofferC IIUC the "asymmetry" is two part:

  1. numbers with lots of digits generate BigInts but not BigFloats
  2. therefore BigFloats need to be generated by parsing strings, which does not support _s, so BigFloats can't be generated with _s, but BigInts can because they can be generated with literals

But as @Seelengrab pointed out, the big"" macro handles both types with underscores giving a workaround to the second asymmetry.

As @sostock pointed out, curing the first asymmetry is "complicated" like all floating point, and is best handled explicitly using big"".