**asterite** opened this issue 5 years ago
A part of this RFC includes an older one https://github.com/crystal-lang/crystal/issues/6626, about making integer type depend on the platform.
I'm perfectly happy with the end result of this change, but I wonder how best to stage this change into the language. It doesn't seem like there's a way to incrementally apply this, the only way is to have a single release break all existing programs and libraries.
Which I'm fine with, since there doesn't seem to be an alternative.
Yeah... it's even hard to develop, because `Int` and `Float` are baked into the compiler, so we'll first have to change their meaning, then compile a compiler with the existing primitives.cr file, then change that file to define the new hierarchy (and use `Int` everywhere), and then compile the final new compiler.
In any case I think this can be delayed to the future, after we get parallelism and windows. But it's something I would definitely like to have before 1.0 because it's a big change.
It's curious that I'm also repeating myself (#6626) but I'm glad what I wrote here is what we ended up concluding there (though I don't know why I said it's impossible to do so).
I'll happily welcome the change. I grew to really dislike `Int` being the union of all signed integers, and wished it were just some integer (32-bit, 64-bit, or arch-dependent). It will break some programs, though maybe not that many, and they can be quickly fixed by temporarily using an `AnyInt` alias or something.
Swift also has distinct `Int` and `UInt` types that are architecture-dependent, and they are the recommended and default integer types (https://docs.swift.org/swift-book/LanguageGuide/TheBasics.html#ID317). Same for Nim with `int` and `uint`. Even C/C++ have `long` and `unsigned long`.
Yet, I can't find a language with architecture-dependent floats. Swift has `Float` and `Double`, Go has `float32` and `float64`. Nim has a `float` type that used to be platform-dependent but is now merely an alias for `float64` (https://nim-lang.org/docs/manual.html#types-preminusdefined-floating-point-types).
> Yet, I can't find a language with architecture-dependent floats
Good catch! Yeah, I think for floats we should have `Float` be an alias of `Float64`, or even a distinct type. But I'd rather have something short like `Float` instead of having to type and read `Float64` all the time.
If this is going to be a huge breaking change, surely it makes sense to get it out of the way as soon as possible, not delay it until the language has even more users.
First, we could move to free up the `Int` and `Float` names (rename them). In the next release, they become aliases for `Int32`/`Int64` and `Float64`. We can then push libraries to move to those aliases, so that when they become distinct types nothing breaks.
Could probably introduce the change behind a flag at the same time as the aliases, so that libraries can test for compliance, but the same code still compiles without the flag.
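A minimal sketch of what those transitional aliases could look like. The `Staged` namespace and the `AnyInt` name are hypothetical; the namespace only exists so the snippet compiles against today's stdlib, where a distinct `::Int` abstract type already exists:

```crystal
# Hypothetical staging sketch for the migration described above.
module Staged
  # Step 1: today's union alias gets a new name.
  alias AnyInt = Int8 | Int16 | Int32 | Int64

  # Step 2: Int/Float become aliases for the default concrete types,
  # before eventually becoming distinct (possibly arch-dependent) types.
  alias Int   = Int32
  alias Float = Float64
end

def double(x : Staged::Int) : Staged::Int
  x * 2
end

puts double(21)  # => 42
puts Staged::Int # => Int32
```

Code written against `Staged::Int` would keep compiling unchanged through the alias phase and only need attention once the name becomes a distinct type.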
I like that idea!
Just note that:
So I guess the first thing for me will be to try this out and see how it works.
One thing to think about: when you want to map an integer to a database you usually want `Int32` or `Int64` (or even other integer types). Using `Int` then is a bit confusing because the DB column type would need to change depending on whether we are on 32 or 64 bits. Making it `Int32` in the DB but exposing it to the user as `Int` works for reading but not for writing (if you try to write something bigger than `Int32::MAX` it will fail), and making it `Int64` in the DB works for writing but not for reading on 32 bits.
Another problem: the literal `1` will have the type `Int` and that's fine. But what about `2147483648` (`Int32::MAX + 1`)? It could be `Int`, but then it won't compile on 32 bits, effectively making some programs stop compiling depending on the architecture. In fact I just tried this in Go and that's exactly the behavior you get. So maybe it's fine? 😅
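For reference, this sketch shows how today's Crystal (not the proposal) treats unsuffixed literals: a literal that doesn't fit in `Int32` is inferred as a wider type rather than failing to compile:

```crystal
# Current Crystal behavior: out-of-range literals widen automatically.
puts typeof(1)          # => Int32
puts typeof(2147483648) # => Int64 (Int32::MAX + 1)
```

Under the proposal, the second literal would presumably need an explicit suffix (`2147483648_i64`) to stay portable to 32-bit targets.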
Those are great counter-examples of having architecture-specific `Int` and `UInt`.
Literals
If I use something higher than `Int32::MAX` then I actually expect an `Int64`, not an `Int`, and it just happens to work on 64-bit targets. Having a compile-time error for 32-bit targets seems appropriate?
It means Crystal can't infer `2147483648` as an `Int64` or `9223372036854775808` as an `Int128`, and we'll have to type them manually (oh no), but does it happen much? Maybe some explicitness ain't that bad?
Database
I believe database columns should be explicit, that is, either `Int32` or `Int64`, but if integers are usually an `Int` it may create some friction and require some explicit casts (oh no)...
Another point to consider is separating the notion of base integers and native integers. Currently, some operations and overloads work only with native integers, but since `BigInt < Int` they wrongly match `BigInt`.
The current alias to a union for primitives works on overloads but not on definitions in the base class.
I think we could make all of the std work with `Int`, that is, the architecture-dependent type. Then `BigInt` won't match that, nor `Int32` nor `Int64`. You'll have to explicitly convert values from those types.
That seems kind of bad, but if `Int` is the default type everywhere then it's not. And we also reduce the number of method instantiations: right now a method accepting `Int` could get an instance for `Int8`, `Int16`, `Int32`, etc., but with this change it will always be `Int`.
How will that work when math shard A uses `Int` (now fixed at `Int32`), math shard B uses `Int64`, and serialized formats (for example protobuf) are a mix of `Int8|16|32|64`, `UInt8|16|32|64`? Will I need to manually convert between types every time a variable crosses a function boundary? Where does over/underflow checking happen? Do I need to check manually with each conversion?
> How will that work when math shard A uses `Int` (now fixed at `Int32`), math shard B uses `Int64`, and serialized formats (for example protobuf) are a mix of `Int8|16|32|64`, `UInt8|16|32|64`?
I think that's also a problem right now, with `Int32` being the default integer type.
> Will I need to manually convert between types every time a variable crosses a function boundary? Where does over/underflow checking happen? Do I need to check manually with each conversion?
The answer is yes, because that's something you also need to do now, with `Int32` being the default integer type.
Looks like I can pass any type of `Int` with the type preserved. With your proposal, would this still work or would it convert to `Int32`?
```crystal
def lib1_add(a : Int, b : Int)
  c = a + b
  lib2_func c
end

def lib2_func(x)
  p typeof(x)
end

x = 1_u64
lib1_add x, x
```
Output:

```
UInt64
```
@didactic-drunk `Int` is currently a union alias (not aliased to `Int32` only):

```crystal
alias Int = Int8 | Int16 | Int32 | Int64
```
We can rename the alias to `AnyInt` and keep the same behavior.
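A sketch of how that would preserve today's behavior: a union restriction instantiates the method per actual argument type, so the concrete type flows through unchanged. The `Proposed` namespace and `AnyInt` name are hypothetical:

```crystal
# Hypothetical: the current union alias under its proposed new name.
module Proposed
  alias AnyInt = Int8 | Int16 | Int32 | Int64
end

# Union restrictions don't box or convert: the method is instantiated
# for each concrete argument type, so typeof(x) reflects the caller.
def describe(x : Proposed::AnyInt) : String
  "#{typeof(x)}: #{x}"
end

puts describe(1_i8)  # => Int8: 1
puts describe(1_i64) # => Int64: 1
```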
> @didactic-drunk `Int` is currently a union alias (not aliased to `Int32` only): `alias Int = Int8 | Int16 | Int32 | Int64`. We can rename the alias to `AnyInt` and keep the same behavior.
Based on my example, doesn't that mean math functions (or most functions) should use `AnyInt`, and we're right back where we started?
A major complaint when working in physics with C++ is int sizes. Someone writes an algorithm using `Int32` or `Float32` for their problem and it's fine. Someone else attempts to use it with physics data and it over/underflows. Since they're only half programmers, they don't use things like version control. Instead they email files back and forth, so things like `Int128` never make it upstream. Each person who gets the file from the original programmer has to change `Int32` to `Int128`.
They probably should have used a template but that's beyond them. They tend to use the default.
If `Int32`/`Int64` is the default, it will be wrong some portion of the time. Should they use `AnyInt`? No. They'll copy and paste from an example they found on Google, which likely uses `Int`. When it's too small they'll change it to `Int128` manually. When the first person refines the algorithm? They email it to a few of the people, who change the types again.
Why? `Int128` doesn't perform as well as `Int32`/`Int64`. It also requires much more memory/storage space. These run on huge clusters with > petabyte data sets. Each person wants the `Int` type for their specific problem space, but the algorithms are generic.
`AnyInt` solves the problem, which is why I think it should remain the default, named `Int`.
@didactic-drunk Names are interchangeable. I won't go into the pros and cons of each name.
The problem isn't names. It's default behaviour. A union type can't be used as the type of an instance variable, but some type must be specified everywhere you need to store integers. Currently, we advocate using `Int32` everywhere by default because that's safe and fits most use cases. It is also the default type of untyped integer literals.
Even your non-programmer algorithm writers need to pick data types for their integers. And it can't always be a union type, no matter whether it's called `Int` or `AnyInt`.
+1, (in my opinion as a novice to Crystal) would be a good change.
I just asked how to hack Crystal to use Int and Float everywhere and got a link to this issue.
Clean, readable, compact code is one of the key features of Ruby. It's hard to justify the `32` and `64` noise in a codebase if they don't contribute or mean anything; at least in my projects I use only those two everywhere.
+1 for making them the same on all platforms. Less confusion porting (and debugging somebody else's code). If they want to interface with C...maybe create a new type called "NativeInt" or something, that can be used as the parameter?
I attempted to ask here: https://forum.crystal-lang.org/t/int32-and-float64-why-the-defaults/1797 why Int32 and Float64 are the defaults. Curious, since one is "32" and the other "64", thanks :)
I think it makes sense to have this before 1.0. It's much safer to use Int64 by default when dealing with native numbers in JSON and databases.
@cyangle This is not going to happen before 1.0. No other major changes are expected before 1.0
Really? I think this and #8872 are just as important as overflow checks. It changes everything about numbers in the language...
The thing is that @waj just showed me a couple of benchmarks. For example this:
```crystal
require "benchmark"

puts 1
a = Array(Int32).new(50_000_000) { rand(Int32) }
puts 2
b = Array(Int64).new(50_000_000) { rand(Int64) }

sa = 0_i32
sb = 0_i64

Benchmark.ips do |ips|
  ips.report("Int32") { sa = a.reduce(0_i32) { |s, i| s &+ i } }
  ips.report("Int64") { sb = b.reduce(0_i64) { |s, i| s &+ i } }
end

puts sa
puts sb
```
It's slower for Int64. The reason is that even though the math operations probably take the same time, fewer values fit on a cache line or bus, so there's that performance loss with Int64.
What we are considering, though, is adding a `Size` type that's different from `Int32` and `Int64`, and that would be used as the type of `size` in collections. That way you can have bigger collections on 64-bit machines. But the default integer type still stays `Int32` for performance reasons (the same decision as, for example, Rust).
I'm not sold on that. When 32-bit vs 64-bit performance matters (and 32-bits are big enough to hold the data) you can simply optimize your code by using Int32 explicitly. But that's actually an edge case for heavy math operations. For the vast majority of use cases the performance difference is completely negligible. But usability would greatly improve if we just had a simple default integer data type that works for (almost) everything. You would only have to resort to explicit types for binary interfaces, optimizations and maybe some other special cases.
Does the Rust way fit Crystal? I think Crystal is closer to Go and Swift: abstract the details but give access to low-level when needed. In that benchmark, if `Int32`s are enough, then you can optimize (cool), though we're talking about 190MB vs 380MB arrays. That's kinda big, and the performance hit ain't so bad (1.28× slower) given that the CPU caches are busted twice as often.
Having a specific `Size` type for collection sizes introduces friction (or weird type changes/overflows) whenever we want to compute anything with them (not cool). It also requires continuing to type `Int32` instead of a simpler `Int`; using `Size` for integers is weird, and not the recommended way to interact with libraries.
Personally I think discussing new integer types right now is entirely missing the point of 1.0.
The original plan was to release 1.0-pre1 as 0.35.0+bugfixes, and now we're discussing this? Even #9357 can be implemented after 1.0 by adding a `long_size` instead of changing `size`, which is originally why I stopped working on it.
I personally wouldn't mind having a default integer type that's `Int64` on 64-bit machines. I think the same as you, @ysbaddaden. But not everyone thinks the same, so we have to come to some consensus.
We've also been talking about making the `@size` of collections (maybe only `Slice` for now) be `Int32` or `Int64`, exposed as `Int32` with `size` and as `Int64` with `size64`. That's similar to how it's done in C#, where arrays have a `LongLength` property. This way, if you really need big collections or slices you can still work with them, but for the general case, collections with fewer than `Int32::MAX` elements are probably enough for most use cases.
However, nothing is set in stone yet, this is what we've been discussing so far.
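A rough sketch of that `size`/`size64` split. The `BigishBuffer` class is purely illustrative (in reality this would live on `Slice` and friends), but it shows the shape of the API being discussed:

```crystal
# Illustrative sketch: store the size as Int64, expose both accessors.
class BigishBuffer
  def initialize(@size64 : Int64)
  end

  # Full-width size, like C#'s Array.LongLength.
  def size64 : Int64
    @size64
  end

  # Narrow size for the common case; Int64#to_i32 raises OverflowError
  # for collections bigger than Int32::MAX elements.
  def size : Int32
    @size64.to_i32
  end
end

buf = BigishBuffer.new(1_000_i64)
puts buf.size   # => 1000
puts buf.size64 # => 1000
```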
Ask MIT
> We've also been talking about making the `@size` of collections (maybe only `Slice` for now) be `Int32` or `Int64`, exposed as `Int32` with `size` and as `Int64` with `size64`.

That's even worse :sob:
> Given that Int32 and Float64 are the default types it feels a bit redundant to type those 32 and 64 numbers all the time.
I think being specific about the type in a statically typed language is a positive. It shouldn't feel redundant; it should feel good because it's explicit. Not against an `Int` or `Float` alias that is platform-dependent, though.
I agree with this proposal, but it may change too much and be difficult to implement. Regarding simplifying the numerical system, I have referred to multiple programming languages and propose the following highly feasible plan. This proposal is gradual: some of it can be completed in 1.x, and some needs to be completed in 2.0.
- Abstract types: `Number`, `Float`, `Int`.
- Concrete primitive types: `Float32`, `Float64`, `Int8`, `Int32`, `UInt8`, `UInt32`, and so on.
- The default numerical types are `Float64` and `Int32`, but users cannot assume that they will not change in the future.

Change the return value type of `size`, `sizeof` and `instance_sizeof` to `Int`, where the actual return value type is the default numerical type.
Change the return value type of `to_i` to `Int`, where the actual return value type is the default numerical type. When users care about a specific numerical type, they should call a method that explicitly specifies it, such as `to_i32`. One advantage of doing this is that `to_i` and `to_i32` have different application scenarios instead of being mixed according to user preference, which better unifies the coding style.
For floating-point numbers, the same applies.
What about `to_u`? We should deprecate it. When the user calls `to_u`, it means they care about the specific numerical type and should call a method that explicitly specifies it, such as `to_u32`.
At this step, we only modify method signatures and documentation to unify the coding style, with no breaking changes. Therefore, old code compiles normally, unaffected. So this step can be completed in 1.x.
Whether it is the standard library or a third-party library, follow the coding style determined in step 1 and gradually unify the code related to numerical operations.
This step will take a considerable amount of time: not only does it cover a wide range of code, but more importantly, cultivating user habits takes time.
This step is only about better unifying the code related to numerical operations, with no breaking changes, so it can be completed in 1.x.
If possible, these two methods can be added in the top-level namespace:

```crystal
max(*numbers : Number) : Number
min(*numbers : Number) : Number
```

This is more convenient to use than `Math.max` and `Math.min`.
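A possible shape for those helpers (a sketch of the proposal, not an existing API). Splat arguments arrive as a `Tuple`, so `reduce` works directly:

```crystal
# Hypothetical top-level max/min over a variadic list of numbers.
def max(first : Number, *rest : Number)
  rest.reduce(first) { |acc, n| acc > n ? acc : n }
end

def min(first : Number, *rest : Number)
  rest.reduce(first) { |acc, n| acc < n ? acc : n }
end

puts max(1, 5, 3)   # => 5
puts min(2.5, 1, 4) # => 1
```

Note that with mixed argument types the result type is a union of the argument types, which is one of the frictions the rest of this proposal tries to address.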
Add a `Number#one` method and deprecate `Number#additive_identity` and `Number#multiplicative_identity`, because `Number#zero` and `Number#one` are simpler and clearer.
Add a `Float#%` method; the `%` operation should not be limited to integers, and it can make the `//` implementation simpler.
There was no breaking change in this step, so it can be completed in 1.x.
Although there are not many cases of overflow, we cannot ignore it. There are two strategies for handling overflow:
- For `Float`: `+Inf`, `-Inf` and `NaN`.
- For `Int`: a wrapping is performed.

I suggest referring to the Rust strategy here.
Deprecate all wrapping operations and retain only the regular operations. The overflow handling strategy is specified in the compilation options:
We can even deprecate the methods `to_i!`, `to_i32!`, `to_f!`, `to_f64!`, etc.
For the very few cases where users need to specify overflow handling strategies in their code, we can refer to the approach in C#. In short, we should make the code in regular cases as simple as possible.
There was no breaking change in this step, so it can be completed in 1.x.
There is an undeniable fact here: `Decimal`, `Rational`, and `Complex` are "second class" relative to primitive numerical types, and they are only applicable to certain specific fields.
I personally strongly recommend lifting their inheritance relationships with `Number`, `Float`, and `Int`. From a practical perspective, I cannot see the benefits of having these non-primitive types inherit from `Number`, `Float` and `Int`. Because special handling often occurs where these types are involved, this inheritance relationship does not effectively reuse code and can also make originally simple code more complex. Whenever we see the `Number`, `Int`, or `Float` types, we always need to consider: can this code handle non-primitive types correctly? Especially for user-defined types.
Limiting `Number`, `Float`, `Int` to abstract types of primitive numerical types and prohibiting user-defined types from inheriting them can make a lot of code much simpler.
In this way, `Number::Primitive`, `Float::Primitive` and `Int::Primitive` can also be deprecated.
In addition, the `Decimal`, `Rational`, and `Complex` types in the current standard library are not complete. These numerical types are essentially composed of primitive numerical types. We should implement them through generics and give aliases to commonly used types, which looks like this:
```crystal
# Decimal
Decimal::Generic(V, S) < Decimal
alias Decimal64 = Decimal::Generic(Int64, UInt8)
alias Decimal32 = Decimal::Generic(Int32, UInt8)
alias BigDecimal = Decimal::Generic(BigInt, UInt64)

# Rational
Rational::Generic(T) < Rational
alias Rational64 = Rational::Generic(Int64)
alias Rational32 = Rational::Generic(Int32)
alias BigRational = Rational::Generic(BigInt)

# Complex
Complex::Generic(T) < Complex
alias Complex64 = Complex::Generic(Float64)
alias Complex32 = Complex::Generic(Float32)
alias BigComplex = Complex::Generic(BigFloat)
```
NOTE: `BigRational` currently has an implementation that is not generic, and the performance of the generic version may not be as high as before, but the code consistency is better.
There are some breaking changes in this step, but these are all related to specific application areas and the impact will not be significant.
Change the default integer type to be platform-specific: on 64-bit platforms it is `Int64`, and on 32-bit platforms it is `Int32`.
The internal implementation of types such as `Pointer`, `Slice`, `Array`, `String`, etc. also changes from `Int32` to the default integer type.
The changes in this step are a bit significant, but considering that quite some time will have passed since step 1 and users' coding habits will have formed, it is safe to make breaking changes at this point.
What type of result is obtained when performing operations on different integer types? There are several different strategies:

- Return `self`, as in the current version of Crystal.
- Promote to the wider type: `Int8 + Int32 -> Int32`, as in Nim.

I think the third strategy is more suitable for "statically typed languages that write like dynamically typed languages".
When users mix different types of integers, it indicates that they do not care about the specific numerical type, and we should automatically promote them to an appropriate numerical type.
Unsigned and signed operations are promoted to signed: `UInt32 + Int32 -> Int32`.
For different types of floating-point operations, the same applies.
`Int` and `Float` operations are promoted to `Float`: `Int64 + Float32 -> Float32`.
The signature for division is as follows: `Number#/(other : Number) : Number`, implemented by converting to the default float type and calculating. Although the result is `Float64`, the method signature says `Number`, indicating that users should not assume that the return value type will never change.
Assignment operations for class variables and instance variables perform automatic type conversion.
The assignment operation for a local variable remains the same as the current implementation, that is, a union type.
For overflow during automatic type conversion, refer to the overflow handling strategy section.
Bit arithmetic is not considered here; its rules need to be considered separately.
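To illustrate the "promote to the wider type" rule in today's Crystal, here is a hand-rolled approximation; real support would have to live in the compiler, and `widen_add` is purely hypothetical:

```crystal
# Sketch: Int8 + Int32 -> Int32 by widening both operands before adding,
# instead of Crystal's current rule where the receiver's type wins.
def widen_add(a : Int8 | Int16 | Int32, b : Int8 | Int16 | Int32) : Int32
  a.to_i32 + b.to_i32
end

puts widen_add(100_i8, 2_000_000_000) # => 2000000100 (no Int8 overflow)
puts typeof(widen_add(1_i8, 2_i16))   # => Int32 (the widened result type)
```

With Crystal's current semantics, `100_i8 &+ something` stays `Int8` and would wrap; promotion sidesteps that class of bug at the cost of implicit type changes.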
The changes in this step are significant and have a wide range of impacts, and can only be implemented in 2.0.
This step is done in 2.0, and we can freely clean up deprecated code in 1.x.
At this point, all the simplification of the numerical system has been completed.
Thanks @Erdian718 for this detailed proposal. I think it might be a bit more extensive than the scope of this individual issue though.
Step 3 and 4 may be good ideas but they don't seem directly related to the simplification of integer and float types (either set of changes could be implemented independently).
Step 5 makes no sense. There are no `Rational` and `Decimal` types. `Complex` exists, but it does not even inherit `Number`.
Step 7 also seems a bit of a separate issue (ref #8872).
> Step 5 makes no sense. There are no `Rational` and `Decimal` types. `Complex` exists, but it does not even inherit `Number`.
Yes, there are currently no `Rational` and `Decimal`, but there may be in the future.
What I mainly want to express is that `Number`, `Float` and `Int` should only represent primitive numeric types, preventing other non-primitive numeric types from inheriting them, such as `BigRational`, `BigDecimal`, `BigFloat` and `BigInt`. That is:
```crystal
Number == Number::Primitive
Float == Float::Primitive
Int == Int::Primitive
```

And deprecate `Number::Primitive`, `Float::Primitive`, `Int::Primitive`.
This will simplify the code in many places; we no longer need to always consider whether the code can handle non-primitive types correctly, especially user-defined types.
Minor note: something like #14393 adds the possibility of a platform-specific `Int` type smaller than `Int32`.
To be honest, the C types for AVR are challenging: the CPU registers are 8 bits, so native integers are also 8 bits, but pointers are 16 bits (for up to 64KB of memory) while some boards have 128KB of flash memory (?!); `size_t` and `int` are also 16 bits, but `long` is 32 bits and `long long` is 64 bits. Oh, and both `float` and `double` are 32 bits, but starting from GCC 10 `double` may be 64 bits :shrug:
Anyway, AVR having a very limited program space, only the bare types are interesting, and so far I'm still pondering whether I'd like the default integer to be 16 or 32-bits.
ISO C mandates `int` and `size_t` to have at least 16 bits. The same goes for cc65 (which targets the 6502), too.
Please solve this problem in some way, because it's scary to use. I'm not kidding, it's really scary how many errors there can be. My friend and I are very interested in your language, but these problems with numeric types quickly put us off in the beginning. We'll just keep watching for now. Good luck to you. You have made a very interesting programming language.
@nerzh Could you elaborate on what you find "scary" about using number types? This hasn't been brought up yet in the discussion, so it's really not clear what you're referring to. Please be aware that there are no fundamental issues with number types. This discussion is merely about improving the developer experience, not fixing a bug or anything like that.
> This hasn't been brought up yet in the discussion, so it's really not clear what you're referring to.
This came up on the Discord and the gist of it, as I followed it, was:
```crystal
a = 1
b = 9223372036854775805
pp a + b # => Unhandled exception: Arithmetic overflow (OverflowError)
```
Whereas in other languages they're used to, this would be a compile time error vs runtime. So more so https://github.com/crystal-lang/crystal/issues/8872 than this issue itself I'd say.
@straight-shoota sorry, I got my issues mixed up, this is in response to https://github.com/crystal-lang/crystal/issues/8872
Right now Crystal has a variety of integer and float types:

- `Int8`, `Int16`, `Int32`, `Int64`, `UInt8`, `UInt16`, `UInt32`, `UInt64`
- `Float32`, `Float64`

The default integer type when you don't use a suffix is `Int32` and the default float type is `Float64`. This kind of works, but I imagine something better.

Int32 and Float64

Given that `Int32` and `Float64` are the default types, it feels a bit redundant to type those 32 and 64 numbers all the time.

So here's an initial idea: what if we name those types `Int` and `Float`? We would of course need to rename the existing base types `Int` and `Float`, but that's not a problem: we can maybe call them `IntBase` and `FloatBase`, or `Integral` and `Floating`; it doesn't matter much because those names won't be used a lot.

Then talking about ints and floats is so much simpler: just use `Int` and `Float` everywhere. In the cases where you do need a specific limit, which are rare and usually only useful in low-level code such as interfacing with C or writing binary protocols, you can still use the names `Int32`, `Int64`, `Float32` or whatever you need.

What to alias to

Now, we could make `Int` be an alias of `Int32` and `Float` an alias of `Float64`, but maybe it's better if we make `Int` depend on the architecture. That means `Int` would be equivalent to `Int64` on 64-bit architectures.

This is also how Go works. They recommend using `int` everywhere unless you have a good reason to use a specific size. It's probably the case that using `Int64` by default instead of `Int32` works equally fine (maybe even better, because the range is bigger so overflow is less likely) without a real performance degradation.

Another nice thing is that if 128-bit architectures eventually appear, all programs will automatically start using this bigger range (if we want to) without needing to change any code.

To alias or not

Now, we could make `Int` be an alias of the respective underlying type, but I don't think that's a good idea. The reason is that if you have a program that does:

that would compile in 32 bits but would stop compiling in 64 bits. Ideally we'd like our programs to always compile regardless of the architecture.

So, we could make `Int` and `Float` be distinct types. To assign `Int32` or `Int64` to them you would need to call `to_i` first. Then programs on 32 and 64 bits will go through that explicit conversion process.

Another benefit is that we could start making collections use `Int` as the size. This increases their size a bit, but I think it's fine: it's probably not a huge performance/memory penalty (most of the memory is in the actual data). But then their limit becomes the limit of the architecture's memory (well, half of it if we use signed integers, but it's still a lot more than we can do right now). And, like before, this limit will automatically increase when architectures improve (well, if the Amazon burning doesn't mean our imminent doom 😞).

Friction?

If we need these conversions between `Int` and all other integer types, and the same for `Float`, wouldn't it make it really hard to write programs, having to convert between integer types all the time?

No, I don't think so. Because `Int` will be the default type everywhere, except for the few cases I mentioned before (C bindings and binary protocols) there would be no reason to use another integer type.

More benefits

Right now when we parse JSON and YAML we use `Int64`, because it would be a shame to parse to `Int32` and maybe lose some precision.

With this change the type would be `Int`, as everywhere else, and this can be assigned to everything else too if we stick to `Int` as a default. I know in 32 bits the limit will be smaller, but 32-bit machines are starting to become obsolete (for example, I think Mac is dropping support for 32-bit apps).

Breaking change?

This is probably a breaking change, but a good one.

Summary

In summary, if we do this change we get:

- `Int` and `Float`
- `Int`