crystal-lang / crystal

The Crystal Programming Language
https://crystal-lang.org
Apache License 2.0

[RFC] Let the default integer type depend on the platform #6626

Closed asterite closed 6 years ago

asterite commented 6 years ago

Right now Int32 is the default type for integer literals that don't have a type:

x = 1
typeof(x) # => Int32

We also have #4011, where Array is limited to an Int32 length... and in fact we also have this limit in String, Slice, Hash and basically every container. I think this can indeed become a problem for programs that need to hold that much data in memory. Maybe right now that's not common, but as time passes maybe we'll have 128-bit machines, and having more and more data will be common?

In any case, we should have an integer type that depends on the platform and containers that support such a size.

To do that, I propose all of the following:

- rename the current Int type (the abstract base of all integer types) to Integer
- add an Int alias that is Int32 on 32-bit platforms and Int64 on 64-bit platforms
- make Int the default type for integer literals that don't have a suffix
- have containers (Array, String, Slice, Hash, etc.) use Int for their sizes and indexes

We can then also do the same with UInt (and have literals like 123u).

This change doesn't have to be done now, but I believe eventually we'll have to do it.
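
For illustration only, a minimal sketch of how such a platform-dependent alias could be expressed with the existing bits64 compile-time flag (NativeInt/NativeUInt are placeholder names, since Int is currently taken by the abstract base type):

{% if flag?(:bits64) %}
  alias NativeInt = Int64
  alias NativeUInt = UInt64
{% else %}
  alias NativeInt = Int32
  alias NativeUInt = UInt32
{% end %}

x = NativeInt.new(1)
typeof(x) # => Int64 on 64-bit targets, Int32 on 32-bit targets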

RX14 commented 6 years ago

This will cause portability problems, and I really don't think that's a good idea.

I don't think this is actually going to happen either. The size of pointers is unlikely to change to 128 bits in the near future, simply because having 16 exbibytes of memory is almost physically impossible. By the time physics gets to the level where we have to worry about 128-bit address buses, we'll all be dead.

Since the integer size of processors is closely related to the pointer size, I don't foresee support for more than native 256-bit arithmetic in the future. I think 64 bits is a good bet as the default integer size until basically forever.

j8r commented 6 years ago

After some research, it looks like the performance impact may be negligible; it can even improve things a bit. This needs confirmation for the Crystal case.

It's a shame that on 64-bit machines we don't use all the available bits in the registers by default.

asterite commented 6 years ago

@RX14 Well, yeah, that's what I propose, or at least changing the default int size to Int64... not sure how that will behave on 32-bit systems (yes, Crystal still supports those).

I forgot to mention that what I propose is how it works in Go and Nim. I didn't check other languages, though...

j8r commented 6 years ago

For reference, Rust has isize and usize, and Go has int and uint.

j8r commented 6 years ago

Using 64-bit integers on 32-bit systems will undoubtedly kill performance; really bad idea.

armhf is still there, and in general 32-bit is and will remain widely used in efficient/embedded systems. One of Crystal's aims is to be efficient, and some users target these machines because it fits perfectly on Raspberry Pis, Arduinos, routers...

@asterite's proposal will make better use of the underlying system's registers, eventually bringing more performance.

asterite commented 6 years ago

Hm, Rust's default integer type is i32, and they say it's the fastest type, even on 64-bit systems. So that's something to think about...

pkorotkov commented 6 years ago

@asterite The final remark you focused on refers to situations when a programmer isn't sure which type to pick. Though the general rule is

... the isize and usize types depend on the kind of computer your program is running on: 64 bits if you’re on a 64-bit architecture and 32 bits if you’re on a 32-bit architecture.

asterite commented 6 years ago

@pkorotkov Right, but if we let 123 be of type Int64 on 64-bit systems, it might be a waste of size/performance.

j8r commented 6 years ago

That may only be true on x86-64, which is backward compatible with x86 and has optimizations to run it.

Even if in the end we stick with Int32 by default, I hope we can end up having Int/UInt or something similar that uses the integer size the system has (32 or 64 bits).

Other languages provide these types; they're useful for lower-level/platform optimization stuff.

asterite commented 6 years ago

@j8r We do have such a type, it's LibC::SizeT. Having a type for it is not a problem. The problem is that the current containers (and even files) have a limit of Int32, where the limit should be Int64 (or UInt64) on 64-bit systems. And the question is how to design the language in a way that you don't have to constantly cast stuff, and so that a program that compiles fine on 32 bits also compiles fine on 64 bits, even though a type changes definition from one system to another.
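
A small sketch of that friction, assuming LibC::SizeT is UInt32 on 32-bit targets and UInt64 on 64-bit targets as in the standard lib_c bindings:

len = LibC::SizeT.new(1024)

# total : UInt64 = len      # fine on 64-bit (SizeT == UInt64), but a type error on 32-bit
total : UInt64 = len.to_u64 # the explicit cast compiles everywhere, yet is easy to forget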

pkorotkov commented 6 years ago

@asterite I appreciate your concerns about possible performance hits, but real-world apps do indeed require machine-dependent container lengths.

ysbaddaden commented 6 years ago

Having a simple Int type (and UInt) instead of specifying Int32 would be very nice! Same for the Integer alias; it's a very good idea.

Having an arch-dependent integer size would help fix some inconsistencies (e.g. pointer addresses), would probably help with some C bindings, and yes, would allow container types (String, Array, ...) to map more memory.

I'm not sure about making Int arch-dependent. You'd have to keep it in mind, and that could lead to portability issues between 32-bit and 64-bit archs. But Go does that (edit: Swift too), so maybe it's not much of a concern? We can still use explicit Int32 or Int64 when needed.

Defaulting to 64-bit on all architectures would ruin program performance on 32-bit architectures, so I'm uncomfortable with that. It also feels like wasted space to me; I seldom need integers to be in the 64-bit range, but that's my personal impression. Oh well, I can just keep using Int32.

ysbaddaden commented 6 years ago

The proposed RFC is good :+1:

jzakiya commented 6 years ago

+1

RX14 commented 6 years ago

Maybe the standard library can use arch-dependent integer sizes correctly, but I don't trust anyone else to run CI on 32-bit systems, simply because it's very hard to set up. For that reason, we shouldn't encourage or expose arch-specific integer types (outside LibC), because without error-prone automatic casting like C's, we will end up with all libraries de facto only working on 64 bits. And that's not a situation I want to be in. Extremely strong thumbs down on this whole feature.

Pointer itself can have an arch-specific size, but manually extracting an integer type from it with Pointer#address should always be 64 bits. Pointer arithmetic is usually done on pointer instances anyway, so it won't affect performance.

Most containers should either remain 32-bit or be switched to 64-bit. I feel very strongly that Crystal should attempt to unify integer sizes and not expose a confusing array of aliases. Crystal is not a language for building system software; it is a high-level garbage-collected language built for non-embedded computers. Even most phones are aarch64 these days.

Removing support for 32 bits is entirely feasible in the next 5 years (some Linux distros have done it already), and 128-bit architectures will never happen. Let's be practical, and make things easier and less error-prone for the developer, instead of making Crystal become only "portable with effort" like C is. Most portable C programs have insane macros and look ugly. Let's not take Crystal down the same path. The current status quo of number types in Crystal is fine.

RX14 commented 6 years ago

Swift uses arch-dependent default integer sizes because Swift is specifically designed for Objective-C compatibility (see also refcounting etc.). I doubt Swift would be designed as-is if not for those constraints.

RX14 commented 6 years ago

Also note that in Go, int and int32/int64 are separate types (not just an alias, which is what's being proposed here) which always require an explicit cast to go between. This avoids the compilation/portability problem, but doesn't avoid the issue of overflow. We could implement a separate type Int and make it the default, but the diffs would be ginormous and it'd never get merged.
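
A rough Crystal sketch of what such a distinct, non-alias type would imply (MyInt is purely hypothetical; the point is that nothing converts implicitly, so every mix with Int32/Int64 needs an explicit call):

# Hypothetical wrapper type: no implicit conversion from other int types.
struct MyInt
  getter value : Int64 # backing storage chosen arbitrarily for the sketch

  def initialize(@value : Int64)
  end

  def +(other : MyInt) : MyInt
    MyInt.new(value + other.value)
  end
end

a = MyInt.new(1_i64)
b = a + MyInt.new(2_i64) # ok
# c = a + 3              # compile error: MyInt#+ has no overload taking an Int32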

asterite commented 6 years ago

@RX14 that's a good point

straight-shoota commented 6 years ago

I think the only container that effectively needs a 64-bit size is Slice, but there it really is needed. Being able to load and access more than 4 gigabytes of data in memory is not a thing of the future but actually common in some applications.

Crystal should be able to use 64-bit sizes for Slice, one way or the other. It probably doesn't need a platform-dependent integer type. Maybe it's fine to just use Int64 even on 32-bit platforms. Can anyone tell how much of a performance impact that would have?

RX14 commented 6 years ago

I strongly believe that by the time anyone is using Crystal on embedded Linux, they will be using it on aarch64 or a similar 64-bit architecture, simply due to economies of scale in the smartphone SoC market. I just don't think 32-bit is going to exist on any system Crystal is targeting going forward, so I'm really in favour of using 64 bits for pointers and slices. Arrays, strings and other containers can be discussed later, after we've agreed on a strategy for Slice.

straight-shoota commented 6 years ago

For the record: Pointer already uses UInt64 for addresses even on 32 bit platforms.
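
For example:

ptr = Pointer(Int32).malloc(4)
typeof(ptr.address) # => UInt64, on 32-bit and 64-bit targets alike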

ysbaddaden commented 6 years ago

Assuming that 32-bit is dead is a bold assumption. I doubt Raspberry will drop its 32-bit models anytime soon, and they don't seem eager to have an arm64 Raspbian for the Pi 3.

Assuming that the max integer size is always the pointer size is wrong. On Arduino pointers are 8-bit (or maybe it's 16?) but an int is 32-bit. On ARM the FPU has 64-bit registers. Who can say that in a few years there won't be an arch with 64-bit pointers and 128-bit integers? Most languages already provide support for them.

I don't know if an Int type should be arch-dependent. What I know is that lower-level languages such as Rust chose to keep it a 32-bit integer and to introduce an arch-dependent type (isize, and long for C), whereas similar languages such as Go and Swift decided to make it arch-dependent.

Swift didn't have to:

Swift could have fixed Int to be Int64 and kept a Long type around for Obj-C compatibility, yet they didn't, and instead chose for Int to be arch-dependent. That's an interesting choice, and IMHO a realistic one.

RX14 commented 6 years ago

@ysbaddaden even if 128-bit arithmetic becomes common (we already have an Int128 type), I doubt we'd want to make it the default type.

I'm also not thinking about dropping 32-bit support, but I'm being realistic: 99%+ of the usage of Crystal will be on desktops and servers. So taking a performance hit on 32-bit is acceptable, especially if it's only around slices.

However, for the record, I think the status quo on int types is just fine too. The only thing that really needs to change is the types inside certain specific containers, to 64-bit. Any proposal to make architecture-dependent types common or recommended outside of the C interface is a huge no from me, though.

j8r commented 6 years ago

32-bit users, on embedded or old devices, shouldn't be relegated to second-class citizens.

For now, for things limited by the system's register/pointer size, like containers, new system-dependent types like IntT and UIntT (or whatever name) could be used, based on LibC::SizeT. This would change nothing for 32-bit and would allow more space on 64-bit.
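
A possible shape for those aliases, assuming the signed/unsigned size types from the standard lib_c bindings (the IntT/UIntT names are the hypothetical ones from this comment):

alias IntT = LibC::SSizeT # Int32 on 32-bit targets, Int64 on 64-bit targets
alias UIntT = LibC::SizeT # UInt32 on 32-bit targets, UInt64 on 64-bit targets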

Like @ysbaddaden said, it's not all black and white. With RISC-V coming, and also OpenRISC, this opens even more possibilities.

Having the language adapt itself a bit depending on the platform will allow it to take more advantage of the system's characteristics.

straight-shoota commented 6 years ago

@j8r This is not an issue as long as the type is only used internally. But with Slice#size returning Int32 or Int64 depending on the platform, you can end up having different types in your math operations and potentially risk overflows on 32-bit systems.
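
A small sketch of that concern, using LibC::SSizeT as a stand-in for a platform-dependent size (hypothetical; Slice#size actually returns Int32 today):

count = LibC::SSizeT.new(1_000_000) # imagine this is a container size
bytes = count * 8_000               # Int64 on 64-bit targets; on 32-bit targets this is
                                    # Int32 math, and 8_000_000_000 doesn't fit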

j8r commented 6 years ago

@straight-shoota good point. This could return an IntT, and then a cast can be made. Or simply let users cast with to_i themselves if they want, reducing the risk of overflow.

But if we agree to start using them progressively in some places internally at first, that would be a nice start 😀

An example may be #6640

The hypothetical IntT could be a dedicated type, not just an alias, with literals like 9_t and 9_ut and #to_t for casting.

asterite commented 6 years ago

Closing because I think this is a bad proposal. The way to go would be to have a separate int type that's not compatible with/assignable from other int types, like in Go. That would let a program that compiles on one system always compile on another. But I don't think we can, or will, do that.