Closed: athas closed this issue 4 years ago.
Cosmin made a good point that we should also change `iota` to return 64-bit integers, because it is often used for generating indices. I must admit I am a little uneasy about making `iota` return a different type than the default `int` (which will likely remain an alias for `i32`). This has some uncomfortable consequences. This program becomes invalid:
```futhark
fun main(i: int, bs: [n]bool): bool =
  i >= n
```

Because `n` is now of type `i64`, while `i` is of type `i32`.
Also beware of this stuff.
I am not sure you have read that paper.
-- Troels Henriksen
Specifically, Futhark is a "safe" language, in that everything is bounds-checked. The issues raised in that paper are about implicit conversions and similar errors in a low-level language. I don't think it is relevant to us.
Or thought about this issue.
This is becoming increasingly relevant and will have to be solved relatively soon. The compiler engineering part is straightforward enough; the big question is how the source language is affected. We will probably need @melsman's advice on language design here.
In Rust, all sizes are of a supposedly opaque size type. Under the covers, it is almost always a 64-bit integer, however. I think the intent is to make the programmer stop and think when he or she does size computation. Maybe that would be a good way to go.
I talked to @melsman about this and we decided to just make sizes be of type `i64`. There is no reason for a `size` type.
However, there is one more problem that I forgot to bring up. Right now, the arguments to `iota` and `replicate` can be of any integral type. This means that the law `shape (iota x) == [x]` does not hold, because `shape` always returns an array of `i32`s (and `i64`s in the future), while `x` can be some other type. Do we care about this?
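As an illustrative sketch (in Python, not Futhark, with hypothetical `iota`/`shape` helpers standing in for the real functions): in a dynamically typed setting the law trivially holds, which highlights that the problem above is purely one of static types.

```python
def iota(x):
    # Hypothetical stand-in for Futhark's iota: [0, 1, ..., x-1].
    return list(range(x))

def shape(a):
    # Stand-in for Futhark's shape: the (one-dimensional) extent.
    return [len(a)]

# The law shape (iota x) == [x] holds here because Python has a single
# integer type; in Futhark it can fail to type-check when x is, say,
# an i16 while shape returns i32s (or i64s).
print(shape(iota(7)) == [7])  # True
```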
Changing the type of dimension declarations from `i32` to `i64` makes 155 of our 641 test programs fail. Wonderful way to spend a weekend.
Oh, and we'll need to make some extensions to ScalExp handling and @coancea's algebraic simplification to handle 64-bit values. Possibly also propagate range information through type conversions. I begin to remember why I gave up last time I attempted this change.
I have modified enough of the compiler to translate OptionPricing. Unfortunately, we get about a 50% slowdown, likely because 64-bit operations are emulated on current GPUs (and take up more register space, too). The only things that are 64-bit are array shapes and index calculations, and I suspect the latter is what kills us. I'll suspend my efforts for now. It takes less than a day to convert the compiler to 64-bit, but the trick will be coming up with a technique that also makes it generate fast code.
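For intuition about why emulation costs extra instructions and registers, here is a minimal sketch (in Python, with explicit 32-bit masking) of how a single 64-bit add decomposes into two 32-bit adds plus carry handling, which is roughly what a 32-bit ALU has to do:

```python
MASK32 = 0xFFFFFFFF

def add64(a, b):
    # Split each 64-bit operand into 32-bit halves (twice the registers).
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    # Add the low halves first and note whether they carried.
    lo = a_lo + b_lo
    carry = lo >> 32
    # The high halves absorb the carry; everything stays within 32 bits.
    hi = (a_hi + b_hi + carry) & MASK32
    return (hi << 32) | (lo & MASK32)

assert add64(2**33 + 5, 2**31 + 9) == (2**33 + 5 + 2**31 + 9) % 2**64
```

One 64-bit add becomes at least two adds plus carry propagation; multiplication and remainder fare considerably worse, which matters for index calculations.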
One solution would be to put the burden on the programmer to indicate the type of dimension sizes of arrays. This feels very complicated and clunky, however. Another would be to use an opaque `size` type in the source language, which we can then translate as appropriate for the target hardware.
One thing that is certain is that we will definitely have to come up with a fix if we want to scale to large multi-GPU/distributed programs. While we do support arrays taking up more than 4GiB space, we cannot handle arrays with more than 2**31-1 elements. I have a nagging suspicion that such arrays will occur eventually.
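To make the distinction between those two limits concrete, a quick back-of-the-envelope check (plain Python; element sizes chosen for illustration): an array can exceed 4GiB without exceeding 2**31-1 elements, and vice versa.

```python
I32_MAX = 2**31 - 1  # largest element count addressable with i32 indices

# 2**31 - 1 elements of f64: well over 4GiB of storage, yet still
# indexable with i32.
big_bytes = I32_MAX * 8
print(big_bytes / 2**30)  # ~16 GiB

# 3 billion u8 elements: under 3GiB of storage, yet NOT i32-indexable.
n = 3_000_000_000
print(n > I32_MAX)  # True
```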
I'm all in favour of introducing a `size` type. It feels like TheRightThingToDo™.
This has turned up in the way we handle segmented/blocked operations like `scan` and `reduce_by_index`, where we first compute a flat index. This computation may overflow, even though the nested index will not.
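A sketch of that failure mode (Python, with a helper that mimics i32 wraparound; the segment counts are just for illustration): each nested index fits comfortably in i32, but the flattened index overflows.

```python
def to_i32(x):
    # Mimic two's-complement i32 wraparound.
    return ((x + 2**31) % 2**32) - 2**31

segments = 70_000      # fits easily in i32
segment_len = 40_000   # fits easily in i32
seg, i = segments - 1, segment_len - 1

flat = seg * segment_len + i  # 2_799_999_999: exceeds 2**31 - 1
print(to_i32(flat))           # wraps to a negative "index"
print(to_i32(flat) == flat)   # False
```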
I have now encountered real programs that need to handle arrays with more than two billion elements. We need to address this.
Everyone I've talked to seems to think we should just do one massive compatibility break, and turn all size parameters (and functions like `iota`) into 64-bit versions, all at once. This will break pretty much every Futhark program, but is there a sensible alternative? At least the breakage will be pretty easy to fix.
I am certainly for it, as I've also said in person. The only viable alternative I see is to use an opaque size type, like `usize` and `isize` from Rust, but that would probably cause a similar amount of breakage and (if I understood @athas correctly) would be significantly more work.
It's my understanding that 32-bit sizes are something we need to address no matter what. The users we currently have are clearly already happy to be "living on the edge", so I suspect most of them will be alright with changing their code. Migrating to 64-bit sizes shouldn't be too hard, and we're not really forcing anyone to do anything: they can just stay on an old version of Futhark.
If we know of some of the people using Futhark, perhaps we can reach out to them and hear what they think?
Adding a `size` type to the source language is not very difficult, but we'd need a bunch of new functions to convert between it and the various concrete types. It would be difficult to add such a type to the core language, though. Ultimately, the only advantage of such abstraction is when you care about small 32-bit machines, which I don't think is in Futhark's problem domain. We care more about large, fast machines, or at least scaling to them.
I guess then a relevant question might be: will we ever care about machines with address spaces larger than 64 bits?
Yes, but you don't need 128-bit sizes to deal with those. Even a 128-bit machine will not have single arrays whose size cannot fit in a 64-bit integer (in my lifetime).
> Even a 128-bit machine will not have single arrays whose size cannot fit in a 64-bit integer (in my lifetime).
Careful! Hopefully you've got many years left in you.
I can now run a 64-bit version of OptionPricing, and it's about 20% slower on my Vega 64 GPU.
Porting Futhark programs to use 64-bit sizes is not particularly difficult, but it is quite tedious.
Slowdown on LocVolCalib is about the same.
On the RTX 2080 Ti, the difference is smaller, and in many cases negligible.
Impact on CPU performance appears fine. Actually, from what I can see it gets a little faster (5-10%). I hope that will also apply to the multicore backend.
Impact on LUD and SGEMM is negligible-to-zero, especially when using the CUDA backend. It really does seem CUDA behaves much better with 64-bit sizes. I am pleased that these relatively highly tuned benchmarks behave so well. So far, my analysis is that 64-bit sizes are primarily detrimental for map-reduce kernels with very large map kernels (like OptionPricing).
Rust uses the usize/isize type partly because, as a strongly hardware-oriented abstract machine (in this sense "Rust" is "a thin layer on the metal"), it would like to have low-impedance abstractions over various ISAs, especially when handling raw pointers, and there are actually already 128-bit integer ISAs! They use 64-bit hardware in practice, of course; they just allow pointers to be as wide as 128 bits. So scaling from 16 bits to 64 or even 128 bits matters there.
I think that Futhark is fine to select 64-bit "pointer" type, however, if it thinks it will only see usage in that area.
Rust also needs to actually deal directly with object addresses ("pointers"). In Futhark, that is not exposed to programmers - only the sizes of objects, and offsets within them. Even with our current 32-bit indexes, a Futhark program can still allocate far more than 4GiB of memory, since the pointers that operate as array offsets are always whatever the target machine uses.
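The arithmetic behind that last claim, as a quick check (Python; the element size is just for illustration): even an in-range i32 element index overflows once scaled to a byte offset, so the generated code must already compute offsets at pointer (64-bit) width, which is why allocations beyond 4GiB work today.

```python
def to_i32(x):
    # Mimic two's-complement i32 wraparound.
    return ((x + 2**31) % 2**32) - 2**31

idx = 2**31 - 1  # largest valid i32 element index
elem_size = 8    # e.g. f64, for illustration

# Scaling the index to a byte offset in 32-bit arithmetic overflows...
print(to_i32(idx * elem_size) == idx * elem_size)  # False

# ...but the same offset is perfectly fine at 64-bit pointer width.
print(idx * elem_size < 2**63)  # True
```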
I have measured the impact of 64-bit sizes on our microbenchmarks, and as expected, there is very little difference. The one weird exception is regular segmented scans, which shows a 30% slowdown. I have not looked at the code yet, but I suspect it's something else that breaks, or an optimisation that no longer applies.
The segmented scans are expensive because of the frequent check for whether we are crossing a segment, which involves computing a 64-bit remainder. I think we can improve on that, but I will leave it for later.
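A sketch of the kind of improvement hinted at (Python; the counter-based variant is my assumption of one possible strength reduction, shown sequentially, not what the compiler actually emits): replace the per-element 64-bit remainder with a countdown that resets at segment boundaries.

```python
n, m = 20, 6  # total elements and segment length (illustrative)

# Naive check: one (64-bit) remainder per element.
crossings_mod = [i % m == 0 for i in range(n)]

# Strength-reduced: carry a countdown instead of dividing.
crossings_ctr = []
left = 0
for i in range(n):
    crossings_ctr.append(left == 0)
    left = m - 1 if left == 0 else left - 1

print(crossings_ctr == crossings_mod)  # True
```

In a parallel GPU kernel a sequential counter doesn't carry over directly, but the general idea of trading division for cheaper bookkeeping still applies.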
Yeah. With Rust, you're doing a lot of byte-bashing, so offsets come up very often, but I imagine Futhark code is really not doing as much raw byte-bashing.
In C++, it has even come to be considered a mistake to use unsigned integers in general: the authors of the STL (including Bjarne Stroustrup) have expressed that they would have preferred signed integers even for indexing, because they are strongly of the opinion that unsigned integers should only be used for expressing raw bitfields, and an offset is not a bitfield. Which makes sense for their case, because you often want to do pointer offset arithmetic.
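The classic pitfall behind that preference, sketched in Python with a helper that mimics unsigned 64-bit wraparound: subtracting past zero doesn't go negative, it wraps to a huge value, which silently breaks bounds logic and backwards loops.

```python
def u64(x):
    # Mimic unsigned 64-bit wraparound.
    return x % 2**64

# With signed arithmetic, 0 - 1 is -1 and a guard like `i >= 0` works.
# With unsigned arithmetic it wraps to the largest value instead:
print(u64(0 - 1))  # 18446744073709551615

# So a backwards-loop condition `i >= 0` on an unsigned index is
# always true, and `i - 1` can sail straight past a bounds check.
print(u64(0 - 1) >= 0)  # True, always
```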
The Futhark compiler is presently tied heavily to the 32-bit world. Not only are all dimension sizes and loop indices 32-bit (signed!) integers, but the generated code also has these assumptions. Clearly this is not going to fly in the long run. I don't want to mix 32-bit and 64-bit sizes, so we should just move everything to 64-bit, always. Should the `int` type also default to 64-bit? I think that might be a little confusing.

I did a few experiments, and 64-bit integer arithmetic does not seem to be noticeably slower on the GPU, and we rarely have arrays of sizes, so programs should not end up using more storage.