Closed: athas closed this issue 4 years ago.
Cosmin made a good point that we should also change `iota` to return 64-bit integers, because it is often used for generating indices. I must admit I am a little uneasy about making `iota` return a different type than the default `int` (which will likely remain an alias for `i32`). This has some uncomfortable consequences. This program becomes invalid:
```futhark
fun main(i: int, bs: [n]bool): bool =
  i >= n
```

Because `n` is now of type `i64`, while `i` is of type `i32`.
Also beware of this stuff.
I am not sure you have read that paper.
-- Troels Henriksen
Specifically, Futhark is a "safe" language, in that everything is bounds-checked. The issues raised in that paper are about implicit conversions and similar errors in a low-level language. I don't think it is relevant to us.
Or thought about this issue.
This is becoming increasingly relevant and will have to be solved relatively soon. The compiler engineering part is straightforward enough; the big question is how the source language is affected. We will probably need @melsman's advice on language design here.
In Rust, all sizes are of a supposedly opaque size type. Under the covers, it is almost always a 64-bit integer, however. I think the intent is to make the programmer stop and think when he or she does size computation. Maybe that would be a good way to go.
I talked to @melsman about this and we decided to just make sizes be of type `i64`. There is no reason for a `size` type.
However, there is one more problem that I forgot to bring up. Right now, the arguments to `iota` and `replicate` can be of any integral type. This means that the law `shape (iota x) == [x]` does not hold, because `shape` always returns an array of `i32`s (and `i64`s in the future), while `x` can be some other type. Do we care about this?
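As an illustrative sketch (in Python, not Futhark, with hypothetical `iota`/`shape` helpers standing in for the real functions): in a dynamically typed setting the law trivially holds, which highlights that the problem above is purely one of static types.

```python
def iota(x):
    # Hypothetical stand-in for Futhark's iota: [0, 1, ..., x-1].
    return list(range(x))

def shape(a):
    # Stand-in for Futhark's shape: the (one-dimensional) extent.
    return [len(a)]

# The law shape (iota x) == [x] holds here because Python has a single
# integer type; in Futhark it can fail to type-check when x is, say,
# an i16 while shape returns i32s (or i64s).
print(shape(iota(7)) == [7])  # True
```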
Changing the type of dimension declarations from `i32` to `i64` makes 155 of our 641 test programs fail. Wonderful way to spend a weekend.
Oh, and we'll need to make some extensions to ScalExp handling and @coancea's algebraic simplification to handle 64-bit values. Possibly also propagate range information through type conversions. I begin to remember why I gave up last time I attempted this change.
I have modified enough of the compiler to translate OptionPricing. Unfortunately, we get about a 50% slowdown, likely because 64-bit operations are emulated on current GPUs (and take up more register space, too). The only things that are 64-bit are array shapes and index calculations, and I suspect the latter is what kills us. I'll suspend my efforts for now. It takes less than a day to convert the compiler to 64-bit, but the trick will be coming up with a technique that also makes it generate fast code.
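For intuition about why emulation costs extra instructions and registers, here is a minimal sketch (in Python, with explicit 32-bit masking) of how a single 64-bit add decomposes into two 32-bit adds plus carry handling, which is roughly what a 32-bit ALU has to do:

```python
MASK32 = 0xFFFFFFFF

def add64(a, b):
    # Split each 64-bit operand into 32-bit halves (twice the registers).
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    # Add the low halves first and note whether they carried.
    lo = a_lo + b_lo
    carry = lo >> 32
    # The high halves absorb the carry; everything stays within 32 bits.
    hi = (a_hi + b_hi + carry) & MASK32
    return (hi << 32) | (lo & MASK32)

assert add64(2**33 + 5, 2**31 + 9) == (2**33 + 5 + 2**31 + 9) % 2**64
```

One 64-bit add becomes at least two adds plus carry propagation; multiplication and remainder fare considerably worse, which matters for index calculations.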
One solution would be to put the burden on the programmer to indicate the type of dimension sizes of arrays. This feels very complicated and clunky, however. Another would be to use an opaque `size` type in the source language, which we can then translate as appropriate for the target hardware.
One thing that is certain is that we will definitely have to come up with a fix if we want to scale to large multi-GPU/distributed programs. While we do support arrays taking up more than 4GiB space, we cannot handle arrays with more than 2**31-1 elements. I have a nagging suspicion that such arrays will occur eventually.
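To make the distinction between those two limits concrete, a quick back-of-the-envelope check (plain Python; element sizes chosen for illustration): an array can exceed 4GiB without exceeding 2**31-1 elements, and vice versa.

```python
I32_MAX = 2**31 - 1  # largest element count addressable with i32 indices

# 2**31 - 1 elements of f64: well over 4GiB of storage, yet still
# indexable with i32.
big_bytes = I32_MAX * 8
print(big_bytes / 2**30)  # ~16 GiB

# 3 billion u8 elements: under 3GiB of storage, yet NOT i32-indexable.
n = 3_000_000_000
print(n > I32_MAX)  # True
```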
I'm all in favour of introducing a `size` type. It feels like TheRightThingToDo™.
This has turned up in the way we handle segmented/blocked operations like `scan` and `reduce_by_index`, where we first compute a flat index. This computation may overflow, even though the nested index will not.
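A sketch of that failure mode (Python, with a helper that mimics i32 wraparound; the segment counts are just for illustration): each nested index fits comfortably in i32, but the flattened index overflows.

```python
def to_i32(x):
    # Mimic two's-complement i32 wraparound.
    return ((x + 2**31) % 2**32) - 2**31

segments = 70_000      # fits easily in i32
segment_len = 40_000   # fits easily in i32
seg, i = segments - 1, segment_len - 1

flat = seg * segment_len + i  # 2_799_999_999: exceeds 2**31 - 1
print(to_i32(flat))           # wraps to a negative "index"
print(to_i32(flat) == flat)   # False
```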
I have now encountered real programs that need to handle arrays with more than two billion elements. We need to address this.
Everyone I've talked to seems to think we should just do one massive compatibility break, and turn all size parameters (and functions like `iota`) into 64-bit versions, all at once. This will break pretty much every Futhark program, but is there a sensible alternative? At least the breakage will be pretty easy to fix.
I am certainly for it, as I've also said in person. The only viable alternative I see is to use an opaque size type, like `usize` and `isize` from Rust, but that would probably cause a similar amount of breakage and (if I understood @athas correctly) would be significantly more work.
It's my understanding that 32-bit sizes are something we need to address no matter what. The users we currently have are clearly already happy to be "living on the edge", so I suspect most of them will be alright with changing their code. Migrating to 64-bit sizes shouldn't be too hard, and we're not really forcing anyone to do anything: they can just stay on an old version of Futhark.
If we know of some of the people using Futhark, perhaps we can reach out to them and hear what they think?
Adding a `size` type to the source language is not very difficult, but we'd need a bunch of new functions to convert between it and the various concrete types. It would be difficult to add such a type to the core language, though. Ultimately, the only advantage of such abstraction is when you care about small 32-bit machines, which I don't think is in Futhark's problem domain. We care more about large, fast machines, or at least scaling to them.
I guess then a relevant question might be: will we ever care about machines with address spaces larger than 64 bits?
Yes, but you don't need 128-bit sizes to deal with those. Even a 128-bit machine will not have single arrays whose size cannot fit in a 64-bit integer (in my lifetime).
> Even a 128-bit machine will not have single arrays whose size cannot fit in a 64-bit integer (in my lifetime).
Careful! Hopefully you've got many years left in you.
I can now run a 64-bit version of OptionPricing, and it's about 20% slower on my Vega 64 GPU.
Porting Futhark programs to use 64-bit sizes is not particularly difficult, but it is quite tedious.
Slowdown on LocVolCalib is about the same.
On the RTX 2080 Ti, the difference is smaller, and in many cases negligible.
Impact on CPU performance appears fine. Actually, from what I can see it gets a little faster (5-10%). I hope that will also apply to the multicore backend.
Impact on LUD and SGEMM is negligible-to-zero, especially when using the CUDA backend. It really does seem CUDA behaves much better with 64-bit sizes. I am pleased that these relatively highly tuned benchmarks behave so well. So far, my analysis is that 64-bit sizes are primarily detrimental for map-reduce kernels with very large map kernels (like OptionPricing).
Rust uses the usize/isize type partly because, as a strongly hardware-oriented abstract machine (in this sense "Rust" is "a thin layer on the metal"), it would like to have low-impedance abstractions over various ISAs, especially when handling raw pointers, and there are actually already 128-bit integer ISAs! They use 64-bit hardware in practice, of course; they just allow pointers to be as wide as 128 bits. So scaling from 16 bits to 64 or even 128 bits matters there.
I think that Futhark is fine to select 64-bit "pointer" type, however, if it thinks it will only see usage in that area.
Rust also needs to actually deal directly with object addresses ("pointers"). In Futhark, that is not exposed to programmers - only the sizes of objects, and offsets within them. Even with our current 32-bit indexes, a Futhark program can still allocate far more than 4GiB of memory, since the pointers that operate as array offsets are always whatever the target machine uses.
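The arithmetic behind that last claim, as a quick check (Python; the element size is just for illustration): even an in-range i32 element index overflows once scaled to a byte offset, so the generated code must already compute offsets at pointer (64-bit) width, which is why allocations beyond 4GiB work today.

```python
def to_i32(x):
    # Mimic two's-complement i32 wraparound.
    return ((x + 2**31) % 2**32) - 2**31

idx = 2**31 - 1  # largest valid i32 element index
elem_size = 8    # e.g. f64, for illustration

# Scaling the index to a byte offset in 32-bit arithmetic overflows...
print(to_i32(idx * elem_size) == idx * elem_size)  # False

# ...but the same offset is perfectly fine at 64-bit pointer width.
print(idx * elem_size < 2**63)  # True
```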
I have measured the impact of 64-bit sizes on our microbenchmarks, and as expected, there is very little difference. The one weird exception is regular segmented scans, which shows a 30% slowdown. I have not looked at the code yet, but I suspect it's something else that breaks, or an optimisation that no longer applies.
The segmented scans are expensive because of the frequent check for whether we are crossing a segment, which involves computing a 64-bit remainder. I think we can improve on that, but I will leave it for later.
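A sketch of the kind of improvement hinted at (Python; the counter-based variant is my assumption of one possible strength reduction, shown sequentially, not what the compiler actually emits): replace the per-element 64-bit remainder with a countdown that resets at segment boundaries.

```python
n, m = 20, 6  # total elements and segment length (illustrative)

# Naive check: one (64-bit) remainder per element.
crossings_mod = [i % m == 0 for i in range(n)]

# Strength-reduced: carry a countdown instead of dividing.
crossings_ctr = []
left = 0
for i in range(n):
    crossings_ctr.append(left == 0)
    left = m - 1 if left == 0 else left - 1

print(crossings_ctr == crossings_mod)  # True
```

In a parallel GPU kernel a sequential counter doesn't carry over directly, but the general idea of trading division for cheaper bookkeeping still applies.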
Yeah. With Rust, you're doing a lot of byte-bashing, so offsets come up very often, but I imagine Futhark code is really not doing as much raw byte-bashing.
In C++, it has even come to be considered a mistake to use unsigned integers in general: the authors of the STL (including Bjarne Stroustrup) have expressed that they would have preferred signed integers even for indexing, because they are strongly of the opinion that unsigned integers should only be used for expressing raw bitfields, and an offset is not a bitfield. Which makes sense for their case, because you often want to do pointer offset arithmetic.
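The classic pitfall behind that preference, sketched in Python with a helper that mimics unsigned 64-bit wraparound: subtracting past zero doesn't go negative, it wraps to a huge value, which silently breaks bounds logic and backwards loops.

```python
def u64(x):
    # Mimic unsigned 64-bit wraparound.
    return x % 2**64

# With signed arithmetic, 0 - 1 is -1 and a guard like `i >= 0` works.
# With unsigned arithmetic it wraps to the largest value instead:
print(u64(0 - 1))  # 18446744073709551615

# So a backwards-loop condition `i >= 0` on an unsigned index is
# always true, and `i - 1` can sail straight past a bounds check.
print(u64(0 - 1) >= 0)  # True, always
```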
The Futhark compiler is presently tied heavily to the 32-bit world. Not only are all dimension sizes and loop indices 32-bit (signed!) integers, but the generated code also has these assumptions. Clearly this is not going to fly in the long run. I don't want to mix 32-bit and 64-bit sizes, so we should just move everything to 64-bit, always. Should the `int` type also default to 64-bit? I think that might be a little confusing.

I did a few experiments, and 64-bit integer arithmetic does not seem to be noticeably slower on the GPU, and we rarely have arrays of sizes, so programs should not end up using more storage.