Closed: stuebinm closed this issue 3 years ago
This is (yet another) bug in one of the most tricky optimisation passes. Minimised example:
let DistMatrix [n] [m] (a : [n]f32) (b : [m]f32) : [n][m]f32 =
  let initial = replicate n (replicate m 0)
  let outside = (replicate (m+1) f32.inf) with [0] = 0
  in (loop (D, column) = (initial, outside) for i < n do
        let next_row = loop cs = replicate (m+1) f32.inf for j in 1...m do
          cs with [j] = f32.minimum [cs[j-1], column[j]]
        in (D with [i] = (next_row[1:] :> [m]f32), next_row))
     |> \(x,_) -> x

let main [d] [n] (s : [d][n]f32) tri =
  map (\(a,b) -> DistMatrix s[b] s[a]) tri
Probably related to an attempt to tile inside doubly-nested loops. No idea whether it makes sense to even try.
Although note that calling f32.minimum on a two-element array is very wasteful. Just use f32.min cs[j-1] column[j].
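For reference, here is a sketch of the reproducer's DistMatrix with only that one line changed, so the two-element minimum goes through the scalar f32.min. Everything else is copied from the example above. The thread does not confirm whether this variant also avoids the internal compiler error, so treat it as a tidier way to write the recurrence, not as a verified workaround.

```futhark
-- Same recurrence as the minimised example; only the inner-loop body differs.
let DistMatrix [n] [m] (a : [n]f32) (b : [m]f32) : [n][m]f32 =
  let initial = replicate n (replicate m 0)
  let outside = (replicate (m+1) f32.inf) with [0] = 0
  in (loop (D, column) = (initial, outside) for i < n do
        let next_row = loop cs = replicate (m+1) f32.inf for j in 1...m do
          -- Scalar minimum of two values: no temporary array, no reduction.
          cs with [j] = f32.min cs[j-1] column[j]
        in (D with [i] = (next_row[1:] :> [m]f32), next_row))
     |> \(x,_) -> x
```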
Ah, thanks for the tip, I hadn't noticed that function yet (though in my original code it was a slightly longer array; I just shortened it for the issue report here).
Generally, the Futhark compiler will assume that most such functions are given "large" arrays and move heaven and earth to parallelise them. This can result in (far) more overhead than is really justified, like here (although it still shouldn't crash). It doesn't understand that some arrays are small (even if the size is a constant). That's certainly a compiler weakness, but as a consequence, good practice is to use functions such as foldl or manual loops when you know the arrays are going to be small (or just unroll manually for particularly small arrays, like this one).
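A minimal sketch of what that advice could look like for a minimum over a small array. The helper names min_foldl and min_loop are made up for illustration and are not part of the Futhark prelude; both are written sequentially, starting from f32.inf, rather than as a parallel reduction.

```futhark
-- Hypothetical helper: sequential minimum via a left fold,
-- with f32.inf as the starting value.
let min_foldl [k] (xs : [k]f32) : f32 =
  foldl f32.min f32.inf xs

-- Hypothetical helper: the same minimum as an explicit sequential loop.
let min_loop [k] (xs : [k]f32) : f32 =
  loop acc = f32.inf for x in xs do
    f32.min acc x
```

In the example from this issue, f32.minimum [cs[j-1], column[j]] would then become min_foldl [cs[j-1], column[j]], which keeps the array form for cases where it is slightly longer than two elements.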
I'm new to futhark, and stumbled across the following issue:
(I've minimised the program as much as I could; whatever the error is, it appears to be fickle, and is prone to disappearing when I replace e.g. let-bindings with default values).
The code yields an internal compiler error when trying to compile with cuda or opencl (or pyopencl), but not when compiling to c (using the latest version from git, compiled with nix via the provided default.nix on nixos-stable):
Interestingly, the entire program works just fine if I replace the f32.minimum in DistMatrix with an equivalent expression using foldr (but not reduce).