Closed evilmav closed 4 years ago
Which version of Futhark are you using? With the version in Git, the compiler itself will crash with a compiler limitation. I think that's because I actually made a fix recently that made the compiler more aware of its own limitations, while before it would sometimes just generate invalid code. I think that is what is happening here.
What is scan_relaxed supposed to do? How is it semantically different from a normal scan? Anyway, here is a nicer way to write it that does not run into limitations related to memory expansion, because all the sizes are known in advance:
let scan_relaxed [n] 'a 'b (f: a -> b -> a) (init: a) (items: [n]b): [n]a =
  let states =
    loop s = replicate n init for i in 1...n do
      s with [i] = f (copy s[i-1]) items[i-1]
  in states
It's still completely sequential, though.
Thank you for the quick response! I've used v0.18.1. I think the loop range should be 1..<n, since 1...n runs past the last index. You've just saved my weekend! =)
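For reference, a minimal sketch of the loop with the 1..<n range suggested above, so that every index stays within the n-element array. The entry point prefix_sums and its addition operator are illustrative assumptions, not part of the original program, and the sketch assumes a non-empty input. Top-level let is used to match the snippet in this thread.

```futhark
-- Sequential scan whose operator need not be associative.
-- states[0] = init and states[i] = f states[i-1] items[i-1] for 0 < i < n,
-- so the last element of items is never consumed.
let scan_relaxed [n] 'a 'b (f: a -> b -> a) (init: a) (items: [n]b): [n]a =
  loop s = replicate n init for i in 1..<n do
    s with [i] = f (copy s[i-1]) items[i-1]

-- Hypothetical usage: exclusive prefix sums.
-- prefix_sums [1, 2, 3, 4] evaluates to [0, 1, 3, 6].
entry prefix_sums [n] (xs: [n]i32): [n]i32 =
  scan_relaxed (\acc x -> acc + x) 0 xs
```

Because the result array is allocated once with replicate n init, its size is known before the loop starts, which is the property the suggestion above relies on.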
In short, what I'm trying to do is solve a Fokker-Planck equation over time, where a would be a tuple of (Fourier expansion of the probability density, last input at a time) and b are the actual inputs of the system. The last map in thread simply extracts the interesting part from the state. Each step f involves solving a problem using the Thomas algorithm etc. and, as far as I understand, does not fit the restrictions of the built-in parallelizable scan (or any way of parallelization I can think of).
But because I will commonly run this over a set of thousands of bias points (ibias in fail()) at a time, this map over thread should allow for reasonable parallelization, shouldn't it?
Sure, if the top level parallelism is sufficient, then it's not a problem that an inner loop is sequential. Futhark generates fairly tight code for sequential loops.
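The structure described here, an outer parallel map whose body is a sequential loop, can be sketched as follows. This is only an illustration under stated assumptions: step is a hypothetical stand-in for the real Fokker-Planck update, the state is reduced to a single f64, and run_all is an invented name.

```futhark
-- Sequential scan with a non-associative operator (range 1..<n).
let scan_relaxed [n] 'a 'b (f: a -> b -> a) (init: a) (items: [n]b): [n]a =
  loop s = replicate n init for i in 1..<n do
    s with [i] = f (copy s[i-1]) items[i-1]

-- Hypothetical per-step update; the real one would solve a linear system.
let step (bias: f64) (state: f64) (input: f64): f64 =
  state + bias * input

-- One sequential time loop per bias point; the outer map supplies the
-- parallelism, one independent loop per element of biases.
entry run_all [k] [n] (biases: [k]f64) (inputs: [n]f64): [k][n]f64 =
  map (\bias -> scan_relaxed (step bias) 0.0 inputs) biases
```

With thousands of bias points, the outer map alone would provide thousands of independent work items, each running its own time loop.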
I can confirm that the current Git version does detect the compiler limitation, though the message does not mention "states", so it will be rather hard to trace back to the source. ("Cannot handle un-sliceable allocation size: (_group (#groups=k_5079; groupsize=m_5078), bytes_5674, @local)"). Can be closed...
If I may ask a stupid question: in the example above, I save states in the scan, but use only a fraction of the state in the lambda expression of the following map. Will this generally result in an actual temporary buffer of the states, or is the compiler miraculously smart enough to combine the following map back into it and only keep the output of the lambda in buffer?
It will compute the full temporary buffer. Futhark generally doesn't have any optimisations that change the asymptotics of your program (they tend to be brittle in practice and result in unpredictable performance cliffs).
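If the full state buffer is too large, one option is to fuse the extraction into the loop by hand. The following is a minimal sketch, assuming a hypothetical extraction function g and an invented helper name scan_relaxed_map, neither of which appears in the original program. The loop carries a single current state and stores only g applied to it.

```futhark
-- Like the sequential scan above, but keeps only (g state) per step
-- instead of the full state, so the buffer holds n values of type c.
let scan_relaxed_map [n] 'a 'b 'c
    (f: a -> b -> a) (g: a -> c) (init: a) (items: [n]b): [n]c =
  let (_, out) =
    loop (state, out) = (init, replicate n (g init)) for i in 1..<n do
      let state' = f state items[i-1]
      let out' = out with [i] = g state'
      in (state', out')
  in out

-- Hypothetical usage: running sums, with each stored value doubled.
-- doubled_prefix_sums [1, 2, 3, 4] evaluates to [0, 2, 6, 12].
entry doubled_prefix_sums [n] (xs: [n]i32): [n]i32 =
  scan_relaxed_map (\acc x -> acc + x) (\acc -> 2 * acc) 0 xs
```

The trade-off is that intermediate states are no longer available afterwards, so this only fits when the extracted part is all that is needed.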
Those compiler limitation errors really suck, and many of them are not very helpful. Fortunately, the kind of program you wrote here is about the only case where they show up. If you really need irregular allocations (you don't here), you can always use the multicore backend, which does not have the same limitations as the GPU backends, and can handle anything.
When running fail in the following code from Python, I get "NameError: name 'bytes_6155' is not defined" in the Python module, which corresponds to the LocalMemory size in set_args for the kernel. I've tried to boil down a smaller example from a larger code as far as I could, so the following does not have to make sense:

Interestingly, if I replace let blasums with the commented version, the compiler stops with a known compiler limitation.

PS I have to try to make it work soon, so if you could suggest a workaround I can try in the pattern, it would be much appreciated.

PPS There has to be a less messy way to implement scan_relaxed...