Closed ehsantn closed 8 years ago
@ninegua pointed out that if the indmin line had arr[i,:] instead of arr[:,i] then you couldn't fuse even though the iterations spaces are the same. Of course, if x and y are different then arr[i,:] would be an incorrect program. Even if they were the same (as below) then you still have this issue that presumably you'd need to detect and not fuse in this case. @ninegua thinks the analysis is going to get complicated fast (and I agree) and on top of that the complication of splitting the 2D parfor in order to fuse it and the figuring out when the nests of the second parfor is identical to some subset of the nests of the first parfor makes this complicated. Still, this sounds like something we could do long term but I doubt by 4/10.
@acc function fuseme(N) arr = Float64[ i+j+0.3 for j in 1:N, i in 1:N] res = Int[ indmin(arr[i,:]) for i in 1:N] end
I talked to @ehsantn just now, there does seem to be a solution to this. If we translate indmin
into a parfor
loop, then the second comprehension becomes a two-level nested parfor
, and the first comprehension is a single parfor
with two loopNest
. If we attempt to break the first parfor
into a nested one, it could in theory fuse with the second, and then we have an outer parfor
with a body consisting of two adjacent parfor
s that potentially could fuse.
Of course there is a lot of details involved, like how the intermediate array (arr
in the above program) should be handled, and how we determine if the access pattern of the second loop allows the two outer loops to fuse, and what happens when the inner loops don't fuse (in which case I suspect we cannot get rid of arr
and hence make the entire thing less profitable).
Still lots of unknowns, but at least fusing parfor
with loopNests
with a nested parfor
sounds like an interesting topic to explore, if it helps to get rid of intermediate allocation.
Another thought just came across my mind, maybe it is even possible to turn nested parfor
into a single parfor
with two loopNest
? We may have to extend the reduction variable from scalar into vector, but doesn't look so far fetched.
@ninegua I was thinking about handling this in mmapInline()
phase since arr
is not used anywhere else. Is it possible?
As a short-term solution, I tried the code below. However, ParallelAccelerator chokes on some :boundscheck
nodes ($(Expr(:boundscheck, false))
and $(Expr(:boundscheck, :(Main.pop)))
). I tried @inbounds
but it didn't help.
using ParallelAccelerator
@acc function fuseme(x,y)
arr = [ Float64[i+j+0.3 for j in 1:x] for i in 1:y]
res = Int[ indmin(arr[i]) for i in 1:y]
end
fuseme(3,4)
I hacked domain-ir to remove those nodes to see what happens. It still doesn't fuse :(
Fixed in 0ce4ad6.
Fusion does three kinds of substitutions in the second parfor's body in the process of creating the fused parfor. 1) Replace the second parfor's indices with the firsts'. 2) If array X is created in the first parfor, replace arrayref(X,...) with an element instance variable corresponding to the current iteration's value of that element of X. 3) Replace references to array names as a whole in certain other circumstances.
The problem here was with the 2nd kind of substitution. It was only looking for arrayref(X,...) and unsafe_arrayref(X,...). In this particular case, it was ParallelAccelerator.API.getindex instance instead. I've added that as another case to look for and this example now seems to work.
Close if you are satisfied.
It works. Thank you, Todd.
The outer loops in the following code can be fused but ParallelAccelerator doesn't fuse them. Also,
arr[:,i]
is being copied to an intermediate array which is unnecessary.