Open bdeonovic opened 9 years ago
This would certainly be nice to have, and shouldn't be too difficult. We could start by just parallelizing the outermost loop, but in cases where there isn't enough parallelism there, we should also be able to parallelize across loop-nest levels.
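For concreteness, a minimal sketch of the outer-loop-only approach. This uses `@distributed` from the Distributed stdlib (the successor of `@parallel`); the loop body is placeholder work, and the worker count is illustrative:

```julia
using Distributed
# addprocs(4)  # in real use, start worker processes first

# Parallelize only the outermost loop; the two inner loops
# run serially inside each distributed iteration.
total = @distributed (+) for i in 1:10
    s = 0
    for j in 1:10, k in 1:10
        s += i  # placeholder work: 100 inner iterations per i
    end
    s
end
total  # == 100 * sum(1:10) == 5500
```

Without added workers, `@distributed` simply runs on the current process, so the sketch also works serially.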
Cc: @amitmurthy
See the recent work in JuliaLang/julia#9871 for something that's pretty close to what you need.
Relevant (though I'm not sure how useful): OpenMP has a collapse clause for, well, collapsing nested parallel loops.
@timholy I missed JuliaLang/julia#9871. Thanks!
I am a bit confused about what @simd actually does. Can it be used to do what I want in my original post?
@simd is for exploiting vector arithmetic units, not multiple threads. It's currently limited to a single loop, though with PR JuliaLang/julia#9876 you can use it on Cartesian ranges, although it just vectorizes the fastest index instead of collapsing/flattening the range.
On modern processors, best results are often obtained by multi-threading the outer loop(s) and vectorizing the innermost loop.
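A hedged sketch of that combination in current Julia (`Threads.@threads` did not yet exist when this comment was written; the function name and example array are illustrative):

```julia
using Base.Threads

# Multi-thread the outer (column) loop; vectorize the inner (row) loop.
function colsums!(out, A)
    @threads for j in axes(A, 2)        # outer loop split across threads
        acc = zero(eltype(A))
        @simd for i in axes(A, 1)       # innermost loop uses the vector units
            acc += A[i, j]
        end
        out[j] = acc
    end
    return out
end

A = reshape(1.0:6.0, 3, 2)
colsums!(zeros(2), A)    # -> [6.0, 15.0]
```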
Any news on this? It appears that @parallel still doesn't accept one-line nested loops; or is there a way to achieve this by now?
You can do this manually today: either parallelize over the outer loop only, or use ind2sub to convert an overall linear index into the multidimensional/Cartesian index you want.
Sorry, but I'm not sure I understand your suggestion. Would your first suggestion be to convert
@parallel for i=1:10, j=1:10, k=1:10
...
end
into
@parallel for i=1:10
for j=1:10, k=1:10
...
end
end
And I'm afraid that after reading the relevant entry in the docs I have no idea what ind2sub
is doing exactly.
@timholy Would eachindex also be a reasonable alternative?
@nilshg The ind2sub approach:
@parallel for i in 1:length(A)
indices = ind2sub(size(A), i)
@show A[indices...]
end
(Since I'm using @show this will be a mess, but I hope it gets the point across.)
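A side note for anyone reading this on Julia 0.7+: ind2sub has since been deprecated in favor of CartesianIndices, so the equivalent linear-to-multidimensional conversion looks roughly like this (the 3×4 array is just for illustration):

```julia
A = reshape(1:12, 3, 4)

# Indexing CartesianIndices with a linear index replaces ind2sub:
# it yields the corresponding multidimensional index.
idx = CartesianIndices(A)[5]   # CartesianIndex(2, 2) for a 3x4 array
A[idx] == A[5]                 # true
Tuple(idx)                     # (2, 2)
```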
Any movement on this? I just ran into this today and it would be very nice to have this feature.
I am looking into the same problem. I have two nested for loops, and I want to parallelize the first loop over workers and the second over threads. Is this possible?
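For what it's worth, the two levels can be combined: distribute the outer loop across workers (here via pmap, as one option), and use Threads.@threads inside the function each worker runs. A sketch; the addprocs line and thread counts are illustrative, and with no workers it degrades gracefully to running on the current process:

```julia
using Distributed
# addprocs(2; exeflags="-t 4")   # illustrative: 2 workers, 4 threads each

@everywhere using Base.Threads

@everywhere function inner(i, n)
    # Second level: threads within one worker process.
    partial = Atomic{Int}(0)
    @threads for j in 1:n
        atomic_add!(partial, i * j)   # placeholder work, race-free
    end
    return partial[]
end

# First level: the outer loop distributed across workers.
results = pmap(i -> inner(i, 20), 1:10)
sum(results)    # == sum(i*j for i in 1:10, j in 1:20) == 11550
```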
A possible workaround:
@parallel for (i,j) in collect(Iterators.product(1:10, 1:20))
....
end
I think the example in the docs on using shared arrays in parallel computing gives some demonstration of how to go about parallelizing multiple iterators.
@TAJD where exactly?
I think I misunderstood the problem. But I found the notes on using SharedArrays helpful for parallelizing my own simulations.
Maybe the workaround I suggested earlier:
@distributed for (i,j) in collect(Iterators.product(1:10, 1:20))
....
end
could be the default behavior of @distributed when it encounters a multi-dimensional loop? Unfortunately I don't know enough about macros to prepare a PR with this approach myself.
Also it is not clear what the reducer would do in this case.
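For what it's worth, a reducer does seem to compose with the product workaround: it folds over the flattened (i, j) iteration space. A sketch (runs serially when no workers have been added; the body is placeholder work):

```julia
using Distributed

# The (+) reducer folds over the flattened (i, j) pairs.
s = @distributed (+) for (i, j) in collect(Iterators.product(1:10, 1:20))
    i + j
end
s == sum(i + j for i in 1:10, j in 1:20)   # true; both are 3200
```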
I wanted to run a parallel nested for loop. The docs show how to collapse a nested for loop into a single outer loop:
and an example for parallel for loop:
How can I combine these concepts? I tried something along the lines of:
but I get:
I also tried with the Iterators package:
I am on: