Was reading http://hpc.pnl.gov/globalarrays/papers/GA-UserManual-Main.pdf recently. They suggest (on page 9) that stencils would be a case where DArrays are not a good fit.
Perhaps because there was less flexibility in that implementation? A global array is just a data structure on the Connection Machine; stencils were super fast on global arrays. Lots more can be said.
Yes, I'm surprised by that statement. Sundials' parallel N_Vector seems very similar to DArrays. At least for making DArrays work well with a finite difference PDE solver, we just need stencil calculations and in-place broadcast.
I'll wait on broadcast since I know that's changing in v1.0, but I was thinking that stencil computations would be a great GSoC project. The library DiffEqOperators.jl automatically builds lazy, arbitrary-order stencils for PDE operators, so what's left is parallelizing that well. I thought it would be as simple as having the DArray send the data at the ends of the vector asynchronously while computing the stencil on the local array; a sketch of that idea follows below. @andreasnoack, if you can clarify why that wouldn't work out well, that could help. If we need to, we can build a parallel array over MPI or get native Julia code working with PETSc vectors, but I'd like to try to make this pure Julia.
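To make that concrete, here is a minimal sketch (assumed helper names, not part of DiffEqOperators.jl or DistributedArrays.jl) of overlapping the halo exchange with the local stencil work: each worker starts asynchronous fetches of its neighbours' boundary values, computes the stencil on the interior of its local chunk while those fetches are in flight, and fills in the two boundary points once the halo values arrive. `stencil!` and `laplacian!` are hypothetical names, and the zero halos at the global ends are just a placeholder boundary condition.

```julia
# A minimal sketch of overlapping halo exchange with local stencil work on a 1D DArray.
using Distributed, DistributedArrays

# Apply a centered second-order Laplacian at the padded indices in `rng`;
# `up` is the local chunk with one halo cell at each end, `du` has no padding.
function laplacian!(du, up, rng, h)
    @inbounds for i in rng
        du[i - 1] = (up[i - 1] - 2up[i] + up[i + 1]) / h^2
    end
    return du
end

function stencil!(du::DArray, u::DArray, h)
    @sync for p in procs(u)
        @async remotecall_wait(p, du, u, h) do du, u, h
            uloc, duloc = localpart(u), localpart(du)
            gidx = localindices(u)[1]                # global indices owned by this worker
            n = length(uloc)
            # start asynchronous fetches of the neighbouring boundary values
            lfut = first(gidx) > 1        ? (@async u[first(gidx) - 1]) : nothing
            rfut = last(gidx) < length(u) ? (@async u[last(gidx) + 1])  : nothing
            # compute the interior of the local chunk while the halos are in flight
            up = [zero(eltype(uloc)); uloc; zero(eltype(uloc))]
            laplacian!(duloc, up, 3:n, h)
            # finish the two boundary points once the halo values have arrived
            lfut === nothing || (up[1]   = fetch(lfut))
            rfut === nothing || (up[end] = fetch(rfut))
            laplacian!(duloc, up, (2, n + 1), h)     # zero halos at the global ends are a placeholder BC
            nothing
        end
    end
    return du
end
```

Assuming `u = drand(1024)` and `du = dzeros(1024)` share the same distribution, `stencil!(du, u, 1/1024)` would then compute the Laplacian while the interior work hides the boundary communication.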
I think the main concern is that we currently don't optimise regular access patterns well. One would want to distribute the data so as to minimise the amount of communication for stencils, and to do pre-fetching.
There are multiple avenues for improvement, but having a high-level way to express data-access patterns for stencils would be great. (I want to have the broadcast version of stencils: data-access pattern + kernel, so that we can do GPUs as well. A rough sketch of that separation is below.)
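As an illustration of the "data-access pattern + kernel" split, the pattern could be a window of offsets and the kernel a function applied to each window; `Window1D` and `mapstencil` below are made-up names, not an existing API in DistributedArrays.jl or DiffEqOperators.jl.

```julia
# Hypothetical sketch: the access pattern (a window of offsets) is kept
# separate from the kernel applied to each window.
struct Window1D
    offsets::UnitRange{Int}
end

# Apply `kernel` to the window around each interior point of `u`.
function mapstencil(kernel, w::Window1D, u::AbstractVector)
    lo, hi = first(w.offsets), last(w.offsets)
    out = zeros(eltype(u), length(u))
    for i in (1 - lo):(length(u) - hi)
        out[i] = kernel(view(u, i .+ w.offsets))
    end
    return out
end

# second-order Laplacian expressed as a kernel over a three-point window
h = 0.1
lap(win) = (win[1] - 2win[2] + win[3]) / h^2
u  = sin.(range(0, 2π; length = 64))
du = mapstencil(lap, Window1D(-1:1), u)
```

A GPU or DArray method of `mapstencil` could then reuse the same kernel and only change how the windows are fetched.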
+1
It would be great to be able to do big stencils (constant coefficient, non-constant coefficient) with DArrays.