ViralBShah opened this issue 9 years ago
I can do that. It shouldn't be that hard. I have also just figured out how the redistribute functions in ScaLAPACK work, so it might also be possible to use this from DArrays and still get reasonable performance.
But I don't know what SUMMA is.
Thanks. SUMMA is an outer product formulation of matrix multiply that is efficient in parallel.
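To make the outer-product formulation concrete, here is a minimal serial sketch of the idea behind SUMMA (illustrative only, written in Python rather than Julia): the product is accumulated as a sum of rank-1 updates C += A[:, k] * B[k, :]. In the parallel algorithm, step k instead broadcasts a block column of A and a block row of B across the process grid, and every process accumulates its local block.

```python
def summa_serial(A, B):
    """Multiply square matrices as a sum of rank-1 (outer-product) updates.

    Serial stand-in for SUMMA: each iteration k plays the role of one
    broadcast step, pairing a column of A with the matching row of B.
    """
    n = len(A)
    C = [[0.0] * n for _ in range(n)]
    for k in range(n):
        col_k = [A[i][k] for i in range(n)]  # "broadcast" block column of A
        row_k = B[k]                         # "broadcast" block row of B
        for i in range(n):
            for j in range(n):
                C[i][j] += col_k[i] * row_k[j]
    return C
```

Because each step only needs one column of A and one row of B, the parallel version communicates O(n) data per step rather than moving whole matrices, which is what makes the formulation efficient on a process grid.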
I've pushed some wrapper code to the anj/gemm branch so you can try it out if you'd like. If you want to try it out with DArrays, you'll have to merge my anj/darray Julia branch first, because the DArrays have to be laid out in a certain way. After that, however, you don't need to pay much attention to how they are laid out, because the wrapper redistributes back and forth on the fly. Hence you can do
julia> using MPI
julia> manager = MPIManager(np = 64)
MPI.MPIManager(64,`mpirun -np 64 --output-filename /tmp/user/1021/juliaUhh3oE`,"/tmp/user/1021/juliaUhh3oE",60,Dict{Int64,Int64}(),Dict{Int64,Int64}(),RemoteRef(1,1,7852),false)
julia> addprocs(manager);
julia> @everywhere using ScaLAPACK
julia> A = drandn(5000,5000);
julia> B = drandn(5000,5000);
julia> C = dzeros(5000,5000);
julia> @time ScaLAPACK.A_mul_B!(1.0, A, B, 0.0, C, 100, 100);
elapsed time: 3.871655318 seconds (8 MB allocated)
The last two arguments are the row and column block sizes of the block-cyclic distributions.
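For readers unfamiliar with ScaLAPACK's block-cyclic layout, here is a hedged sketch (in Python, function names are my own) of the standard 1-D mapping: global indices are grouped into blocks of size nb, and blocks are dealt out to the p processes in round-robin order. The 2-D layout applies this map independently to rows and columns.

```python
def block_cyclic_owner(g, nb, p):
    """Process (0-based) that owns global index g, block size nb, p processes."""
    return (g // nb) % p

def block_cyclic_local(g, nb, p):
    """Local index of global index g on its owning process."""
    return ((g // nb) // p) * nb + (g % nb)
```

With nb = 100 as in the call above, each process holds every p-th block of 100 consecutive rows (and, independently, columns), which is what lets ScaLAPACK balance work while keeping blocks large enough for efficient local BLAS calls.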
Cc: @amitmurthy
Would it be possible to hook up pdgemm? It would be nice to compare a Julia SUMMA implementation with the one in ScaLAPACK/Elemental.