JuliaParallel / ScaLAPACK.jl

Wrap ScaLAPACK in Julia

pdgemm #2

Status: Open. ViralBShah opened this issue 9 years ago.

ViralBShah commented 9 years ago

Would it be possible to wrap pdgemm, ScaLAPACK's distributed matrix multiply?

It would be nice to compare a Julia SUMMA implementation with the ones in ScaLAPACK and Elemental.

andreasnoack commented 9 years ago

I can do that; it shouldn't be too hard. I have also just figured out how ScaLAPACK's redistribution functions work, so it might be possible to call this on DArrays and still get reasonable performance.

But I don't know what SUMMA is.

ViralBShah commented 9 years ago

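Thanks. SUMMA (Scalable Universal Matrix Multiplication Algorithm) is an outer-product formulation of matrix multiplication that is efficient in parallel: C is accumulated as a sum of rank-k updates, where each update uses a column panel of A broadcast along the process rows and a row panel of B broadcast along the process columns.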

http://www.netlib.org/lapack/lawnspdf/lawn96.pdf (LAPACK Working Note 96)
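To make the outer-product idea concrete, here is a minimal serial Julia sketch, not a distributed implementation. It simulates a hypothetical pr-by-pc process grid inside a single process, gives each "process" a contiguous block of C (ScaLAPACK itself uses block-cyclic layouts), and accumulates C one panel of width nb at a time. The names `summa_sketch` and `splitrange` are hypothetical and not part of ScaLAPACK.jl; the comments mark where a real run would broadcast panels.

```julia
using LinearAlgebra

# Split 1:len into p contiguous ranges (trailing ranges may be short or empty).
splitrange(len::Int, p::Int) =
    [((q - 1) * cld(len, p) + 1):min(q * cld(len, p), len) for q in 1:p]

# Serial simulation of SUMMA on a hypothetical pr-by-pc process grid.
# "Process" (i, j) owns the block C[rows[i], cols[j]]; nothing is actually distributed.
function summa_sketch(A::Matrix{Float64}, B::Matrix{Float64},
                      pr::Int, pc::Int, nb::Int)
    m, k = size(A)
    size(B, 1) == k || throw(DimensionMismatch("inner dimensions differ"))
    n = size(B, 2)
    rows = splitrange(m, pr)
    cols = splitrange(n, pc)
    C = zeros(m, n)
    for kk in 1:nb:k                       # one SUMMA step per panel of width nb
        panel = kk:min(kk + nb - 1, k)
        for i in 1:pr, j in 1:pc
            # On a real grid, A[rows[i], panel] is broadcast along process row i
            # and B[panel, cols[j]] along process column j before this local update.
            @views C[rows[i], cols[j]] .+= A[rows[i], panel] * B[panel, cols[j]]
        end
    end
    return C
end
```

For example, `summa_sketch(A, B, 2, 2, 64)` should agree with `A * B` up to rounding. In a real SUMMA run, a larger panel width nb means fewer broadcast steps and larger local matrix products, at the price of bigger panel buffers.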

andreasnoack commented 9 years ago

I've pushed some wrapper code to the anj/gemm branch if you'd like to try it out. To use it with DArrays, you first have to merge my anj/darray Julia branch, because the DArrays must be laid out in a particular way. After that, you don't need to pay much attention to the layout, because the wrapper redistributes the data back and forth on the fly. For example:

julia> using MPI

julia> @everywhere using ScaLAPACK

julia> manager = MPIManager(np = 64)
MPI.MPIManager(64,`mpirun -np 64 --output-filename /tmp/user/1021/juliaUhh3oE`,"/tmp/user/1021/juliaUhh3oE",60,Dict{Int64,Int64}(),Dict{Int64,Int64}(),RemoteRef(1,1,7852),false)

julia> addprocs(manager);

julia> @everywhere using ScaLAPACK

julia> A = drandn(5000,5000);

julia> B = drandn(5000,5000);

julia> C = dzeros(5000,5000);

julia> @time ScaLAPACK.A_mul_B!(1.0, A, B, 0.0, C, 100, 100);
elapsed time: 3.871655318 seconds (8 MB allocated)

The last two arguments are the row and column block sizes used in the block-cyclic distributions (here 100-by-100 blocks).
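To make the block sizes concrete, here is a small Julia sketch of the one-dimensional block-cyclic mapping that ScaLAPACK applies separately to rows and to columns, assuming the distribution starts on process 0. The helpers `owner`, `localindex` and `localcount` are hypothetical names; ScaLAPACK's Fortran tool routines INDXG2P, INDXG2L and NUMROC compute the same quantities. The 8-by-8 process grid for the 64 MPI ranks above is also an assumption for illustration; the wrapper may choose a different grid.

```julia
# One-dimensional block-cyclic mapping, assuming the first block lives on
# process 0. Global indices g are 1-based; process coordinates are 0-based.

# Process coordinate that owns global index g (block size nb, p processes).
owner(g::Int, nb::Int, p::Int) = div(g - 1, nb) % p

# Position of g within its owner's local array (1-based).
localindex(g::Int, nb::Int, p::Int) = nb * div(g - 1, nb * p) + (g - 1) % nb + 1

# How many of the global indices 1:n end up on process q (brute-force count).
localcount(n::Int, nb::Int, p::Int, q::Int) = count(g -> owner(g, nb, p) == q, 1:n)

# 5000 rows in 100-row blocks over 8 process rows (hypothetical 8-by-8 grid).
println([localcount(5000, 100, 8, q) for q in 0:7])
println((owner(801, 100, 8), localindex(801, 100, 8)))
```

With these parameters the 50 row blocks are dealt out cyclically, so process rows 0 and 1 each hold 700 rows and the other six hold 600; global row 801 wraps back to process row 0 as its 101st local row.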

ViralBShah commented 9 years ago

Cc: @amitmurthy