OpenMendel / SnpArrays.jl

Compressed storage for SNP data
https://openmendel.github.io/SnpArrays.jl/latest
Other
44 stars 9 forks source link

Parallelize linear algebra routines for SnpArray #22

Closed biona001 closed 6 years ago

biona001 commented 6 years ago

ParallelSparseMatMul seemed to parallelized At_mul_B! for SparseMatrixCSC: https://github.com/madeleineudell/ParallelSparseMatMul.jl/blob/master/src/parallel_matmul.jl

I was experimenting with some of its ideas. It seems odd the below code sped up anything, but it does. Should I continue?

using SnpArrays, BenchmarkTools

#turns a SnpArray into a SharedArray{Tuple{Bool,Bool},2}
function share{T}(a::AbstractArray{T};kwargs...)
    sh = SharedArray{T}(size(a);kwargs...)
    for i=1:length(a)
        sh.s[i] = a[i]
    end
    return sh
end
share(A::SnpArray,pids::AbstractVector{Int}) = SharedSnpArray(share(A.A1,pids=pids),share(A.A2,pids=pids),pids)
share(A::SnpArray) = share(A::SnpArray,pids=workers())

#Constructor for SharedSnpArray
function SharedSnpArray(plinkFile::AbstractString) 
    share(SnpArray(plinkFile))
end

Check if SharedSnpArray can still do (SnpArray)-(vector) multiplication correctly (Yes)

x = SnpArray("hapmap3")
s = SharedSnpArray("hapmap3")
v = rand(13928)

julia> all(x*v .≈ s*v)
true

Is (SharedSnpArray)-(vector) faster than (SnpArray)-(vector)? (Yes)

julia> @benchmark x*v

BenchmarkTools.Trial:
  memory estimate:  5.38 KiB
  allocs estimate:  2
  --------------
  minimum time:     30.724 ms (0.00% GC)
  median time:      34.174 ms (0.00% GC)
  mean time:        33.786 ms (0.00% GC)
  maximum time:     39.272 ms (0.00% GC)
  --------------
  samples:          148
  evals/sample:     1

@benchmark s*v
BenchmarkTools.Trial:
  memory estimate:  2.69 KiB
  allocs estimate:  1
  --------------
  minimum time:     7.599 ms (0.00% GC)
  median time:      8.442 ms (0.00% GC)
  mean time:        8.905 ms (0.00% GC)
  maximum time:     12.584 ms (0.00% GC)
  --------------
  samples:          561
  evals/sample:     1

But a SharedSnpArray uses more memory than SnpArray: (why is it 16 and 80 bytes??)

julia> sizeof(x)
16

julia> sizeof(s)
80