OpenMendel / SnpArrays.jl

Compressed storage for SNP data
https://openmendel.github.io/SnpArrays.jl/latest

memory leak when copying/converting SnpArray to matrix #20

Closed: Hua-Zhou closed this issue 6 years ago

Hua-Zhou commented 6 years ago

Reported by @ericsobel and @biona001.

The following code

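using SnpArrays
# load the HapMap3 example data set shipped in the package's docs folder
hapmap = SnpArray(Pkg.dir("SnpArrays") * "/docs/hapmap3")
n, snps = size(hapmap)
outv = zeros(n)    # preallocated output buffer, reused for every SNP
@time for s in 1:snps
    # decode SNP s into outv: additive coding, no imputation, centering, or scaling
    copy!(outv, hapmap[:, s], model=:additive, impute=false, center=false, scale=false)
end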

shows suspiciously high memory allocation for a loop that only writes into a preallocated vector:

  1.987208 seconds (9.32 M allocations: 147.676 MiB, 0.62% gc time)

Using @views doesn't help:

@time @views for s in 1:snps
    copy!(outv, hapmap[:, s], model=:additive, impute=false, center=false, scale=false)
end

yields

  2.007899 seconds (9.72 M allocations: 151.644 MiB, 0.61% gc time)
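Since @views leaves the allocation count essentially unchanged, the column slice itself is probably not the main source of the allocations. For contrast, the sketch below uses a plain Float64 matrix as a stand-in for the SnpArray (an assumption: it does not exercise SnpArrays code, and copy_cols_slice! and copy_cols_view! are hypothetical names). It shows how @views normally removes the per-column slice allocation, and how @allocated can isolate allocations once compilation is out of the way. It is written for current Julia (1.x), where the 0.6 array method copy! is named copyto!.

```julia
# Stand-in for the genotype matrix: a plain Float64 matrix (assumption, not a SnpArray).

# Copy each column into a preallocated buffer via a slice;
# A[:, s] materializes a fresh Vector{Float64} on every iteration.
function copy_cols_slice!(outv::Vector{Float64}, A::Matrix{Float64})
    for s in axes(A, 2)
        copyto!(outv, A[:, s])
    end
    return nothing
end

# Same loop, but @views turns A[:, s] into a SubArray that shares A's memory.
function copy_cols_view!(outv::Vector{Float64}, A::Matrix{Float64})
    @views for s in axes(A, 2)
        copyto!(outv, A[:, s])
    end
    return nothing
end

A = rand(100, 50)
outv = zeros(size(A, 1))

# Call each function once so compilation is not counted in the measurement.
copy_cols_slice!(outv, A)
copy_cols_view!(outv, A)

alloc_slice = @allocated copy_cols_slice!(outv, A)
alloc_view  = @allocated copy_cols_view!(outv, A)
println("slice: ", alloc_slice, " bytes; view: ", alloc_view, " bytes")
```

If a loop like this still allocates heavily after switching to views, the allocations must come from inside the per-column call, which is what the profiling further down in this thread points to.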

Converting to a matrix shows similar memory allocation:

outm = zeros(n, snps)
@time copy!(outm, hapmap)

yields

  0.161410 seconds (9.19 M allocations: 140.673 MiB, 6.21% gc time)
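Decoding compressed genotypes into a preallocated Float64 matrix should not need any temporary storage, because every output element comes from a fixed lookup. The sketch below assumes a PLINK .bed-style layout (2-bit codes packed four samples per byte, missing coded as 0b01); that layout and the name decode_additive! are assumptions for illustration, not the internals of SnpArrays v0.0.1. With a type-stable inner loop, @allocated should report zero bytes once the function is compiled.

```julia
# Assumed layout (hypothetical): 2-bit codes, four samples per byte, first sample in the
# two lowest-order bits; each column (SNP) starts on a fresh byte.
#   0b00 -> homozygous first allele  -> 0.0
#   0b01 -> missing                  -> NaN (no imputation)
#   0b10 -> heterozygous             -> 1.0
#   0b11 -> homozygous second allele -> 2.0
const ADDITIVE = (0.0, NaN, 1.0, 2.0)   # lookup table, indexed by code + 1

"""
    decode_additive!(out, packed)

Hypothetical helper: fill `out` (n × p) from `packed` (cld(n, 4) × p bytes)
using additive coding, without allocating.
"""
function decode_additive!(out::AbstractMatrix{Float64}, packed::AbstractMatrix{UInt8})
    n, p = size(out)
    size(packed) == (cld(n, 4), p) || throw(DimensionMismatch("packed has the wrong size"))
    @inbounds for j in 1:p, i in 1:n                    # column-major traversal
        byte = packed[((i - 1) >> 2) + 1, j]            # byte holding sample i
        code = (byte >> (2 * ((i - 1) & 3))) & 0x03     # 2-bit slot within that byte
        out[i, j] = ADDITIVE[code + 1]
    end
    return out
end

# Example: 5 samples × 2 SNPs, so each column occupies cld(5, 4) = 2 bytes.
# Column 1, byte 1 holds samples 1-4 with codes 00, 10, 11, 01; byte 2 holds sample 5 (10).
# Column 2 is homozygous second allele (11) for every sample.
packed = UInt8[0x78 0xff; 0x02 0x03]
out = zeros(5, 2)
decode_additive!(out, packed)
```

Looking codes up in a homogeneous tuple keeps the element type Float64 on every path, so the loop body needs no boxing.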

Machine information:

julia> versioninfo()
Julia Version 0.6.2
Commit d386e40c17 (2017-12-13 18:08 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i7-6920HQ CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)

SnpArrays.jl version is v0.0.1 (d878a31).

Hua-Zhou commented 6 years ago

Some profiling suggests the culprit might be this line:
https://github.com/OpenMendel/SnpArrays.jl/blob/d878a31b7ef9d118f3113da2ac1bc1f14b0a98e4/src/SnpArrays.jl#L248
If A is a SnpVector, then maf is a scalar. But if A is a SnpMatrix of dimension (n, 1), then maf is a vector with a single element. This causes type instability: the compiler cannot pin down a concrete type for maf, so maf is allocated dynamically at run time, causing unnecessary memory allocation.
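A minimal, self-contained illustration of this kind of type instability, written for current Julia (1.x) and not taken from the SnpArrays source; colmeans_unstable and colmeans_stable are hypothetical names. The unstable version mirrors the scalar-versus-one-element-vector ambiguity by returning a Float64 or a Vector{Float64} depending on the number of columns, so inference can only conclude a Union. Base.return_types reports what inference concludes; @code_warntype (from InteractiveUtils) highlights the same problem interactively.

```julia
# Hypothetical: the return type depends on the *value* of size(A, 2), not only on the
# type of A, so the compiler cannot infer a single concrete return type.
function colmeans_unstable(A::AbstractMatrix{Float64})
    m = vec(sum(A; dims = 1)) ./ size(A, 1)
    return size(A, 2) == 1 ? m[1] : m    # Float64 for one column, Vector{Float64} otherwise
end

# Type-stable alternative: always return a Vector{Float64}; callers use m[1] when needed.
colmeans_stable(A::AbstractMatrix{Float64}) = vec(sum(A; dims = 1)) ./ size(A, 1)

# Ask inference what each function returns for a Matrix{Float64} argument.
T_unstable = only(Base.return_types(colmeans_unstable, (Matrix{Float64},)))
T_stable   = only(Base.return_types(colmeans_stable, (Matrix{Float64},)))
println(T_unstable, " vs ", T_stable)
```

Making the return type depend only on the argument types, here by always returning a vector, lets the compiler infer a concrete type for the result in every caller.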

biona001 commented 6 years ago

congrats