c-scale-community / open-call-use-case-pangeo-julia

MIT License

Doing the tau_recurrence computation without the sparse matrix reduces the memory footprint #21

Open felixcremer opened 1 year ago

felixcremer commented 1 year ago

Switching the computation of the diagonal recurrence density away from the sparse matrix seems to reduce memory usage by a lot, but the CPUs are not fully used and my analysis is now IO bound. These are the benchmarking results of the inner function for a single random pixel:

```julia
julia> @benchmark RQADeforestation.rqatrend_matrix(pix_t, ts)
BenchmarkTools.Trial: 6097 samples with 1 evaluation.
 Range (min … max):  651.647 μs …   5.323 ms  ┊ GC (min … max): 0.00% … 78.55%
 Time  (median):     720.816 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   814.240 μs ± 400.402 μs  ┊ GC (mean ± σ):  4.93% ±  8.49%

  ██▄▇▅▃       ▂                                                ▂
  ███████▅██▆▁███▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▅▆▅▅ █
  652 μs        Histogram: log(frequency) by time       3.82 ms <

 Memory estimate: 1.44 MiB, allocs estimate: 295.
```

```julia
julia> @benchmark RQADeforestation.rqatrend(pix_t, ts)
BenchmarkTools.Trial: 10000 samples with 1 evaluation.
 Range (min … max):  238.282 μs … 889.842 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     243.718 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   256.005 μs ±  52.062 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █▇▆▅▄▂                                                        ▂
  █████████▇▄▅▅▇▇▆▅▇▇▅▅▆▇▇▅▅▇▆▆▅▄▇▆▅▆▅▃▅▇▅▅▅▃▄▃▆▆▅▅▄▁▃▅▃▅▅▄▆▄▄▄ █
  238 μs        Histogram: log(frequency) by time        556 μs <

 Memory estimate: 5.95 KiB, allocs estimate: 7.
```
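For context, a back-of-the-envelope comparison (with a hypothetical series length, not a number from this issue) of why the per-diagonal accumulator shrinks the footprint relative to materializing the pairwise matrix:

```julia
# Hypothetical illustration: a dense pairwise recurrence matrix grows as
# O(n^2), while the per-diagonal counter vector used by rqatrend grows as O(n).
n = 300                                   # assumed time-series length
dense_bytes = n^2 * sizeof(Float64)       # full n×n recurrence matrix
diag_bytes  = n * sizeof(Float64)         # one counter per diagonal
println(dense_bytes ÷ 1024, " KiB vs ", diag_bytes ÷ 1024, " KiB")
```

The benchmark above reflects the same asymptotics: 1.44 MiB and 295 allocations for the matrix variant versus 5.95 KiB and 7 allocations for the diagonal one.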

Running the analysis on a single tile with a max_cache of 5e8 leads to a memory usage of roughly 90 GB with loopchunks of 15000,20, but the CPUs seem to be very much underused. Would it be possible to speed up the inner loop by using Threads? This is the current implementation:

```julia
function rqatrend(pix_trend, pix, thresh=2)
    #replace!(pix, -9999 => missing)
    ts = collect(skipmissing(pix))
    #@show length(ts)
    tau_pix = tau_recurrence(ts, thresh)
    pix_trend .= RA._trend(tau_pix)
end

function tau_recurrence(ts::AbstractVector, thresh, metric=Euclidean())
    n = length(ts)
    rr_τ = zeros(n)
    for col in 1:n
        for row in 1:(col - 1)
            d = evaluate(metric, ts[col], ts[row])
            #@show row, col, d
            rr_τ[col - row + 1] += d <= thresh
        end
    end
    rr_τ[1] = n
    rr_τ ./ (n:-1:1)
    #rr_τ
end
```
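On the threading question: a minimal race-free sketch (not the repository's code) is to give each thread its own accumulator and sum the partials afterwards, so no two threads write to the same `rr_τ` entry. For a scalar time series, `abs(ts[col] - ts[row])` stands in for `evaluate(Euclidean(), …)`, so the sketch needs no external packages; the name `tau_recurrence_threaded` is made up for illustration.

```julia
using Base.Threads

# Sketch of a multithreaded tau_recurrence. Each thread accumulates into its
# own vector to avoid data races on the shared per-diagonal histogram; the
# partials are summed at the end. The :static schedule pins each iteration
# chunk to one thread, which keeps indexing by threadid() safe.
function tau_recurrence_threaded(ts::AbstractVector, thresh)
    n = length(ts)
    partials = [zeros(n) for _ in 1:nthreads()]
    @threads :static for col in 1:n
        acc = partials[threadid()]
        for row in 1:(col - 1)
            # count a recurrence on diagonal (col - row) when within thresh
            acc[col - row + 1] += abs(ts[col] - ts[row]) <= thresh
        end
    end
    rr_τ = reduce(+, partials)
    rr_τ[1] = n                 # main diagonal: every point recurs with itself
    rr_τ ./ (n:-1:1)            # normalize by the number of entries per diagonal
end
```

Note that the outer loop is load-imbalanced (later columns do more work), so a chunked or dynamic schedule, or iterating over diagonals instead of columns, may scale better than `:static` splitting `1:n` contiguously.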
felixcremer commented 1 year ago

This is still using all available memory and runs into the EOFError even though we now have a VM with 256 GB of memory. This was the htop usage shortly before it crashed; it was running at full power with an expected runtime of 31 minutes. [htop screenshot]

felixcremer commented 1 year ago

I just reran it with 8 threads and 8 workers, and it seems to use all processing power at least in bursts; in between, the CPU usage drops, I suspect while the new chunks are loaded. The memory usage started around 60 GB and is slowly climbing in small steps, with occasional drops in between.

felixcremer commented 1 year ago

In this setup the memory usage seems to be capped at 145 GB. The analysis went through just now.