It would be great to add a running_median! function to support pre-allocation of the outputs, because GC time can account for over 50% with a large number of series. Or, could you add support for multiple series and enable multi-threading?
I get the feeling you want to optimize performance for a very specific use-case. If so, could you provide a minimum example that we can use for benchmarking?
As for your specific concerns:
1. You could write a running_median!(output, input, window_size, tapering=:symmetric) function and put up a PR. It should probably throw an ArgumentError if the length or data type of the output does not fit. Currently, the output data type is always Float64. Other types would be possible, but I'd save that for another issue. The relevant allocating code is here: https://github.com/Firionus/FastRunningMedian.jl/blob/338b6c5c9f9bdb93a0617de8ebfea37dc6c29ad3/src/FastRunningMedian.jl#L73
2. Until then, you can just use the stateful API directly and re-use the MedianFilter (a sketch follows after this list). The only problem would be that you can't completely empty a MedianFilter right now, since shrink! errors before removing the last element. Maybe put in an optional argument that allows you to circumvent that check, so you can shrink it to 0 elements? What happens when you then call median on it? What happens when you call shrink! on a zero-size MedianFilter? All of that would probably be an easy Pull Request, and I'd appreciate it 👍.
3. For multi-threading, you could split your data into chunks, call running_median on each, then bring them together again (see the second sketch after this list). Figuring out the boundaries/overlaps will be fun though. You can do all of that without modifying this library. If you have a good implementation, feel free to put in a PR. It might be of interest to others as well.
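To make point 2 concrete, here is a minimal sketch of the stateful API as I understand it in v0.2 (grow!, roll!, shrink!, and median are the package's exported functions; check the README for the exact signatures):

using FastRunningMedian

mf = MedianFilter(1.0, 3)  # seeded with the first value, window size 3
grow!(mf, 2.0)
grow!(mf, 3.0)             # the filter now holds 1.0, 2.0, 3.0 and is full
median(mf)                 # -> 2.0
roll!(mf, 10.0)            # drop the oldest value (1.0), insert 10.0
shrink!(mf)                # drop the oldest value; as discussed above, this
                           # errors once only one element would remain

And for point 3, a rough, untested sketch of how one might chunk a single long series across threads. chunked_running_median is a hypothetical helper, not part of the library; it assumes the :none tapering, where the output for an input of length N has N - window_size + 1 elements, so input chunks that overlap by window_size - 1 line up exactly (the tapered ends of :sym would need extra care):

using FastRunningMedian

function chunked_running_median(x::AbstractVector, window_size::Integer;
                                nchunks::Integer = Threads.nthreads())
    # Assumes window_size <= length(x).
    out_len = length(x) - window_size + 1
    out = Vector{Float64}(undef, out_len)
    # Partition the output indices; output index i depends on the input
    # window i:i+window_size-1, so each task reads an overlapping slice.
    ranges = collect(Iterators.partition(1:out_len, cld(out_len, nchunks)))
    Threads.@threads for r in ranges
        slice = @view x[first(r):last(r)+window_size-1]
        # If your version doesn't accept views, use collect(slice) instead.
        out[r] = running_median(slice, window_size, :none)
    end
    return out
end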
Sure, here is an example:
using FastRunningMedian
xall=rand(24*30*3,6,10000);
ftest(xall)=mapslices(x->running_median(x,24*15*2),xall,dims=1)
@time ftest(xall);
and the output is:
46.135678 seconds (2.21 M allocations: 6.005 GiB, 65.58% gc time)
I have added support for multiple series and submitted a pull request. However, this is my first PR and I'm not sure if I have done it right. Let me know if you received the PR, thanks.
In the updated version, the number of allocations does not increase with the number of series, and the time consumption can be reduced by about 50% when using multiple threads. Here is the test code:
using Revise
using DataStructures
includet("../src/FastRunningMedian.jl")
using Main.FastRunningMedian
X_all=rand(24*30*3,6,10000)
wSize=24*15*2+1
f1(X_all,wSize)=mapslices(x->running_median(x,wSize),X_all,dims=1);
f2(X_all,wSize)=running_median(X_all,wSize);
function f3(X_all,wSize)
    Y_all=similar(X_all)
    Threads.@threads for n_var in axes(X_all,2)
        Y_all[:,n_var,:]=mapslices(x->running_median(x,wSize),view(X_all,:,n_var,:),dims=1);
    end
    return Y_all
end
function f4(X_all,wSize)
    Y_all=similar(X_all)
    Threads.@threads for n_var in axes(X_all,2)
        Y_all[:,n_var,:]=running_median(view(X_all,:,n_var,:),wSize);
    end
    return Y_all
end
@time Y_base=f1(X_all,wSize); # single series
# 13.929382 seconds (2.21 M allocations: 6.009 GiB, 6.55% gc time)
@time Y_test=f2(X_all,wSize); # multiple series
# 13.338668 seconds (65 allocations: 1.333 GiB, 1.80% gc time)
@time Y_par=f3(X_all,wSize); # single series with multiple threads
# 5.952394 seconds (2.27 M allocations: 6.980 GiB, 47.94% gc time)
@time Y_par=f4(X_all,wSize); # multiple series with multiple threads
# 2.810954 seconds (383 allocations: 2.367 GiB, 0.17% gc time)
println("Y_base == Y_test ? ", Y_base == Y_test )
# Y_base == Y_test ? true
About the example: I can't reproduce that large GC percentage on my laptop. Here are my results on Julia 1.9 (please ignore the exact timings, they are imprecise due to CPU temperature and boosting):
julia> @time ftest(xall);
11.904905 seconds (1.25 M allocations: 3.918 GiB, 0.86% gc time)
julia> versioninfo()
Julia Version 1.9.0-rc1
Commit 3b2e0d8fbc1 (2023-03-07 07:51 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 8 × Intel(R) Core(TM) i7-10510U CPU @ 1.80GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-14.0.6 (ORCJIT, skylake)
Threads: 1 on 8 virtual cores
Similar with Julia 1.6.7:
julia> @time ftest(xall);
12.775667 seconds (3.17 M allocations: 6.352 GiB, 2.55% gc time)
and 1.2.0:
julia> @time ftest(xall);
12.329736 seconds (3.17 M allocations: 6.352 GiB, 2.48% gc time)
Not sure why we have such different GC behavior. :man_shrugging:
About your own code changes: I didn't get your pull request, unfortunately. It should be visible on this very repository if you click on "Pull requests" in the repository navigation bar. Maybe this guide can help you: https://opensource.com/article/19/7/create-pull-request-github
I'm looking forward to seeing what your API idea and code changes are, and to seeing how we can incorporate them :blush:
Even if I didn't get your code yet, my curiosity was piqued. First, here is again your benchmark with v0.2:
julia> @time ftest(xall);
12.553681 seconds (1.97 M allocations: 5.998 GiB, 1.20% gc time)
Then I created the avoid-allocation branch. The first improvement was to pre-size the heaps with sizehint!. This already reduced allocations by about 30 %:
julia> @time ftest(xall);
11.873653 seconds (1.25 M allocations: 3.918 GiB, 0.62% gc time)
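For reference, the sizehint! idea in rough form (an illustration, not the branch code; it assumes DataStructures defines sizehint! for its binary heaps):

using DataStructures

window_size = 24*15*2 + 1
low = BinaryMaxHeap{Float64}()   # lower half of the window
high = BinaryMinHeap{Float64}()  # upper half of the window
sizehint!(low, cld(window_size, 2))   # reserve capacity once, up front,
sizehint!(high, cld(window_size, 2))  # instead of growing incrementally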
Then, I added running_median!, where the output and median filter can be provided by the user, further reducing allocations by 60 %:
function ftest2(xall)
    output = Array{Float64, 3}(undef, (2159,6,10000))
    mf = MedianFilter(0., 24*15*2)
    for i in axes(xall,2), j in axes(xall,3)
        running_median!(mf, @view(output[:,i,j]), @view(xall[:,i,j]))
    end
    output
end
julia> @time ftest2(xall);
11.480872 seconds (240.04 k allocations: 1.432 GiB, 0.06% gc time)
So overall, this new branch reduces allocations by over 70 %, leading to a slight performance improvement on my machine. And the results are the same of course:
julia> ftest(xall) == ftest2(xall)
true
Would you mind testing your use case with the avoid-allocation branch and running_median!?
I'd consider this done as of https://github.com/Firionus/FastRunningMedian.jl/commit/5527221f9c93ea779ef2227739c13190986ac6ad. There is a new function running_median! which allows custom allocation of the MedianFilter and the output. Also, running_median and running_median! will now take multidimensional arrays and calculate the running median independently over all columns. Pre-sizing of the heaps has also been included. I'll release the changes in the near future, but I might bundle in some more changes, since this comes with some breaking API improvements to the stateful API.
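To illustrate the new multidimensional behavior described above, a hedged usage sketch (the released signatures may still change, so check the docs once the version is out):

using FastRunningMedian

x = rand(1000, 6)            # columns are treated as independent series
y = running_median(x, 101)   # running median over each column

# Reference computation for comparison (allocates per column):
y_ref = mapslices(col -> running_median(col, 101), x, dims=1)
@assert y == y_ref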
Thanks for your questions, and should you need anything else in the future, feel free to open another issue :smile: