JuliaImages / ImageFiltering.jl

Julia implementations of multidimensional array convolution and nonlinear stencil operations
Other
99 stars 49 forks source link

imfilter huge perf drop when #161

Closed wizebt closed 4 years ago

wizebt commented 4 years ago

imfilter use for gaussian filter in my astro image processing becomes randomly ~10x slower. The following test function can reproduce the issue. At this stage I'm not able to investigate further nor understand the root cause. can you please help. Thanks

Test environment, same behavior in 1.3 or 1.3.1. or on different machine

julia> versioninfo()
Julia Version 1.4.0
Commit b8e9a9ecc6 (2020-03-21 16:36 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-4940MX CPU @ 3.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, haswell)
Environment:
  JULIA_NUM_THREADS = 4
function test_imfilter()
    gk = Kernel.gaussian(2.0f0)
    I = rand(Float32,1000,2000)

    for i = 1:9
        start = time()
        J = Images.imfilter(I,gk)
        t = round(time() - start, digits=6)
        t > 1 && println("$i imfilter $t sec")
    end
end

Now testing, nothing else was running on the laptop

julia> include("test_imfilter.jl")
f! (generic function with 1 method)

julia> using BenchmarkTools

julia> @benchmark test_imfilter()
BenchmarkTools.Trial: 
  memory estimate:  147.52 MiB
  allocs estimate:  23748
  --------------
  minimum time:     177.292 ms (1.07% GC)
  median time:      183.574 ms (1.07% GC)
  mean time:        190.164 ms (4.57% GC)
  maximum time:     289.147 ms (37.44% GC)
  --------------
  samples:          27
  evals/sample:     1

Even with lots of GC, ~0.3 sec When use in conjunction with ImageView or Gtk package, perf drop 10 to 20x even with little GC

julia> using ImageView
Gtk-Message: 21:35:38.199: Failed to load module "canberra-gtk-module"
Gtk-Message: 21:35:38.200: Failed to load module "canberra-gtk-module"

julia> @benchmark test_imfilter()
1 imfilter 5.023953 sec
3 imfilter 5.037526 sec
1 imfilter 5.023682 sec
2 imfilter 5.031469 sec
3 imfilter 5.029917 sec
4 imfilter 5.030758 sec
6 imfilter 5.020171 sec
8 imfilter 4.395933 sec
1 imfilter 5.022895 sec
2 imfilter 5.026384 sec
3 imfilter 5.030159 sec
5 imfilter 5.023577 sec
7 imfilter 5.02138 sec
1 imfilter 5.023868 sec
5 imfilter 5.023962 sec
BenchmarkTools.Trial: 
  memory estimate:  147.52 MiB
  allocs estimate:  23851
  --------------
  minimum time:     10.196 s (0.02% GC)
  median time:      10.196 s (0.02% GC)
  mean time:        10.196 s (0.02% GC)
  maximum time:     10.196 s (0.02% GC)
  --------------
  samples:          1
  evals/sample:     1
timholy commented 4 years ago

I can't replicate this, and therefore can't fix it. If you were on Windows I'd wonder if it were https://github.com/JuliaGraphics/Gtk.jl/issues/325, but your versioninfo cited Linux.

wizebt commented 4 years ago

Hi Tim, First thanks for you rapid reply and congrat for your brilliant implementation of imfilter. The same test doesn't reproduce on Windows 10. Sorry that you were not able to reproduce but it is present solid on Ubuntu 18.10 (3 diff machine).

manu@P7530:~$ uname -a
Linux P7530 5.3.0-42-generic #34~18.04.1-Ubuntu SMP Fri Feb 28 13:42:26 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

I strongly suspect some internal nasty interference with the Gtk package but at this point I don't know how investigate deeper into the code to circumvent where it gets slower, on the CPU activity during bench there is no noticeable movement. It is so slow that today I wrote an optimized gaussian filter for small kernel 5x5 that suits my need in the interim. Can't figure out where this sleep or wait time is coming from when Gtk is in use it's beyond my practice of Julia. Eager to solve this issue how would you find out?

The key to reproduce is using ImageView or Gtk Below benchmark results imfilter then my gaussian filter

julia> @benchmark imfilter($I,$gk1)
BenchmarkTools.Trial: 
  memory estimate:  145.48 KiB
  allocs estimate:  106
  --------------
  minimum time:     56.675 μs (0.00% GC)
  median time:      94.693 μs (0.00% GC)
  mean time:        20.727 ms (0.01% GC)
  maximum time:     4.993 s (0.00% GC)
  --------------
  samples:          464
  evals/sample:     1

julia> @benchmark imfilter($I,$gk1)
BenchmarkTools.Trial: 
  memory estimate:  145.48 KiB
  allocs estimate:  106
  --------------
  minimum time:     65.363 μs (0.00% GC)
  median time:      88.588 μs (0.00% GC)
  mean time:        24.813 ms (0.01% GC)
  maximum time:     5.005 s (0.00% GC)
  --------------
  samples:          387
  evals/sample:     1

julia> @benchmark gfilter($I)
BenchmarkTools.Trial: 
  memory estimate:  37.75 KiB
  allocs estimate:  5
  --------------
  minimum time:     2.026 μs (0.00% GC)
  median time:      2.863 μs (0.00% GC)
  mean time:        3.362 μs (12.59% GC)
  maximum time:     112.581 μs (88.92% GC)
  --------------
  samples:          10000
  evals/sample:     9
timholy commented 4 years ago

I'm not doubting your results, I'm just saying that since I can't replicate it, it will probably be up to you to make some progress diagnosing the problem. (Start with ProfileView.)

Just to show you what I get, here's a session on my machine:

julia> versioninfo()
Julia Version 1.4.0
Commit b8e9a9ecc6* (2020-03-21 16:36 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Core(TM) i7-8850H CPU @ 2.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
Environment:
  JULIA_CPU_THREADS = 4

(@v1.4) pkg> st ImageFiltering
Status `~/.julia/environments/v1.4/Project.toml`
  [6a3955dd] ImageFiltering v0.6.11

(@v1.4) pkg> st Gtk
Status `~/.julia/environments/v1.4/Project.toml`
  [4c0ca9eb] Gtk v1.1.3

julia> using ImageFiltering
[ Info: Precompiling ImageFiltering [6a3955dd-da59-5b1f-98d4-e7296123deb5]

julia> function test_imfilter()
               gk = Kernel.gaussian(2.0f0)
               I = rand(Float32,1000,2000)

               for i = 1:9
                       start = time()
                       J = imfilter(I,gk)
                       t = round(time() - start, digits=6)
                       t > 1 && println("$i imfilter $t sec")
               end
       end
test_imfilter (generic function with 1 method)

julia> test_imfilter()

julia> @time test_imfilter()
  0.373659 seconds (12.98 k allocations: 146.577 MiB, 1.73% gc time)

julia> @time test_imfilter()
  0.375970 seconds (12.98 k allocations: 146.577 MiB, 0.94% gc time)

julia> @time test_imfilter()
  0.370272 seconds (12.98 k allocations: 146.577 MiB, 0.52% gc time)

julia> using Gtk
Gtk-Message: 03:45:59.435: Failed to load module "canberra-gtk-module"
Gtk-Message: 03:45:59.436: Failed to load module "canberra-gtk-module"

julia> @time test_imfilter()
  0.365696 seconds (12.98 k allocations: 146.577 MiB, 2.15% gc time)

julia> @time test_imfilter()
  0.375604 seconds (12.98 k allocations: 146.577 MiB, 0.44% gc time)

julia> @time test_imfilter()
  0.381811 seconds (12.98 k allocations: 146.577 MiB, 0.80% gc time)

How did you get Julia? I built mine from source, did you download as a binary?

wizebt commented 4 years ago

My Julia 1.4.0 was downloaded from binary. To narrow down this issue, and get close to your env. I downloaded the source, updated Ubuntu for preprequisit (only cmake was updated) and ran the test from this env. First only ImageFiltering pkg is use (previously was Images) then use Gtk (weird Gtk msg that I cannot get rid of) Here is the trace

manu@P7530:~/julia-1.4.0-full/julia-1.4.0$ ./julia 
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.4.0 (2020-03-21)
 _/ |\__'_|_|_|\__'_|  |  
|__/                   |

julia> versioninfo()
Julia Version 1.4.0
Commit b8e9a9ecc6 (2020-03-21 16:36 UTC)
Platform Info:
  OS: Linux (x86_64-linux-gnu)
  CPU: Intel(R) Xeon(R) E-2186M  CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 6

julia> using ImageFiltering
[ Info: Precompiling ImageFiltering [6a3955dd-da59-5b1f-98d4-e7296123deb5]

julia> function test_imfilter()
               gk = Kernel.gaussian(2.0f0)
               I = rand(Float32,1000,2000)

               for i = 1:9
                       start = time()
                       J = imfilter(I,gk)
                       t = round(time() - start, digits=6)
                       t > 1 && println("$i imfilter $t sec")
               end
       end
test_imfilter (generic function with 1 method)

julia> @time test_imfilter()
  0.469421 seconds (575.55 k allocations: 173.657 MiB, 23.22% gc time)

julia> @time test_imfilter()
  0.158133 seconds (23.86 k allocations: 147.708 MiB, 28.82% gc time)

julia> @time test_imfilter()
  0.131905 seconds (23.86 k allocations: 147.708 MiB, 2.47% gc time)

julia> using Gtk
[ Info: Precompiling Gtk [4c0ca9eb-093a-5379-98c5-f87ac0bbbf44]
Gtk-Message: 12:08:31.030: Failed to load module "canberra-gtk-module"
Gtk-Message: 12:08:31.031: Failed to load module "canberra-gtk-module"

julia> @time test_imfilter()
2 imfilter 2.863516 sec
5 imfilter 4.938196 sec
6 imfilter 5.000571 sec
 13.005665 seconds (23.98 k allocations: 147.712 MiB, 0.70% gc time)

So issue is still present using build 1.4.0 and ImageFiltering using Profile pkg there no delay

julia> using Profile

julia> @profile test_imfilter()

julia> Profile.print()
Overhead ╎ [+additional indent] Count File:Line; Function
=========================================================
    ╎35   @Base/task.jl:358; (::REPL.var"#26#27"{REPL.REPLBa...
    ╎ 35   @REPL/src/REPL.jl:118; macro expansion
    ╎  35   @REPL/src/REPL.jl:86; eval_user_input(::Any, ::REPL....
   5╎   35   @Base/boot.jl:331; eval(::Module, ::Any)
    ╎    4    REPL[3]:3; test_imfilter()
    ╎     4    ...dom/src/Random.jl:292; rand
    ╎    ╎ 4    ...dom/src/Random.jl:289; rand
    ╎    ╎  4    ...dom/src/Random.jl:288; rand(::Random.MersenneTwist...
    ╎    ╎   4    ...om/src/Random.jl:267; rand!
    ╎    ╎    3    @Random/src/RNGs.jl:551; rand!
    ╎    ╎     3    ...m/src/Random.jl:266; rand!
    ╎    ╎    ╎ 1    ...dom/src/RNGs.jl:525; rand!(::Random.MersenneT...
    ╎    ╎    ╎  1    ...dom/src/RNGs.jl:506; _rand!
    ╎    ╎    ╎   1    ...om/src/RNGs.jl:494; rand!(::Random.Mersenne...
    ╎    ╎    ╎    1    ...om/src/RNGs.jl:471; fill_array!
   1╎    ╎    ╎     1    ...m/src/DSFMT.jl:86; dsfmt_fill_array_close...
    ╎    ╎    ╎ 1    ...dom/src/RNGs.jl:530; rand!(::Random.MersenneT...
   1╎    ╎    ╎  1    @Base/int.jl:301; xor
    ╎    ╎    ╎ 1    ...dom/src/RNGs.jl:541; rand!(::Random.MersenneT...
    ╎    ╎    ╎  1    ...dom/src/RNGs.jl:518; mask128
   1╎    ╎    ╎   1    @Base/int.jl:277; &
    ╎    ╎    1    @Random/src/RNGs.jl:554; rand!
   1╎    ╎     1    @Base/float.jl:402; -
    ╎    26   REPL[3]:7; test_imfilter()
    ╎     26   ...ng/src/imfilter.jl:5; imfilter
    ╎    ╎ 26   ...g/src/imfilter.jl:10; imfilter
    ╎    ╎  26   ...g/src/imfilter.jl:18; imfilter
    ╎    ╎   26   ...g/src/imfilter.jl:22; imfilter
    ╎    ╎    26   .../src/imfilter.jl:27; imfilter
    ╎    ╎     7    ...g/src/border.jl:1015; allocate_output
    ╎    ╎    ╎ 7    @Base/array.jl:359; similar
   7╎    ╎    ╎  7    @Base/boot.jl:407; Array
    ╎    ╎     19   ...src/imfilter.jl:606; imfilter!(::Array{Float3...
    ╎    ╎    ╎ 19   ...src/imfilter.jl:612; imfilter!(::Array{Float3...
    ╎    ╎    ╎  19   ...src/imfilter.jl:702; imfilter!(::Computationa...
    ╎    ╎    ╎   18   ...rc/imfilter.jl:711; imfilter!
    ╎    ╎    ╎    1    .../src/border.jl:662; padarray(::Type{Float3...
    ╎    ╎    ╎     1    ...tractarray.jl:625; similar
    ╎    ╎    ╎    ╎ 1    ...orderarray.jl:77; similar
    ╎    ╎    ╎    ╎  1    ...fsetArrays.jl:102; similar
    ╎    ╎    ╎    ╎   1    @Base/array.jl:361; similar
    ╎    ╎    ╎    ╎    1    @Base/boot.jl:415; Array
   1╎    ╎    ╎    ╎     1    @Base/boot.jl:407; Array
    ╎    ╎    ╎    17   .../src/border.jl:663; padarray(::Type{Float3...
    ╎    ╎    ╎     17   ...orderarray.jl:161; copy!(::OffsetArrays.O...
    ╎    ╎    ╎    ╎ 17   ...orderarray.jl:175; _copy!
    ╎    ╎    ╎    ╎  17   ...src/border.jl:685; copydata!
   3╎    ╎    ╎    ╎   3    @Base/array.jl:0; copydata!(::Array{Fl...
    ╎    ╎    ╎    ╎   14   ...rc/border.jl:678; copydata!(::Array{Fl...
    ╎    ╎    ╎    ╎    7    ...mensional.jl:543; getindex
   7╎    ╎    ╎    ╎     7    @Base/array.jl:788; getindex
    ╎    ╎    ╎    ╎    5    ...mensional.jl:545; setindex!
   5╎    ╎    ╎    ╎     5    @Base/array.jl:827; setindex!
    ╎    ╎    ╎    ╎    2    @Base/range.jl:597; iterate
   2╎    ╎    ╎    ╎     2    ...romotion.jl:398; ==
    ╎    ╎    ╎   1    ...rc/imfilter.jl:715; imfilter!
    ╎    ╎    ╎    1    ...rc/imfilter.jl:770; imfilter!
    ╎    ╎    ╎     1    ...c/imfilter.jl:773; imfilter!(::Computatio...
    ╎    ╎    ╎    ╎ 1    ...c/imfilter.jl:854; _imfilter_tiled!(::Co...
    ╎    ╎    ╎    ╎  1    ...c/imfilter.jl:858; _imfilter_tiled_threa...
    ╎    ╎    ╎    ╎   1    ...constructs.jl:69; macro expansion
    ╎    ╎    ╎    ╎    1    @Base/task.jl:554; schedule(::Task)
   1╎    ╎    ╎    ╎     1    @Base/task.jl:550; enq_work(::Task)
    ╎315  @Base/task.jl:444; task_done_hook(::Task)
    ╎ 315  @Base/task.jl:709; wait()
 314╎  315  @Base/task.jl:702; poptaskref(::Base.InvasiveLink...
    ╎610  ...eadingconstructs.jl:28; (::ImageFiltering.var"#71#threa...
    ╎ 610  ...eadingconstructs.jl:61; (::ImageFiltering.var"#71#threa...
    ╎  203  ...ing/src/imfilter.jl:862; macro expansion
   1╎   1    ...ng/src/imfilter.jl:960; imfilter!(::ComputationalReso...
    ╎   202  ...ng/src/imfilter.jl:978; imfilter!(::ComputationalReso...
    ╎    202  ...ng/src/imfilter.jl:994; _imfilter_inbounds!(::Comput...
  16╎     16   ...g/src/imfilter.jl:0; __imfilter_inbounds!(::Comp...
    ╎     9    ...g/src/imfilter.jl:1016; __imfilter_inbounds!(::Comp...
    ╎    ╎ 9    .../abstractarray.jl:75; axes
    ╎    ╎  9    @Base/tuple.jl:158; map
    ╎    ╎   9    @Base/range.jl:320; OneTo
    ╎    ╎    9    @Base/range.jl:311; OneTo
   9╎    ╎     9    @Base/promotion.jl:409; max
   1╎     142  ...g/src/imfilter.jl:1025; __imfilter_inbounds!(::Comp...
   1╎    ╎ 1    @Base/float.jl:404; *
  41╎    ╎ 41   @Base/float.jl:400; +
    ╎    ╎ 23   ...ltidimensional.jl:543; getindex
  23╎    ╎  23   @Base/array.jl:788; getindex
    ╎    ╎ 5    ...ltidimensional.jl:347; iterate
    ╎    ╎  5    ...ltidimensional.jl:361; __inc
   5╎    ╎   5    @Base/int.jl:49; <
    ╎    ╎ 71   @Base/range.jl:597; iterate
  71╎    ╎  71   @Base/promotion.jl:398; ==
    ╎     35   ...g/src/imfilter.jl:1027; __imfilter_inbounds!(::Comp...
    ╎    ╎ 22   .../abstractarray.jl:1073; setindex!
    ╎    ╎  22   ...abstractarray.jl:1103; _setindex!
    ╎    ╎   22   ...iledIteration.jl:202; setindex!
    ╎    ╎    22   .../OffsetArrays.jl:159; setindex!
  17╎    ╎     17   @Base/array.jl:827; setindex!
    ╎    ╎     5    ...OffsetArrays.jl:86; parent
   5╎    ╎    ╎ 5    @Base/Base.jl:33; getproperty
   7╎    ╎ 7    @Base/int.jl:52; -
    ╎    ╎ 6    @Base/range.jl:597; iterate
   6╎    ╎  6    @Base/promotion.jl:398; ==
    ╎  407  ...ing/src/imfilter.jl:863; macro expansion
    ╎   407  ...ng/src/imfilter.jl:978; imfilter!(::ComputationalReso...
    ╎    407  ...ng/src/imfilter.jl:994; _imfilter_inbounds!
  13╎     13   ...g/src/imfilter.jl:0; __imfilter_inbounds!(::Comp...
    ╎     12   ...g/src/imfilter.jl:1001; __imfilter_inbounds!(::Comp...
    ╎    ╎ 5    ...ltidimensional.jl:340; iterate
    ╎    ╎  5    ...ltidimensional.jl:382; last
    ╎    ╎   5    @Base/tuple.jl:157; map
    ╎    ╎    5    ...rays/src/axes.jl:151; last
   5╎    ╎     5    @Base/int.jl:53; +
   7╎    ╎ 7    ...rrays/src/axes.jl:141; iterate
  22╎     319  ...g/src/imfilter.jl:1002; __imfilter_inbounds!(::Comp...
    ╎    ╎ 11   .../abstractarray.jl:980; getindex
    ╎    ╎  11   ...abstractarray.jl:1020; _getindex
    ╎    ╎   11   ...iledIteration.jl:200; getindex
    ╎    ╎    11   .../OffsetArrays.jl:150; getindex
   9╎    ╎     9    @Base/array.jl:788; getindex
   2╎    ╎     2    @Base/int.jl:53; getindex
  25╎    ╎ 25   @Base/float.jl:404; *
  36╎    ╎ 36   @Base/float.jl:400; +
 143╎    ╎ 143  @Base/int.jl:53; +
    ╎    ╎ 17   ...ltidimensional.jl:347; iterate
    ╎    ╎  12   ...ltidimensional.jl:361; __inc
  12╎    ╎   12   @Base/int.jl:49; <
    ╎    ╎  5    ...ltidimensional.jl:362; __inc
   5╎    ╎   5    @Base/int.jl:53; +
    ╎    ╎ 65   ...rrays/src/axes.jl:145; iterate
    ╎    ╎  65   @Base/range.jl:597; iterate
  65╎    ╎   65   @Base/promotion.jl:398; ==
    ╎     13   ...g/src/imfilter.jl:1004; __imfilter_inbounds!(::Comp...
    ╎    ╎ 1    ...ltidimensional.jl:347; iterate
    ╎    ╎  1    ...ltidimensional.jl:362; __inc
   1╎    ╎   1    @Base/int.jl:53; +
    ╎    ╎ 10   ...ltidimensional.jl:545; setindex!
  10╎    ╎  10   @Base/array.jl:827; setindex!
    ╎    ╎ 2    @Base/range.jl:597; iterate
   2╎    ╎  2    @Base/promotion.jl:398; ==
  50╎     50   ...rrays/src/axes.jl:0; __imfilter_inbounds!(::Comp...
    ╎36   @Gtk/src/events.jl:1; gtk_main()
    ╎ 36   ...src/GLib/signals.jl:176; g_sigatom(::Any)
  36╎  36   @Gtk/src/events.jl:2; (::Gtk.var"#237#238")()
Total snapshots: 1999

julia> @profile test_imfilter()

julia> test_imfilter()
5 imfilter 2.691614 sec
6 imfilter 4.999978 sec

at the en of the trace, there are a few lines related to Gtk but different from one execution to the other?

 155╎     155  ...setArrays/src/axes.jl:0; __imfilter_inbounds!(::Computation...
    ╎120  @Gtk/src/events.jl:1; gtk_main()
    ╎ 120  @Gtk/src/GLib/signals.jl:176; g_sigatom(::Any)
 119╎  120  @Gtk/src/events.jl:2; (::Gtk.var"#237#238")()
    ╎   1    @Gtk/src/GLib/signals.jl:333; uv_dispatch(::Ptr{Nothing}, ::Ptr{No...
    ╎    1    @Gtk/src/GLib/signals.jl:237; g_yield(::UInt64)
   1╎     1    @Base/task.jl:608; yield()
Total snapshots: 7627

Another trace tail

 336╎     336   ...setArrays/src/axes.jl:0; __imfilter_inbounds!(::Computatio...
    ╎1     @Gtk/src/GLib/signals.jl:295; uv_prepare(::Ptr{Nothing}, ::Ptr{Int32})
   1╎ 1     @Base/Base.jl:33; getproperty(::Gtk.GLib._GPollFD, ::Sy...
    ╎219   @Gtk/src/events.jl:1; gtk_main()
    ╎ 219   @Gtk/src/GLib/signals.jl:176; g_sigatom(::Any)
 217╎  219   @Gtk/src/events.jl:2; (::Gtk.var"#237#238")()
    ╎   1     @Gtk/src/GLib/signals.jl:333; uv_dispatch(::Ptr{Nothing}, ::Ptr{N...
    ╎    1     @Gtk/src/GLib/signals.jl:237; g_yield(::UInt64)
   1╎     1     @Base/task.jl:608; yield()
   1╎   1     @Gtk/src/GLib/signals.jl:295; uv_prepare(::Ptr{Nothing}, ::Ptr{In...
Total snapshots: 13527
timholy commented 4 years ago

To explain the phenomenon there must be some missing traces. On Linux there's ~1 sample/millisecond, so I would expect you'd get ~150 snapshots. Try Profile.clear() between runs and don't profile the first call. Then after loading Gtk you should get a huge increase in the number of traces somewhere in the pipeline. You might need to display them with the C=true option.

wizebt commented 4 years ago

Trace are different without and with Gtk, after 3x @profile test_imfilter() tail of Profile.print()

     ╎    ╎    ╎    ╎    ╎    1     @Base/compiler/inferenceresult.jl:12; InferenceResult
     ╎    ╎    ╎    ╎    ╎     1     @Base/compiler/inferenceresult.jl:47; matching_cache_argtypes(::Core.MethodInstance, :...
     ╎    ╎    ╎    ╎    ╎    ╎ 1     @Base/compiler/inferenceresult.jl:129; matching_cache_argtypes(::Core.MethodInstance, ...
    1╎    ╎    ╎    ╎    ╎    ╎  1     @Base/compiler/typeutils.jl:18; isconstType(::Any)
     ╎    ╎ 1     @Base/compiler/typeinfer.jl:33; typeinf(::Core.Compiler.InferenceState)
     ╎    ╎  1     @Base/compiler/optimize.jl:169; optimize(::Core.Compiler.OptimizationState, ::Any)
     ╎    ╎   1     @Base/compiler/ssair/driver.jl:116; run_passes(::Core.CodeInfo, ::Int64, ::Core.Compiler.Optim...
     ╎    ╎    1     @Base/compiler/ssair/inlining.jl:74; ssa_inlining_pass!
     ╎    ╎     1     @Base/compiler/ssair/inlining.jl:549; batch_inline!(::Array{Any,1}, ::Core.Compiler.IRCode, ::A...
     ╎    ╎    ╎ 1     @Base/compiler/ssair/inlining.jl:312; ir_inline_item!(::Core.Compiler.IncrementalCompact, ::In...
    1╎    ╎    ╎  1     @Base/compiler/ssair/inlining.jl:1103; mk_tuplecall!(::Core.Compiler.IncrementalCompact, ::Arr...
     ╎19600 @Base/task.jl:444; task_done_hook(::Task)
     ╎ 19600 @Base/task.jl:709; wait()
19599╎  19600 @Base/task.jl:702; poptaskref(::Base.InvasiveLinkedListSynchronized{Task})
Total snapshots: 27245

Not lets use Gtk

julia> using Gtk
Gtk-Message: 19:05:09.253: Failed to load module "canberra-gtk-module"
Gtk-Message: 19:05:09.253: Failed to load module "canberra-gtk-module"

julia> @profile test_imfilter()
1 imfilter 1.469526 sec
2 imfilter 4.999271 sec
3 imfilter 3.944216 sec
4 imfilter 1.052428 sec
5 imfilter 5.001635 sec

I don't know what to look for in this long trace neith able to understand it

    1╎    ╎    ╎    ╎    ╎    ╎  1     @Base/compiler/typeutils.jl:18; isconstType(::Any)
     ╎    ╎ 1     @Base/compiler/typeinfer.jl:33; typeinf(::Core.Compiler.InferenceState)
     ╎    ╎  1     @Base/compiler/optimize.jl:169; optimize(::Core.Compiler.OptimizationState, ::Any)
     ╎    ╎   1     @Base/compiler/ssair/driver.jl:116; run_passes(::Core.CodeInfo, ::Int64, ::Core.Compiler.Optim...
     ╎    ╎    1     @Base/compiler/ssair/inlining.jl:74; ssa_inlining_pass!
     ╎    ╎     1     @Base/compiler/ssair/inlining.jl:549; batch_inline!(::Array{Any,1}, ::Core.Compiler.IRCode, ::A...
     ╎    ╎    ╎ 1     @Base/compiler/ssair/inlining.jl:312; ir_inline_item!(::Core.Compiler.IncrementalCompact, ::In...
    1╎    ╎    ╎  1     @Base/compiler/ssair/inlining.jl:1103; mk_tuplecall!(::Core.Compiler.IncrementalCompact, ::Arr...
     ╎19600 @Base/task.jl:444; task_done_hook(::Task)
     ╎ 19600 @Base/task.jl:709; wait()
19599╎  19600 @Base/task.jl:702; poptaskref(::Base.InvasiveLinkedListSynchronized{Task})
Total snapshots: 27245
timholy commented 4 years ago

Did you mistakenly post the same profile twice? The one from before using Gtk has 19600 traces on it which corresponds to a run time of around 19.6 seconds. I don't think test_imfilter takes that long to run before you load Gtk.

wizebt commented 4 years ago

Hi Tim apology for this long silence I had to deliver an urgent projet hence pause investigations on this severe imfilter counter performance. Still present on Ubuntu not on Windows, since our last exchange I upgraded to 1.4.1 (from binaries not compiled), Profiling is not bringing much info on root cause. Interrupting benchmarch on the simple test program might help.

I think I found root cause see the last part of this report issue is linked to multithreading

julia> versioninfo()
Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) E-2186M  CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 6

Test code is running fine and fast as expected

using ImageFiltering, BenchmarkTools

function test_imfilter()
    gk = Kernel.gaussian(2.0f0)
    I = rand(Float32,100,00)

    for i = 1:9
        start = time()
        J = imfilter(I,gk)
        t = round(time() - start, digits=6)
        t > 1 && println("$i imfilter $t sec")
    end
end
julia> include("test_imfilter.jl")
test_imfilter (generic function with 1 method)

julia> @benchmark test_imfilter()
BenchmarkTools.Trial: 
  memory estimate:  698.50 KiB
  allocs estimate:  749
  --------------
  minimum time:     259.490 μs (0.00% GC)
  median time:      271.615 μs (0.00% GC)
  mean time:        287.132 μs (3.19% GC)
  maximum time:     1.935 ms (66.28% GC)
  --------------
  samples:          10000
  evals/sample:     1

Now using

julia> using ImageView
``
Execution is becoming  verrrry slooow... 

julia> @benchmark test_imfilter() 4 imfilter 4.996424 sec 3 imfilter 4.982828 sec 2 imfilter 1.561385 sec ^CERROR: InterruptException: Stacktrace: [1] try_yieldto(::typeof(Base.ensure_rescheduled), ::Base.RefValue{Task}) at ./task.jl:645 [2] wait at ./task.jl:710 [inlined] [3] wait(::Base.GenericCondition{Base.Threads.SpinLock}) at ./condition.jl:106 [4] _wait(::Task) at ./task.jl:238 [5] wait(::Task) at ./task.jl:265 [6] macro expansion at ./threadingconstructs.jl:69 [inlined] [7] _imfilter_tiled_threads!(::ComputationalResources.CPU1{ImageFiltering.Algorithm.FIRTiled{2}}, ::Array{Float32,2}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::NoPad{Pad{2}}, ::Array{Tuple{UnitRange{Int64},UnitRange{Int64}},1}, ::Array{Array{Float32,2},1}) at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:858 [8] _imfilter_tiled!(::ComputationalResources.CPUThreads{ImageFiltering.Algorithm.FIRTiled{2}}, ::Array{Float32,2}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::NoPad{Pad{2}}, ::Array{Array{Float32,2},1}, ::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}) at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:854 [9] imfilter!(::ComputationalResources.CPUThreads{ImageFiltering.Algorithm.FIRTiled{2}}, ::Array{Float32,2}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::NoPad{Pad{2}}, ::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}) at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:773 [10] imfilter! at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:770 [inlined] [11] imfilter! at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:715 [inlined] [12] imfilter!(::ComputationalResources.CPUThreads{ImageFiltering.Algorithm.FIRTiled{2}}, ::Array{Float32,2}, ::Array{Float32,2}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::Pad{0}) at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:702 [13] imfilter!(::Array{Float32,2}, ::Array{Float32,2}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::Pad{0}, ::ImageFiltering.Algorithm.FIRTiled{2}) at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:612 [14] imfilter!(::Array{Float32,2}, ::Array{Float32,2}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::Pad{0}) at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:606 [15] imfilter at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:27 [inlined] [16] imfilter at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:22 [inlined] [17] imfilter at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:18 [inlined] [18] imfilter at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:10 [inlined] [19] imfilter at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:5 [inlined] [20] test_imfilter() at /home/manu/julia/test_imfilter.jl:9 [21] ##core#259() at /home/manu/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:371 [22] ##sample#260(::BenchmarkTools.Parameters) at /home/manu/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:377 [23] _run(::BenchmarkTools.Benchmark{Symbol("##benchmark#258")}, ::BenchmarkTools.Parameters; verbose::Bool, pad::String, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/manu/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:411 [24] _run(::BenchmarkTools.Benchmark{Symbol("##benchmark#258")}, ::BenchmarkTools.Parameters) at /home/manu/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:399 [25] #invokelatest#1 at ./essentials.jl:712 [inlined] [26] invokelatest at ./essentials.jl:711 [inlined] [27] #run_result#37 at /home/manu/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:32 [inlined] [28] run_result at /home/manu/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:32 [inlined] [29] run(::BenchmarkTools.Benchmark{Symbol("##benchmark#258")}, ::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, ndone::Float64, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/manu/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:94 [30] run at /home/manu/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:94 [inlined] (repeats 2 times) [31] top-level scope at /home/manu/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:289

If you cannot reproduce it issue on your environment, I can offer to connect to mine I have TeamViewer and AnyDesk installed, let me know 
This is a pitty not to be able to use the extremely smart imfilter.

One last test: when `JULIA_NUM_THREADS = 1' there is NO problem.

julia> versioninfo() Julia Version 1.4.1 Commit 381693d3df* (2020-04-14 17:20 UTC) Platform Info: OS: Linux (x86_64-pc-linux-gnu) CPU: Intel(R) Xeon(R) E-2186M CPU @ 2.90GHz WORD_SIZE: 64 LIBM: libopenlibm LLVM: libLLVM-8.0.1 (ORCJIT, skylake) Environment: JULIA_NUM_THREADS = 1

julia> include("test_imfilter.jl") test_imfilter (generic function with 1 method)

julia> @benchmark test_imfilter() BenchmarkTools.Trial: memory estimate: 196.05 KiB allocs estimate: 380

minimum time: 88.571 μs (0.00% GC) median time: 94.051 μs (0.00% GC) mean time: 100.208 μs (4.18% GC) maximum time: 1.314 ms (86.36% GC)

samples: 10000 evals/sample: 1

julia> using ImageView Gtk-Message: 17:36:16.793: Failed to load module "canberra-gtk-module" Gtk-Message: 17:36:16.793: Failed to load module "canberra-gtk-module"

julia> @benchmark test_imfilter() BenchmarkTools.Trial: memory estimate: 196.05 KiB allocs estimate: 380

minimum time: 88.417 μs (0.00% GC) median time: 91.065 μs (0.00% GC) mean time: 98.269 μs (4.74% GC) maximum time: 1.370 ms (90.09% GC)

samples: 10000 evals/sample: 1

julia> @benchmark test_imfilter() BenchmarkTools.Trial: memory estimate: 196.05 KiB allocs estimate: 380

minimum time: 88.255 μs (0.00% GC) median time: 91.777 μs (0.00% GC) mean time: 98.860 μs (4.89% GC) maximum time: 1.370 ms (90.24% GC)

samples: 10000 evals/sample: 1

julia> @benchmark test_imfilter() BenchmarkTools.Trial: memory estimate: 196.05 KiB allocs estimate: 380

minimum time: 88.009 μs (0.00% GC) median time: 90.380 μs (0.00% GC) mean time: 98.582 μs (4.90% GC) maximum time: 1.444 ms (91.11% GC)

samples: 10000 evals/sample: 1

wizebt commented 4 years ago

Hi Tim apology for this long silence I had to deliver an urgent projet hence pause investigations on this severe imfilter counter performance. Still present on Ubuntu not on Windows, since our last exchange I upgraded to 1.4.1 (from binaries not compiled), Profiling is not bringing much info on root cause. Interrupting benchmarch on the simple test program might help.

I think I found root cause see the last part of this report issue is linked to multithreading

julia> versioninfo()
Julia Version 1.4.1
Commit 381693d3df* (2020-04-14 17:20 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) E-2186M  CPU @ 2.90GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-8.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 6

Test code is running fine and fast as expected

using ImageFiltering, BenchmarkTools

function test_imfilter()
    gk = Kernel.gaussian(2.0f0)
    I = rand(Float32,100,00)

    for i = 1:9
        start = time()
        J = imfilter(I,gk)
        t = round(time() - start, digits=6)
        t > 1 && println("$i imfilter $t sec")
    end
end
julia> include("test_imfilter.jl")
test_imfilter (generic function with 1 method)

julia> @benchmark test_imfilter()
BenchmarkTools.Trial: 
  memory estimate:  698.50 KiB
  allocs estimate:  749
  --------------
  minimum time:     259.490 μs (0.00% GC)
  median time:      271.615 μs (0.00% GC)
  mean time:        287.132 μs (3.19% GC)
  maximum time:     1.935 ms (66.28% GC)
  --------------
  samples:          10000
  evals/sample:     1

Now using

julia> using ImageView
``
Execution is becoming  verrrry slooow... 

julia> @benchmark test_imfilter() 4 imfilter 4.996424 sec 3 imfilter 4.982828 sec 2 imfilter 1.561385 sec ^CERROR: InterruptException: Stacktrace: [1] try_yieldto(::typeof(Base.ensure_rescheduled), ::Base.RefValue{Task}) at ./task.jl:645 [2] wait at ./task.jl:710 [inlined] [3] wait(::Base.GenericCondition{Base.Threads.SpinLock}) at ./condition.jl:106 [4] _wait(::Task) at ./task.jl:238 [5] wait(::Task) at ./task.jl:265 [6] macro expansion at ./threadingconstructs.jl:69 [inlined] [7] _imfilter_tiled_threads!(::ComputationalResources.CPU1{ImageFiltering.Algorithm.FIRTiled{2}}, ::Array{Float32,2}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::NoPad{Pad{2}}, ::Array{Tuple{UnitRange{Int64},UnitRange{Int64}},1}, ::Array{Array{Float32,2},1}) at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:858 [8] _imfilter_tiled!(::ComputationalResources.CPUThreads{ImageFiltering.Algorithm.FIRTiled{2}}, ::Array{Float32,2}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::NoPad{Pad{2}}, ::Array{Array{Float32,2},1}, ::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}) at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:854 [9] imfilter!(::ComputationalResources.CPUThreads{ImageFiltering.Algorithm.FIRTiled{2}}, ::Array{Float32,2}, ::OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::NoPad{Pad{2}}, ::Tuple{Base.OneTo{Int64},Base.OneTo{Int64}}) at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:773 [10] imfilter! at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:770 [inlined] [11] imfilter! at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:715 [inlined] [12] imfilter!(::ComputationalResources.CPUThreads{ImageFiltering.Algorithm.FIRTiled{2}}, ::Array{Float32,2}, ::Array{Float32,2}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::Pad{0}) at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:702 [13] imfilter!(::Array{Float32,2}, ::Array{Float32,2}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::Pad{0}, ::ImageFiltering.Algorithm.FIRTiled{2}) at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:612 [14] imfilter!(::Array{Float32,2}, ::Array{Float32,2}, ::Tuple{OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}},OffsetArrays.OffsetArray{Float32,2,Array{Float32,2}}}, ::Pad{0}) at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:606 [15] imfilter at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:27 [inlined] [16] imfilter at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:22 [inlined] [17] imfilter at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:18 [inlined] [18] imfilter at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:10 [inlined] [19] imfilter at /home/manu/.julia/packages/ImageFiltering/jhBno/src/imfilter.jl:5 [inlined] [20] test_imfilter() at /home/manu/julia/test_imfilter.jl:9 [21] ##core#259() at /home/manu/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:371 [22] ##sample#260(::BenchmarkTools.Parameters) at /home/manu/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:377 [23] _run(::BenchmarkTools.Benchmark{Symbol("##benchmark#258")}, ::BenchmarkTools.Parameters; verbose::Bool, pad::String, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/manu/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:411 [24] _run(::BenchmarkTools.Benchmark{Symbol("##benchmark#258")}, ::BenchmarkTools.Parameters) at /home/manu/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:399 [25] #invokelatest#1 at ./essentials.jl:712 [inlined] [26] invokelatest at ./essentials.jl:711 [inlined] [27] #run_result#37 at /home/manu/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:32 [inlined] [28] run_result at /home/manu/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:32 [inlined] [29] run(::BenchmarkTools.Benchmark{Symbol("##benchmark#258")}, ::BenchmarkTools.Parameters; progressid::Nothing, nleaves::Float64, ndone::Float64, kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /home/manu/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:94 [30] run at /home/manu/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:94 [inlined] (repeats 2 times) [31] top-level scope at /home/manu/.julia/packages/BenchmarkTools/eCEpo/src/execution.jl:289

If you cannot reproduce it issue on your environment, I can offer to connect to mine I have TeamViewer and AnyDesk installed, let me know 
This is a pitty not to be able to use the extremely smart imfilter.

One last test: when `JULIA_NUM_THREADS = 1' there is NO problem.

julia> versioninfo() Julia Version 1.4.1 Commit 381693d3df* (2020-04-14 17:20 UTC) Platform Info: OS: Linux (x86_64-pc-linux-gnu) CPU: Intel(R) Xeon(R) E-2186M CPU @ 2.90GHz WORD_SIZE: 64 LIBM: libopenlibm LLVM: libLLVM-8.0.1 (ORCJIT, skylake) Environment: JULIA_NUM_THREADS = 1

julia> include("test_imfilter.jl") test_imfilter (generic function with 1 method)

julia> @benchmark test_imfilter() BenchmarkTools.Trial: memory estimate: 196.05 KiB allocs estimate: 380

minimum time: 88.571 μs (0.00% GC) median time: 94.051 μs (0.00% GC) mean time: 100.208 μs (4.18% GC) maximum time: 1.314 ms (86.36% GC)

samples: 10000 evals/sample: 1

julia> using ImageView Gtk-Message: 17:36:16.793: Failed to load module "canberra-gtk-module" Gtk-Message: 17:36:16.793: Failed to load module "canberra-gtk-module"

julia> @benchmark test_imfilter() BenchmarkTools.Trial: memory estimate: 196.05 KiB allocs estimate: 380

minimum time: 88.417 μs (0.00% GC) median time: 91.065 μs (0.00% GC) mean time: 98.269 μs (4.74% GC) maximum time: 1.370 ms (90.09% GC)

samples: 10000 evals/sample: 1

julia> @benchmark test_imfilter() BenchmarkTools.Trial: memory estimate: 196.05 KiB allocs estimate: 380

minimum time: 88.255 μs (0.00% GC) median time: 91.777 μs (0.00% GC) mean time: 98.860 μs (4.89% GC) maximum time: 1.370 ms (90.24% GC)

samples: 10000 evals/sample: 1

julia> @benchmark test_imfilter() BenchmarkTools.Trial: memory estimate: 196.05 KiB allocs estimate: 380

minimum time: 88.009 μs (0.00% GC) median time: 90.380 μs (0.00% GC) mean time: 98.582 μs (4.90% GC) maximum time: 1.444 ms (91.11% GC)

samples: 10000 evals/sample: 1

timholy commented 4 years ago

This was the right issue to leave open.

timholy commented 4 years ago

I get good consistent performance if I use fewer threads than I have physical CPU cores (half the number of cores that cat /proc/cpuinfo reports for me), and erratic performance (https://github.com/timholy/ComputationalResources.jl/issues/18) if I exceed that. That's not very surprising, and I don't recommend exceeding the number of physical cores for anything but very specialized applications.

I can also replicate the Gtk-effect with threading. That may be a Julia bug. But I started with filing https://github.com/JuliaGraphics/Gtk.jl/issues/503. I don't think it's an issue for this package, so I'll re-close.

wizebt commented 4 years ago

Thanks Tim for you investigation and reporting cleanup. I always set JULIA_NUM_THREAD=<# phys core>. In order to use imfilter in my code (image demosaic bilinear) I'm running imfilter(CPU1(Algorithm.FIR()), ... It is too slow for my usage (x10 times), I will have to write some specific code as I did for gaussian and median filter. Thank you. PS let me know if you are interested to get the fastest median algo for 3 and 9 elements and how I further reduce #op for separable symetric kernels.

timholy commented 4 years ago

If you are using largeish Gaussian kernels consider IIRGaussian.

I am definitely interested in median filter. Solving the problem in general (arbitrary dimensionality, arbitrary size, arbitrary element type) is a bit tricky, but specific cases are not so bad. 2-dimensions, 3x3 is obviously a common use case. We could also probably do something fairly generic for 8-bit grayscale (N0f8). But for n-dimensional Float32 fast median filtering with arbitrary kernel sizes, I'm still not sure the right algorithm for that.