Open JoshuaBillson opened 1 week ago
I see I'm blamed in https://github.com/FluxML/Flux.jl/pull/1921 for suggesting that change, although I've forgotten why.
With the code above, I see numbers similar to yours: `grouped_conv` is faster but makes many small allocations:
```julia
julia> depthwise_conv.weight .= grouped_conv.weight;

julia> y1 = @btime grouped_conv(x);
  4.430 ms (23577 allocations: 201.13 MiB)

julia> y2 = @btime depthwise_conv(x);
  15.584 ms (27 allocations: 199.06 MiB)

julia> y1 ≈ y2
true
```
Repeating the benchmarks from #1921 today, `Flux.DepthwiseConv` with groups makes many small allocations, more than were seen in #1921, although even then the count had already increased compared with before:
```julia
julia> x = randn(Float32, 128, 128, 32, 32);

julia> dconv1 = Flux.DepthwiseConv((3,3), 32 => 64)  # using groups, after 1921
Conv((3, 3), 32 => 64, groups=32)  # 640 parameters

julia> z1 = @btime $dconv1($x);
  38.161 ms (1236 allocations: 370.29 MiB)

julia> dconv2 = DepthwiseConv((3,3), 32 => 64);  # using code above

julia> copyto!(dconv2.weight, dconv1.weight);  # 3×3×2×32 from 3×3×1×64

julia> z2 = @btime $dconv2($x);
  45.090 ms (42 allocations: 370.16 MiB)

julia> z1 ≈ z2
true

julia> Threads.nthreads()
4
```
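The `copyto!` step in that session works because the two weight arrays hold the same number of elements (3·3·2·32 = 3·3·1·64 = 576), and `copyto!` copies them in linear (column-major) order. A minimal sketch using only Base Julia shows how slices line up between the two shapes. The arrays `src` and `dst` are hypothetical stand-ins, not the actual layer weights:

```julia
# Stand-in for the grouped Conv weight (3×3×1×64), filled with 1:576
# so that each element's value equals its linear index.
src = reshape(collect(Float32, 1:576), 3, 3, 1, 64)

# Stand-in for the old DepthwiseConv weight (3×3×2×32).
dst = zeros(Float32, 3, 3, 2, 32)

# copyto! ignores the shapes and copies element by element in linear order.
copyto!(dst, src)

# Same data, different shape: the flattened arrays are identical.
same_data = vec(dst) == vec(src)

# The second 3×3 kernel of `src` lands in slot (multiplier 2, channel 1) of `dst`.
second_kernel_matches = dst[:, :, 2, 1] == src[:, :, 1, 2]
```

Whether this linear ordering also pairs the right kernels with the right channels depends on each layer's weight layout. In the session above, that is what the `z1 ≈ z2` check confirms.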
I think the NNlib CPU conv code remains in need of some care... more and more layers of multi-threading were added & probably ought to be pruned.
Depthwise convolutions, which are currently implemented as a standard `Conv` layer with the number of groups equal to the number of input channels, seem to produce a very large number of allocations compared to the old `DepthwiseConv` layer. To confirm this, I restored the `DepthwiseConv` layer removed in #1921 and compared the performance to the current implementation.

Running the code below shows the following: `Conv` with `groups=1024` produces around 750 times as many allocations as `DepthwiseConv`. The result is that CNN architectures which rely on depthwise convolutions produce hundreds of thousands of allocations compared to only a few thousand for comparably sized models with standard convolutions. Is there any reason for this discrepancy?

Note: I am testing this on Julia 1.10 with Flux v0.14.22.
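For anyone who wants a quick check of allocation counts without restoring the old layer, here is a rough sketch. It is not the code from this issue. It assumes Julia 1.9 or later (for `@allocations`) and a Flux version whose `Conv` accepts the `groups` keyword; the helper `count_allocs` and the input size are made up for illustration:

```julia
using Flux

# Hypothetical helper: number of allocations made by one forward pass.
function count_allocs(layer, x)
    layer(x)                      # warm-up call so compilation is not counted
    return @allocations layer(x)
end

# Illustrative input in WHCN layout: 64×64 spatial, 32 channels, batch of 4.
x = randn(Float32, 64, 64, 32, 4)

standard = Conv((3, 3), 32 => 64)             # ordinary convolution
grouped  = Conv((3, 3), 32 => 64; groups=32)  # depthwise-style: groups == input channels

y_standard = standard(x)
y_grouped  = grouped(x)

n_standard = count_allocs(standard, x)
n_grouped  = count_allocs(grouped, x)

println("standard Conv allocations: ", n_standard)
println("grouped Conv allocations:  ", n_grouped)
```

The counts will vary with thread count and package versions, so treat them as a relative comparison only. For timings, `@btime` with `$`-interpolated arguments, as in the comment above, is the more reliable tool.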