Closed f-dangel closed 3 years ago
I benchmarked three approaches:
1. `1` group: This turned out to be the slowest, and (surprisingly) also used the most memory, even though the one-hot kernel is not repeated at all.
2. `C_in` groups: This is the approach used in 0.0.1. It shows similar performance to the next approach, but uses a one-hot kernel which only requires `1 / N` times the memory.
3. `C_in * N` groups: Shows similar performance to the approach with `C_in` groups, but uses a much larger one-hot kernel.

For now, I will stay with the `C_in` groups strategy. In the future, there could be a switch allowing one of the three approaches to be chosen, but I have not encountered a practical use case yet.
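As a minimal sketch of the `C_in` groups strategy (toy sizes and variable names are my own; the actual implementation may differ), a one-hot kernel repeated once per input channel reproduces `torch.nn.functional.unfold` via a grouped convolution:

```python
import torch

# Toy sizes, chosen for illustration only
N, C_in, H, W, K = 2, 3, 5, 5, 2
x = torch.randn(N, C_in, H, W)

# One-hot kernel: a (K*K, 1, K, K) identity stack, where row j of the
# identity picks out kernel position j. It is repeated once per input
# channel, so the kernel is an identical copy for each channel.
eye = torch.eye(K * K).reshape(K * K, 1, K, K)
kernel = eye.repeat(C_in, 1, 1, 1)

# groups=C_in: each input channel is convolved with its own kernel copy
out = torch.nn.functional.conv2d(x, kernel, groups=C_in)

# Flattening the spatial output recovers unfold's patch matrix
patches = out.reshape(N, C_in * K * K, -1)
same = torch.allclose(patches, torch.nn.functional.unfold(x, K))
```

The `1` group and `C_in * N` group variants differ only in how often the identity stack is repeated along the kernel's output- and input-channel dimensions.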
Currently, the one-hot kernel is an identical copy for each input channel. This is not necessary when treating the input's channel dimension like a batch dimension. Besides the smaller memory required to store the kernel, this optimization could also lead to run time speedups, as the convolution operation will use `groups=1` instead of `groups=in_channels`; I read somewhere that grouped convolutions are slower than ungrouped ones (I don't know the reason and never checked it).
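The channel-as-batch idea could look like this (a sketch under my own naming, not the implemented code): fold `C_in` into the batch dimension, so a single copy of the one-hot kernel and `groups=1` suffice.

```python
import torch

# Toy sizes, chosen for illustration only
N, C_in, H, W, K = 2, 3, 5, 5, 2
x = torch.randn(N, C_in, H, W)

# Single-channel one-hot kernel; no per-channel repetition needed
eye = torch.eye(K * K).reshape(K * K, 1, K, K)

# Treat the channel dimension as a batch dimension, then run one
# ungrouped convolution (groups=1 is the default)
out = torch.nn.functional.conv2d(x.reshape(N * C_in, 1, H, W), eye)

# Un-folding the batch dimension recovers unfold's patch matrix
patches = out.reshape(N, C_in * K * K, -1)
same = torch.allclose(patches, torch.nn.functional.unfold(x, K))
```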