f-dangel / unfoldNd

(N=1,2,3)-dimensional unfold (im2col) and fold (col2im) in PyTorch
MIT License

Avoid repeating the kernel #11

Closed by f-dangel 3 years ago

f-dangel commented 3 years ago

Currently, the one-hot kernel is an identical copy for each input channel. This is unnecessary if the input channels are treated like a batch dimension. Besides requiring less memory to store the kernel, this optimization could also yield run time speedups, as the convolution would use groups=1 instead of groups=in_channels; I read somewhere that grouped convolutions are slower than ungrouped ones (I don't know the reason and have never verified it).
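To make the proposal concrete, here is a minimal sketch (not the library's actual implementation; `unfold_via_conv` is a hypothetical name) of 2d unfold realized as a convolution with a *single* one-hot kernel, where the input channels are folded into the batch dimension so that the convolution runs with groups=1 and the kernel is never repeated:

```python
import torch
import torch.nn.functional as F


def unfold_via_conv(x, k):
    """Unfold (im2col) via conv2d with one shared one-hot kernel.

    x: input of shape (N, C, H, W); k: square kernel size.
    Returns the same as F.unfold(x, k): shape (N, C * k * k, L).
    """
    N, C, H, W = x.shape
    # One-hot kernel shared across all channels: (k*k, 1, k, k).
    # Output channel p extracts the patch entry at flat position p.
    weight = torch.eye(k * k, dtype=x.dtype).reshape(k * k, 1, k, k)
    # Fold channels into the batch dimension -> ungrouped convolution.
    out = F.conv2d(x.reshape(N * C, 1, H, W), weight)  # (N*C, k*k, H', W')
    return out.reshape(N, C * k * k, -1)


x = torch.randn(2, 3, 5, 5)
# Matches PyTorch's built-in unfold (stride 1, no padding, no dilation).
assert torch.allclose(unfold_via_conv(x, 2), F.unfold(x, 2))
```

The reshape before the convolution is what replaces the per-channel kernel copies: each (sample, channel) pair is treated as its own sample, so one `(k*k, 1, k, k)` kernel suffices regardless of `C`.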

f-dangel commented 3 years ago

I benchmarked three approaches:

For now, I will stick with the C_in-groups strategy. In the future, there could be a switch for choosing among the three approaches, but I have not yet encountered a practical use case.
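For comparison, the retained strategy can be sketched as follows (again a hypothetical illustration, not the library code): the one-hot kernel is repeated once per input channel and the convolution runs with groups=C_in, which is exactly the repetition the issue proposes to avoid:

```python
import torch
import torch.nn.functional as F


def unfold_grouped(x, k):
    """Unfold (im2col) via a grouped convolution (the C_in-groups strategy).

    x: input of shape (N, C, H, W); k: square kernel size.
    Returns the same as F.unfold(x, k): shape (N, C * k * k, L).
    """
    N, C, H, W = x.shape
    eye = torch.eye(k * k, dtype=x.dtype).reshape(k * k, 1, k, k)
    # Identical copy of the one-hot kernel for each input channel:
    # weight has shape (C * k*k, 1, k, k) and grows with C.
    weight = eye.repeat(C, 1, 1, 1)
    out = F.conv2d(x, weight, groups=C)  # (N, C*k*k, H', W')
    return out.reshape(N, C * k * k, -1)


x = torch.randn(2, 3, 5, 5)
assert torch.allclose(unfold_grouped(x, 2), F.unfold(x, 2))
```

Both strategies produce identical results; they differ only in kernel storage (`(k*k, 1, k, k)` versus `(C*k*k, 1, k, k)`) and in whether the convolution is grouped.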