Closed f-dangel closed 3 years ago
I benchmarked three approaches:
1. `1` group: This turned out to be the slowest, and (surprisingly) also used the most memory, even though the one-hot kernel is not repeated at all.
2. `C_in` groups: This is the approach used in 0.0.1. It shows similar performance to the next approach, but uses a one-hot kernel which only requires `1 / N` times the memory.
3. `C_in * N` groups: Shows similar performance to the approach with `C_in` groups, but uses a much larger one-hot kernel.

For now, I will stay with the `C_in` groups strategy. In the future, there could be a switch allowing one of the three approaches to be chosen, but I have not encountered a practical use case yet.
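As a minimal sketch of the `C_in` groups strategy (toy sizes and variable names are my own; the actual implementation may differ), a one-hot kernel repeated once per input channel reproduces `torch.nn.functional.unfold` via a grouped convolution:

```python
import torch

# Toy sizes, chosen for illustration only
N, C_in, H, W, K = 2, 3, 5, 5, 2
x = torch.randn(N, C_in, H, W)

# One-hot kernel: a (K*K, 1, K, K) identity stack, where row j of the
# identity picks out kernel position j. It is repeated once per input
# channel, so the kernel is an identical copy for each channel.
eye = torch.eye(K * K).reshape(K * K, 1, K, K)
kernel = eye.repeat(C_in, 1, 1, 1)

# groups=C_in: each input channel is convolved with its own kernel copy
out = torch.nn.functional.conv2d(x, kernel, groups=C_in)

# Flattening the spatial output recovers unfold's patch matrix
patches = out.reshape(N, C_in * K * K, -1)
same = torch.allclose(patches, torch.nn.functional.unfold(x, K))
```

The `1` group and `C_in * N` group variants differ only in how often the identity stack is repeated along the kernel's output- and input-channel dimensions.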
Currently, the one-hot kernel is an identical copy for each input channel. This is not necessary when treating the input's channel dimension like a batch dimension. Besides the smaller memory required to store the kernel, this optimization could also lead to run time speedups, as the convolution operation will use `groups=1` instead of `groups=in_channels`; I read somewhere that grouped convolutions are slower than ungrouped ones (I don't know the reason and never checked it).
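The channel-as-batch idea could look like this (a sketch under my own naming, not the implemented code): fold `C_in` into the batch dimension, so a single copy of the one-hot kernel and `groups=1` suffice.

```python
import torch

# Toy sizes, chosen for illustration only
N, C_in, H, W, K = 2, 3, 5, 5, 2
x = torch.randn(N, C_in, H, W)

# Single-channel one-hot kernel; no per-channel repetition needed
eye = torch.eye(K * K).reshape(K * K, 1, K, K)

# Treat the channel dimension as a batch dimension, then run one
# ungrouped convolution (groups=1 is the default)
out = torch.nn.functional.conv2d(x.reshape(N * C_in, 1, H, W), eye)

# Un-folding the batch dimension recovers unfold's patch matrix
patches = out.reshape(N, C_in * K * K, -1)
same = torch.allclose(patches, torch.nn.functional.unfold(x, K))
```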