DeadAt0m / ActiveSparseShifts-PyTorch

Implementation of Sparse Shift Layer and Active Shift Layer (3D, 4D, 5D tensors) for PyTorch (CPU, GPU)

Question about plans to extend with a stride option? #3

Open Eunhui-Kim opened 4 years ago

Eunhui-Kim commented 4 years ago

Thank you for sharing the code.

I verified that your code works well for ResNet on the CIFAR-10 dataset.

For extending the experiments to ImageNet, not only with the basic ResNet but also with the bottleneck option, I think the stride option is essential.

Do you have a plan to add a stride option?

DeadAt0m commented 4 years ago

Hello. Honestly, I do not understand how a stride can be applied here. You can find the mathematical definition of the shift operation here for reference. In the case of a 4D tensor T of size [B, C, H, W], we create an empty_like tensor shifted_T and then perform a memory copy: shifted_T[:, :, i, j] = T[:, :, i+a, j+b], where a and b are trainable integer shifts (step and direction); the empty space in shifted_T is filled with padding (in the simple case, with zeros). As you can see, a stride is not applicable here. The most we can do is add a constant shift s to both dimensions, like this: shifted_T[:, :, i, j] = T[:, :, i+s+a, j+s+b], but that is not exactly a stride.
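A minimal sketch of that integer shift in PyTorch (the function name and the per-channel shift layout are my own illustration, not this repository's API):

```python
import torch

def sparse_shift_4d(x, shifts):
    """Zero-padded integer shift per channel.

    x:      tensor of shape [B, C, H, W]
    shifts: list of C (a, b) integer pairs, so that
            out[:, c, i, j] = x[:, c, i + a, j + b]
            (zeros where the source index falls outside the image;
             assumes |a| < H and |b| < W)
    """
    B, C, H, W = x.shape
    out = torch.zeros_like(x)
    for c, (a, b) in enumerate(shifts):
        # source region in x and destination region in out for this channel
        src_i = slice(max(a, 0), H + min(a, 0))
        dst_i = slice(max(-a, 0), H + min(-a, 0))
        src_j = slice(max(b, 0), W + min(b, 0))
        dst_j = slice(max(-b, 0), W + min(-b, 0))
        out[:, c, dst_i, dst_j] = x[:, c, src_i, src_j]
    return out

# usage: shift channel 0 down by 1 row, channel 1 left by 2 columns
x = torch.randn(1, 3, 8, 8)
y = sparse_shift_4d(x, [(1, 0), (0, -2), (0, 0)])
```

As the copy above shows, the forward pass is only indexing and memory movement, which is why a stride does not fit naturally into it.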

If you have any ideas or corrections, you are welcome to share them :).

P.S. Of course, in case active shift - which in turn a just bilinear interpolation, the stride is applicable, but I am not interesting in using of active shift - because it computationally ineffective (in comparison even with depthwise convolution) and complex. So, in this code I just include active shift on forward pass just to support the mentioned article, because during backward pass we need do the same computations(interpolation).
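For illustration only, here is a rough sketch of how such a fractional (active) shift can be written with bilinear sampling; I use torch.nn.functional.grid_sample here as my own choice, not necessarily how this repository implements it:

```python
import torch
import torch.nn.functional as F

def active_shift_4d(x, shifts):
    """Fractional per-channel shift via bilinear interpolation.

    x:      tensor of shape [B, C, H, W]
    shifts: tensor of shape [C, 2] with real-valued (a, b) shifts per channel
    """
    B, C, H, W = x.shape
    # base sampling grid in normalized [-1, 1] coordinates
    ys = torch.linspace(-1, 1, H, device=x.device)
    xs = torch.linspace(-1, 1, W, device=x.device)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    out = torch.empty_like(x)
    for c in range(C):
        a, b = shifts[c]
        # shift the grid so that out[:, c, i, j] ~= x[:, c, i + a, j + b]
        grid = torch.stack((gx + 2 * b / (W - 1), gy + 2 * a / (H - 1)), dim=-1)
        grid = grid.unsqueeze(0).expand(B, -1, -1, -1)
        out[:, c:c + 1] = F.grid_sample(
            x[:, c:c + 1], grid, mode="bilinear",
            padding_mode="zeros", align_corners=True)
    return out
```

A sketch like this makes it clear why the backward pass of the integer (sparse) shift still needs interpolation: gradients with respect to the real-valued shift parameters only exist through the bilinear weights.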

Eunhui-Kim commented 4 years ago

Thank you for your open mind. I think active shift is complex, but its accuracy is better than sparse shift. And I don't think sparse shift is that efficient, since the computation also increases. However, both shift operations allow a small memory footprint. :-)

I'll investigate and share if I find a way around the stride restriction. :-)

DeadAt0m commented 4 years ago

Regarding "And I don't think sparse shift is efficient since the computation is also increasing": sparse shift on the forward pass (inference) is just a memory copy (it is ZERO FLOPs) and should, in theory, work very fast (faster than depthwise convolution or even bilinear interpolation). However, I found that the current implementation of shifts is not well optimized at all, and I am working on it :)