Closed shkarupa-alex closed 2 years ago
We choose d based on the computational cost and the number of parameters.
You can write down formulas for the parameter count and the amount of computation as functions of d, then take the derivative with respect to d for a fixed K. Setting that derivative to zero gives the d that minimizes both computation and parameters.
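A rough sketch of that derivation, under assumptions that are mine rather than taken from the paper: only the two depth-wise convolutions are counted (the 1x1 convolution does not depend on d), the cost is measured per channel as the sum of squared kernel sizes, and rounding of K / d is ignored. The symbol C(d) is just a label for this cost.

```latex
\begin{align}
C(d) &= (2d - 1)^2 + \left(\frac{K}{d}\right)^2 \\
\frac{\mathrm{d}C}{\mathrm{d}d} &= 4(2d - 1) - \frac{2K^2}{d^3} = 0
  \;\Longrightarrow\; 2d^3(2d - 1) = K^2 \\
4d^4 &\approx K^2
  \;\Longrightarrow\; d \approx \sqrt{K/2}
  \quad \text{(dropping the lower-order term } -2d^3\text{)}
\end{align}
```

For K = 21 this estimate is about 3.24, and for K = 13 about 2.55, so both land near d = 3. Since d has to be an integer and the real kernel sizes are rounded, the neighbouring integer values still need to be compared directly.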
In your paper you describe two cases:
kernel = 21, dw_kernel = 5, dwd_kernel = 7, dwd_dilation = 3
kernel = 13, dw_kernel = 5, dwd_kernel = 5, dwd_dilation = 3
and propose the equations:
dw_kernel = 2 * dwd_dilation - 1
dwd_kernel = kernel / dwd_dilation
Could you please clarify how to choose these parameters for other kernel sizes? For example, what if the kernel size is 22? Should we then choose even values for dw_kernel, dwd_kernel and the dilation?
It seems that, among all possible 2 <= dwd_dilation <= kernel // 2, we should choose the value that minimizes (dw_kernel^2 + dwd_kernel^2). Is there an analytical solution?
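The search described above is small enough to brute-force. Here is a minimal sketch, assuming that kernel / dwd_dilation is rounded up when it is not an integer (which reproduces the kernel = 13 case) and that ties go to the smaller dilation. The function names `decompose` and `best_dilation` are made up for illustration and are not from the repository.

```python
def decompose(kernel, dilation):
    """Return (dw_kernel, dwd_kernel) for a given kernel size and dilation."""
    dw_kernel = 2 * dilation - 1
    # Integer ceiling of kernel / dilation (assumption: round up).
    dwd_kernel = -(-kernel // dilation)
    return dw_kernel, dwd_kernel


def best_dilation(kernel):
    """Try every 2 <= dilation <= kernel // 2 and keep the cheapest one.

    Cost is dw_kernel^2 + dwd_kernel^2, as proposed in the question.
    Returns (dwd_dilation, dw_kernel, dwd_kernel).
    """
    if kernel < 4:
        raise ValueError("kernel must be at least 4 to have a candidate dilation")

    def cost(dilation):
        dw_kernel, dwd_kernel = decompose(kernel, dilation)
        return dw_kernel ** 2 + dwd_kernel ** 2

    # min() returns the first minimum, so ties go to the smaller dilation.
    dilation = min(range(2, kernel // 2 + 1), key=cost)
    return (dilation, *decompose(kernel, dilation))


print(best_dilation(21))  # (3, 5, 7)
print(best_dilation(13))  # (3, 5, 5)
```

Both cases from the paper come out with dwd_dilation = 3 under this cost. The same function can be called with kernel = 22 or any other size, though whether the resulting kernel sizes are sensible for even kernels is a separate question for the authors.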