Open dfdx opened 6 years ago
It's just convolution on a 1D "image"; convolution is well defined for any number of dimensions, and in all cases we require a batch and channel dimension to be provided, so the signature for ND conv always involves (N+2)D weights and data.
Since we currently only have 2 and 3d convolution kernels, we treat 1d convolution as 2d with image/filter width 1, which is equivalent. We could equally well get away with only a 3D kernel and see 1D as (H,1,1), 2D as (H,W,1) etc.
Ah, I've got asked why convolutions don't work for a single RGB image so many times that I didn't even consider 1D convolutions, thanks for clarification. Yet, we need a good docstring and probably some tests for this. I'll rename the issue to keep track of it,
I'm trying to understand the meaning and usage of this signature in the new convolution interface:
w
here should have at least 4 parameters:So having
w
of 3D doesn't make sense to me.Also, for
x
andy
we seem to add an additional dimension into position 2. I guess the intent was to add a dummy batch or channel dimension, which would be true for C API (i.e. in NCHW scheme). However, in Julia API we normally use spatial dimensions first, i.e.HWCN
scheme (orWHCN
, I always forget which one). So adding a new dimension at position 2 means setting image width to 1.Am I misunderstanding something?
All in all, I have doubts regarding automatic addition of missing dimensions and having a single
conv!
dispatching on array dimensions: withconv2d
+ 4D tensors andconv3d
+ 5D tensors it's absolutely clear what each dimension means. However, 3D tensor inconv!
may be interpreted as missing channel dimension (multiple grayscale images) or missing batch dimension (single colored image). Similarly,conv!
on 4D tensor may be interpreted asconv2d!
orconv3d
with one missing dimension, and so on.