FluxML / NNlib.jl

Neural Network primitives with multiple backends

Docstrings & tests for the new convolution interface #34

Open dfdx opened 6 years ago

dfdx commented 6 years ago

I'm trying to understand the meaning and usage of this signature in the new convolution interface:

conv!(y::AbstractArray{T,3}, x::AbstractArray{T,3}, w::AbstractArray{T,3})

w here should have at least 4 dimensions:

  1. filter height
  2. filter width
  3. number of input channels
  4. number of output channels

So a 3D w doesn't make sense to me.
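
For concreteness, a hedged sketch of the 4D weight layout I'd expect (the sizes here are made up for illustration, not taken from the interface):

```julia
# 3×3 filters, 3 input channels, 16 output channels — four dimensions in total
w = rand(Float32, 3, 3, 3, 16)
ndims(w)  # == 4, not 3
```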

Also, for x and y we seem to insert an additional dimension at position 2. I guess the intent was to add a dummy batch or channel dimension, which would be correct for the C API (i.e. the NCHW scheme). However, in the Julia API we normally put spatial dimensions first, i.e. the HWCN scheme (or WHCN, I always forget which one). So inserting a new dimension at position 2 means setting the image width to 1.

Am I misunderstanding something?

All in all, I have doubts about automatically adding missing dimensions and having a single conv! dispatch on array dimensionality. With conv2d + 4D tensors and conv3d + 5D tensors it's absolutely clear what each dimension means. However, a 3D tensor passed to conv! may be interpreted as having a missing channel dimension (multiple grayscale images) or a missing batch dimension (a single color image). Similarly, conv! on a 4D tensor may be interpreted as conv2d!, or as conv3d! with one missing dimension, and so on.
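
To illustrate the ambiguity (a minimal sketch; the sizes and the spatial-dims-first layout are my assumptions, not part of the interface under discussion):

```julia
x = rand(Float32, 28, 28, 3)   # one 3D array, two plausible readings

# (a) missing batch dimension: a single 28×28 RGB image
xa = reshape(x, 28, 28, 3, 1)  # (W, H, C=3, N=1)

# (b) missing channel dimension: a batch of three 28×28 grayscale images
xb = reshape(x, 28, 28, 1, 3)  # (W, H, C=1, N=3)

# Dimensionality alone cannot distinguish (a) from (b).
```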

MikeInnes commented 6 years ago

It's just convolution on a 1D "image"; convolution is well defined for any number of dimensions, and in all cases we require a batch and channel dimension to be provided, so the signature for ND conv always involves (N+2)D weights and data.

Since we currently only have 2D and 3D convolution kernels, we treat 1D convolution as 2D with image/filter width 1, which is equivalent. We could equally well get away with only a 3D kernel and treat 1D as (H,1,1), 2D as (H,W,1), etc.
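
A minimal sketch of that lowering, assuming NNlib's conv on 4D arrays and the spatial-dims-first layout (the sizes are made up for illustration):

```julia
using NNlib

# 1D data and weights in (N+2)D = 3D form
x = rand(Float32, 100, 3, 8)   # (length, in channels, batch)
w = rand(Float32, 5, 3, 16)    # (filter length, in channels, out channels)

# Insert a singleton second spatial dimension so the existing 2D kernel can be reused
x4 = reshape(x, size(x, 1), 1, size(x, 2), size(x, 3))  # (W, 1, C, N)
w4 = reshape(w, size(w, 1), 1, size(w, 2), size(w, 3))  # (Wk, 1, Cin, Cout)
y4 = conv(x4, w4)              # 2D convolution, second spatial dim stays 1
y  = dropdims(y4; dims = 2)    # back to (W′, Cout, N)
```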

dfdx commented 6 years ago

Ah, I've been asked why convolutions don't work on a single RGB image so many times that I didn't even consider 1D convolutions; thanks for the clarification. Still, we need a good docstring and probably some tests for this. I'll rename the issue to keep track of it.