Closed: neubig closed this issue 7 years ago
Hi @neubig , does L, M represent the length of each word vector and the length of input sentence respectively? According to Kim 2014: http://aclweb.org/anthology/D/D14/D14-1181.pdf, I think Y[i][j] may be equal to the dot product of filter F[i] and matrix slice X[:,j:j+O] instead of X[:j,j+O].
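To make the slice-notation point concrete, here is a small NumPy sketch of the corrected formula, Y[i][j] = dot(F[i], X[:, j:j+O]). The shapes are illustrative assumptions (L = embedding size, M = sentence length, O = filter width, K = number of filters), not DyNet's actual API:

```python
import numpy as np

# Hypothetical shapes: L = embedding size, M = sentence length,
# O = filter width, K = number of filters.
L, M, O, K = 4, 10, 3, 5
X = np.random.randn(L, M)        # input sentence matrix
F = np.random.randn(K, L, O)     # K filters, each spanning all L rows

# Corrected slice: each output Y[i][j] is the sum of the elementwise
# product of filter F[i] with the window X[:, j:j+O].
Y = np.empty((K, M - O + 1))
for i in range(K):
    for j in range(M - O + 1):
        Y[i][j] = np.sum(F[i] * X[:, j:j+O])
```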
@MalcolmSun Thanks for the catch. I had just gotten my slice notation wrong; I've fixed the original post.
Hi @neubig , I am attempting to modify the 1D convolution operation. To the best of my knowledge, a 1D convolution operates on two 1D sequences, whereas the convolution in Kim 2014 looks like the 2D convolution used in computer vision. So I have decided to implement a 2D convolution function which supports padding and stride. What do you think?
Hi @MalcolmSun , yes, we've had some discussions about this offline, and the current understanding is this: 1) Kim 2014 is basically a 2D convolution where the height of the filter is equal to the word embedding dimension. So to answer your question: yes, it would be sufficient for you to implement the more general class of 2D convolutions, and it would be greatly appreciated. 2) The 1D convolution in DyNet implements something else entirely. I'm not aware of any research paper that uses a function like this, and I don't have any immediate ideas about how we could use it, although it might be useful for something. Perhaps we could rename it to something more intuitive, or remove it entirely?
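Point 1) can be seen directly in a naive 2D convolution: when the filter's height equals the embedding dimension, the output collapses to a single row, so shifting effectively happens only along the sentence. This is a sketch under assumed shapes, not DyNet code:

```python
import numpy as np

def conv2d_valid(X, F):
    """Naive 'valid' 2D cross-correlation (no padding, stride 1)."""
    h = X.shape[0] - F.shape[0] + 1
    w = X.shape[1] - F.shape[1] + 1
    Y = np.empty((h, w))
    for a in range(h):
        for b in range(w):
            Y[a, b] = np.sum(F * X[a:a+F.shape[0], b:b+F.shape[1]])
    return Y

emb, sent_len, width = 4, 10, 3
X = np.random.randn(emb, sent_len)
F = np.random.randn(emb, width)   # filter height == embedding size

Y = conv2d_valid(X, F)
# Because the filter covers the full embedding dimension, the output
# collapses to a single row of shape (1, sent_len - width + 1),
# exactly the Kim 2014 feature map for one filter.
```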
@MalcolmSun @neubig Just to give my understanding :
I don't think you should consider the embedding dimension the "second dimension" of the convolution.
To me it is more like the number of channels you have in an image (R, G, B, alpha, ...), or like a "depth": you only shift the filter along one dimension, the time dimension.
So:
1D convolution: filters have the shape [temporal span] x [number of channels], and you shift the filters only along the time dimension (1D).
2D convolution: filters have the shape [width] x [height] x [number of channels], and you shift the filter along the width and height dimensions of the 2D input (an image, ...).
etc.
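The channels view above can be sketched in NumPy like this (shapes are my own illustrative assumptions): the filter spans all channels, and is shifted only along the time axis.

```python
import numpy as np

# Treat the embedding dimension as channels: the filter covers every
# channel and is shifted only along the time axis.
channels, time, span = 4, 10, 3
X = np.random.randn(channels, time)   # [channels] x [time]
F = np.random.randn(span, channels)   # [temporal span] x [channels]

# One scalar per time position: sum over both channels and span.
Y = np.array([np.sum(F.T * X[:, t:t+span])
              for t in range(time - span + 1)])
# Y has shape (time - span + 1,): a 1D sequence, as expected.
```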
@pmichel31415 Thinking about channels is a good idea. Just putting this out there, but can we make convolutions general, where the filter has the same number of dimensions as the input, and is shifted along every dimension where its size is less than the input's? In the case of channels, the filter width along the "channel" dimension is equal to the number of channels, so we don't do any shifting there. It would be great if we could think about things this way and not have to worry about naming our dimensions at all.
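A minimal sketch of that generalized operator, assuming "valid" mode and stride 1 throughout (names and shapes are hypothetical, not a proposed DyNet signature):

```python
import numpy as np
from itertools import product

def convnd(X, F):
    """Generalized 'valid' convolution: the filter has the same number
    of dimensions as the input and is shifted along every dimension
    where it is smaller than the input; dimensions of equal size
    (e.g. channels) get exactly one position, i.e. no shifting."""
    out_shape = tuple(x - f + 1 for x, f in zip(X.shape, F.shape))
    Y = np.empty(out_shape)
    for idx in product(*(range(n) for n in out_shape)):
        window = tuple(slice(i, i + f) for i, f in zip(idx, F.shape))
        Y[idx] = np.sum(F * X[window])
    return Y

# With a filter spanning the full "channel" dimension, convnd behaves
# like a 1D convolution over time:
X = np.random.randn(4, 10)   # channels x time
F = np.random.randn(4, 3)    # full channel span, temporal width 3
Y = convnd(X, F)             # shape (1, 8): no shifting over channels
```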
I like the idea of having it be essentially an n-dimensional convolution, but it might still be a good idea to use it to provide conventional conv2d and conv1d operators for ease of use. I think we'd have to do this manually, though, in the same way you implemented conv1d but covering all of the dimensions, as I don't think Eigen supports anything higher than conv2d+channels.
I have the Kim algorithm implemented in Keras, PyTorch, Torch7 and TensorFlow. I'm interested in implementing it in DyNet as well once this issue is resolved. If it helps as a reference for what other frameworks do, the code is here:
1D convolutions have been deprecated, since everything they did should be handled by conv2d: https://github.com/clab/dynet/pull/388
(Edited for clarity) We were taking a look at 1D convolutions now, and it looks like they aren't doing what we would expect according to Kim 2014: http://aclweb.org/anthology/D/D14/D14-1181.pdf
For example, for conv1d, we would expect:
On the other hand, what we have now is:
In other words, filters are run row-wise over the input instead of being calculated with respect to the whole input.
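To make the contrast concrete, here is a NumPy sketch of the two behaviors as I read the descriptions above (shapes and names are illustrative, not DyNet's actual API):

```python
import numpy as np

rows, cols, width = 4, 10, 3
X = np.random.randn(rows, cols)      # input matrix
F = np.random.randn(rows, width)     # one filter spanning all rows

# Expected (Kim 2014 style): one scalar per position, summing over the
# whole input window across all rows at once.
expected = np.array([np.sum(F * X[:, j:j+width])
                     for j in range(cols - width + 1)])

# Current behavior as described: each filter row is convolved with the
# matching input row independently, so rows never interact and the
# result is a matrix rather than a vector.
rowwise = np.array([[np.sum(F[r] * X[r, j:j+width])
                     for j in range(cols - width + 1)]
                    for r in range(rows)])
# expected.shape == (8,), rowwise.shape == (4, 8); summing rowwise
# over its rows recovers the expected output.
```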