Closed: neubig closed this issue 7 years ago
Hi @neubig , does L, M represent the length of each word vector and the length of input sentence respectively? According to Kim 2014: http://aclweb.org/anthology/D/D14/D14-1181.pdf, I think Y[i][j] may be equal to the dot product of filter F[i] and matrix slice X[:,j:j+O] instead of X[:j,j+O].
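To make the slice-notation point concrete, here is a small NumPy sketch of the corrected formula, Y[i][j] = dot(F[i], X[:, j:j+O]). The shapes are illustrative assumptions (L = embedding size, M = sentence length, O = filter width, K = number of filters), not DyNet's actual API:

```python
import numpy as np

# Hypothetical shapes: L = embedding size, M = sentence length,
# O = filter width, K = number of filters.
L, M, O, K = 4, 10, 3, 5
X = np.random.randn(L, M)        # input sentence matrix
F = np.random.randn(K, L, O)     # K filters, each spanning all L rows

# Corrected slice: each output Y[i][j] is the sum of the elementwise
# product of filter F[i] with the window X[:, j:j+O].
Y = np.empty((K, M - O + 1))
for i in range(K):
    for j in range(M - O + 1):
        Y[i][j] = np.sum(F[i] * X[:, j:j+O])
```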
@MalcolmSun Thanks for the catch. I had just gotten my slice notation wrong; I've fixed the original post.
Hi @neubig , I am attempting to modify the 1D convolution operation. To the best of my knowledge, a 1D convolution operates on two 1D sequences, whereas the convolution in Kim 2014 looks like the 2D convolution used in computer vision. So I have decided to implement a 2D convolution function which supports padding and stride. What do you think?
Hi @MalcolmSun , yes, we've had some discussions about this offline, and the current understanding is this: 1) Kim 2014 is basically a 2D convolution where the height of the filter is equal to the word embedding dimension. So to answer your question: yes, it would be sufficient for you to implement the more general class of 2D convolutions, and it would be greatly appreciated. 2) The 1D convolution in DyNet implements something else entirely. I'm not aware of any research paper that uses a function like this, and I don't have any immediate ideas about how we could use it, although it might be useful for something. Perhaps we could rename it to something more intuitive, or remove it entirely?
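Point 1) can be seen directly in a naive 2D convolution: when the filter's height equals the embedding dimension, the output collapses to a single row, so shifting effectively happens only along the sentence. This is a sketch under assumed shapes, not DyNet code:

```python
import numpy as np

def conv2d_valid(X, F):
    """Naive 'valid' 2D cross-correlation (no padding, stride 1)."""
    h = X.shape[0] - F.shape[0] + 1
    w = X.shape[1] - F.shape[1] + 1
    Y = np.empty((h, w))
    for a in range(h):
        for b in range(w):
            Y[a, b] = np.sum(F * X[a:a+F.shape[0], b:b+F.shape[1]])
    return Y

emb, sent_len, width = 4, 10, 3
X = np.random.randn(emb, sent_len)
F = np.random.randn(emb, width)   # filter height == embedding size

Y = conv2d_valid(X, F)
# Because the filter covers the full embedding dimension, the output
# collapses to a single row of shape (1, sent_len - width + 1),
# exactly the Kim 2014 feature map for one filter.
```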
@MalcolmSun @neubig Just to give my understanding :
I don't think you should consider the embedding dimension the "second dimension" of the convolution.
To me it is more like the number of channels you have in an image (R, G, B, alpha, ...), or like a "depth": you only shift the filter along one dimension, the time dimension.
So:
1D convolution: filters have the shape [temporal span] x [number of channels], and you shift the filters only along the time dimension (1D).
2D convolution: filters have the shape [width] x [height] x [number of channels], and you shift the filter along the width and height dimensions of the 2D input (an image, ...).
etc.
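The channels view above can be sketched in NumPy like this (shapes are my own illustrative assumptions): the filter spans all channels, and is shifted only along the time axis.

```python
import numpy as np

# Treat the embedding dimension as channels: the filter covers every
# channel and is shifted only along the time axis.
channels, time, span = 4, 10, 3
X = np.random.randn(channels, time)   # [channels] x [time]
F = np.random.randn(span, channels)   # [temporal span] x [channels]

# One scalar per time position: sum over both channels and span.
Y = np.array([np.sum(F.T * X[:, t:t+span])
              for t in range(time - span + 1)])
# Y has shape (time - span + 1,): a 1D sequence, as expected.
```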
@pmichel31415 Thinking about channels is a good idea. Just putting this out there, but can we make convolutions general, where the filter has the same number of dimensions as the input, and is shifted along every dimension where its size is less than the input's? In the case of channels, the filter width along the "channel" dimension is equal to the number of channels, so we don't do any shifting there. It would be great if we could think about things this way and not have to worry about naming our dimensions at all.
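A minimal sketch of that generalized operator, assuming "valid" mode and stride 1 throughout (names and shapes are hypothetical, not a proposed DyNet signature):

```python
import numpy as np
from itertools import product

def convnd(X, F):
    """Generalized 'valid' convolution: the filter has the same number
    of dimensions as the input and is shifted along every dimension
    where it is smaller than the input; dimensions of equal size
    (e.g. channels) get exactly one position, i.e. no shifting."""
    out_shape = tuple(x - f + 1 for x, f in zip(X.shape, F.shape))
    Y = np.empty(out_shape)
    for idx in product(*(range(n) for n in out_shape)):
        window = tuple(slice(i, i + f) for i, f in zip(idx, F.shape))
        Y[idx] = np.sum(F * X[window])
    return Y

# With a filter spanning the full "channel" dimension, convnd behaves
# like a 1D convolution over time:
X = np.random.randn(4, 10)   # channels x time
F = np.random.randn(4, 3)    # full channel span, temporal width 3
Y = convnd(X, F)             # shape (1, 8): no shifting over channels
```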
I like the idea of having it be essentially an n-dimensional convolution, but it might still be a good idea to use it to provide conventional conv2d and conv1d operators for ease of use. I think we'd have to do this manually, though, in the same way you implemented conv1d but covering all of the dimensions, as I don't think Eigen supports anything higher than conv2d+channels.
I have the Kim algorithm implemented in Keras, PyTorch, Torch7 and TensorFlow. I'm interested in implementing it in DyNet as well once this issue is resolved. If it helps as a reference for what other frameworks do, the code is here:
1D convolutions have been deprecated, since everything they did should be handled by conv2d: https://github.com/clab/dynet/pull/388
(Edited for clarity) We were taking a look at 1D convolutions now, and it looks like they aren't doing what we would expect according to Kim 2014: http://aclweb.org/anthology/D/D14/D14-1181.pdf
For example, for conv1d, we would expect:
On the other hand, what we have now is:
In other words, filters are run row-wise over the input instead of being calculated with respect to the whole input.
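To make the contrast concrete, here is a NumPy sketch of the two behaviors as I read the descriptions above (shapes and names are illustrative, not DyNet's actual API):

```python
import numpy as np

rows, cols, width = 4, 10, 3
X = np.random.randn(rows, cols)      # input matrix
F = np.random.randn(rows, width)     # one filter spanning all rows

# Expected (Kim 2014 style): one scalar per position, summing over the
# whole input window across all rows at once.
expected = np.array([np.sum(F * X[:, j:j+width])
                     for j in range(cols - width + 1)])

# Current behavior as described: each filter row is convolved with the
# matching input row independently, so rows never interact and the
# result is a matrix rather than a vector.
rowwise = np.array([[np.sum(F[r] * X[r, j:j+width])
                     for j in range(cols - width + 1)]
                    for r in range(rows)])
# expected.shape == (8,), rowwise.shape == (4, 8); summing rowwise
# over its rows recovers the expected output.
```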