Open camelop opened 5 years ago
@kernelmachine, could you take a look at this? (It looks like you're the author of the BagOfWordCountsTokenEmbedder.) @camelop's point seems like a good one, but I'm not sure what implications there are for existing code. Thanks!
ping @kernelmachine
Yes, I think unsqueezing the num_tokens dimension is a reasonable solution (it is a bit confusing, but we could make it clear in the documentation). The bag of words embedder is pretty isolated from the rest of the repo, so I don't think it'll be an issue. Contributions would be welcome!
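For illustration, a minimal plain-PyTorch sketch (not the actual AllenNLP code; the tensor names and shapes below are assumptions) of what unsqueezing a singleton num_tokens dimension would do to the embedder's output shape:

```python
import torch

batch_size, num_tokens, vocab_size = 4, 10, 100
token_ids = torch.randint(0, vocab_size, (batch_size, num_tokens))

# Bag of word counts: count occurrences of each vocabulary id per instance.
counts = torch.zeros(batch_size, vocab_size)
counts.scatter_add_(1, token_ids, torch.ones_like(token_ids, dtype=torch.float))
print(counts.shape)  # torch.Size([4, 100]) -- 2-D, unlike other token embedders

# Proposed fix: add a singleton "num_tokens" dimension so downstream
# seq2vec encoders still see a [batch_size, num_tokens, embedding_dim] tensor.
counts = counts.unsqueeze(1)
print(counts.shape)  # torch.Size([4, 1, 100])
```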
@kernelmachine this is just a friendly ping to make sure you haven't forgotten about this issue 😜
Bug Description
BagOfWordCountsTokenEmbedder returns a tensor with shape [batch_size, vocab_size (or projection_dim)], while most other embedders return [batch_size, num_tokens, hidden_size]. This is inconsistent with the contract of an embedder, so the seq2vec encoder in BasicClassifier cannot be applied (its input is not even a sequence).
To Reproduce
Use the two together, for example:
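A hypothetical reproduction sketch along these lines (the class names, constructor arguments, and imports are assumptions based on the AllenNLP API of that era and may differ between versions):

```python
import torch
from allennlp.data import Vocabulary
from allennlp.modules.token_embedders import BagOfWordCountsTokenEmbedder
from allennlp.modules.seq2vec_encoders import BagOfEmbeddingsEncoder

# Build a tiny vocabulary (hypothetical setup, just to make the shapes concrete).
vocab = Vocabulary()
for word in ["the", "quick", "brown", "fox"]:
    vocab.add_token_to_namespace(word, namespace="tokens")

embedder = BagOfWordCountsTokenEmbedder(vocab, vocab_namespace="tokens")
encoder = BagOfEmbeddingsEncoder(embedding_dim=vocab.get_vocab_size("tokens"))

token_ids = torch.randint(0, vocab.get_vocab_size("tokens"), (2, 5))

embedded = embedder(token_ids)
print(embedded.shape)  # (batch_size, vocab_size) -- only two dimensions

# A seq2vec encoder expects (batch_size, num_tokens, embedding_dim), so wiring
# this embedder into BasicClassifier's seq2vec_encoder fails or silently
# misbehaves: the num_tokens dimension is missing.
encoded = encoder(embedded)
```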
Expected behavior
At a minimum, making the output of BagOfWordCountsTokenEmbedder look like [batch_size, 1, vocab_size] would help. (Although it would still be a bit confusing.)