Closed matt-gardner closed 7 years ago
Note that this handles more than just what `dynamic_rnn` does. In this particular setting, we're not doing RNNs at all - we're distributing a character-level CNN encoder across an unknown number of words. This isn't something you would do with `dynamic_rnn`.
I agree that we need to be careful with this, though, which is why there's so much documentation about how you should probably use `TimeDistributed` if it works for you. This limits the kinds of reshaping you can easily do in your models. I should probably use the `EncoderWrapper` by default in the word and character tokenizer if `use_dynamic_padding` is `False`, actually, otherwise you just won't be able to build certain kinds of models that use character-level encoders.
Yeah sure, you're right. Feel free to merge.
This PR allows for what I'll call dynamic padding. This means that padding only happens at a batch level. If you sort instances by size and then sample slices from this list, you should be able to dramatically decrease runtime, because the amount of padding you actually need is minimized.
So far I've implemented everything but the sorting, for the simplest model case. Individual models will have to do a little bit of work, probably, to support this for particular padding dimensions, but the changes should be minimal in most cases.
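The sort-then-slice idea can be sketched in a few lines (names here are mine, and this includes the sorting step that isn't implemented yet; in practice you'd also randomize which slice you draw rather than iterating in order):

```python
def pad_batch(instances, pad_value=0):
    # Pad variable-length token-id lists to the max length *within this
    # batch only* - batch-level padding rather than dataset-level padding.
    max_len = max(len(inst) for inst in instances)
    return [inst + [pad_value] * (max_len - len(inst)) for inst in instances]

def dynamically_padded_batches(instances, batch_size):
    # Sorting by size groups similarly-sized instances into the same slice,
    # so each batch needs close to the minimum amount of padding.
    ordered = sorted(instances, key=len)
    for start in range(0, len(ordered), batch_size):
        yield pad_batch(ordered[start:start + batch_size])

instances = [[1, 2], [3], [4, 5, 6, 7], [8, 9, 10]]
for batch in dynamically_padded_batches(instances, batch_size=2):
    print(batch)
# [[3, 0], [1, 2]]
# [[8, 9, 10, 0], [4, 5, 6, 7]]
```

The first batch is padded to length 2 and the second to length 4, instead of padding everything to the longest instance in the dataset.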