allenai / deep_qa

A deep NLP library, based on Keras / tf, focused on question answering (but useful for other NLP too)
Apache License 2.0

Data generators and dynamic padding #295

Closed · matt-gardner closed this 7 years ago

matt-gardner commented 7 years ago

This PR allows for what I'll call dynamic padding. This means that padding happens only at the batch level. If you sort instances by size and then sample contiguous slices from the sorted list, you should be able to dramatically decrease runtime, because the amount of padding you actually need is minimized.
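For concreteness, here is a minimal sketch of the idea (this is not deep_qa's actual `DataGenerator` API; instances are assumed to be plain lists of token ids, and `pad_batch` / `dynamic_padding_batches` are hypothetical names):

```python
import random
from typing import List

def pad_batch(batch: List[List[int]], pad_value: int = 0) -> List[List[int]]:
    # Pad each instance only up to the longest instance in this batch,
    # not to a global maximum over the whole dataset.
    max_length = max(len(instance) for instance in batch)
    return [instance + [pad_value] * (max_length - len(instance)) for instance in batch]

def dynamic_padding_batches(instances: List[List[int]], batch_size: int):
    # Sorting by length puts similarly-sized instances in the same batch,
    # which minimizes the padding each batch needs.
    instances = sorted(instances, key=len)
    batches = [instances[i:i + batch_size] for i in range(0, len(instances), batch_size)]
    # Shuffle the order of batches (not individual instances) so training
    # still sees batches in a random order.
    random.shuffle(batches)
    for batch in batches:
        yield pad_batch(batch)
```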

So far I've implemented everything but the sorting, for the simplest model case. Individual models will probably have to do a little work to support this for particular padding dimensions, but the changes should be minimal in most cases.

matt-gardner commented 7 years ago

Note that this handles more than just what dynamic_rnn does. In this particular setting, we're not doing RNNs at all - we're distributing a character-level CNN encoder across an unknown number of words. This isn't something you would do with dynamic_rnn.

I agree that we need to be careful with this, though, which is why there's so much documentation about how you should probably use TimeDistributed if it works for you. This limits the kinds of reshaping you can easily do in your models. I should probably have the word and character tokenizer use the EncoderWrapper by default when use_dynamic_padding is False; otherwise you just won't be able to build certain kinds of models that use character-level encoders.
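As an illustration of the TimeDistributed pattern being discussed, here is a minimal Keras sketch of distributing a character-level CNN encoder across the word axis (plain Keras rather than deep_qa's EncoderWrapper; all sizes and layer choices are made up for the example):

```python
from keras.layers import Input, Embedding, Conv1D, GlobalMaxPooling1D, TimeDistributed
from keras.models import Model, Sequential

# Made-up sizes for illustration; with dynamic padding these would vary per batch.
num_words, num_chars = 30, 15
char_vocab_size, char_embedding_dim = 100, 16

# A character-level CNN encoder: maps one word's character ids to a single vector.
char_encoder = Sequential([
    Embedding(char_vocab_size, char_embedding_dim, input_length=num_chars),
    Conv1D(filters=32, kernel_size=3, activation='relu'),
    GlobalMaxPooling1D(),
])

# Distribute that encoder across the word axis: input is (batch, num_words, num_chars),
# output is (batch, num_words, 32), one encoded vector per word.
word_character_ids = Input(shape=(num_words, num_chars), dtype='int32')
encoded_words = TimeDistributed(char_encoder)(word_character_ids)
model = Model(inputs=word_character_ids, outputs=encoded_words)
```

The idea with dynamic padding is that the fixed `num_words` and `num_chars` above would instead be determined per batch rather than globally over the whole dataset.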

DeNeutoy commented 7 years ago

Yeah sure, you're right. Feel free to merge.