allenai / deep_qa

A deep NLP library, based on Keras / tf, focused on question answering (but useful for other NLP too)
Apache License 2.0

Data generator prep take two #293

Closed by matt-gardner 7 years ago

matt-gardner commented 7 years ago

This PR separates updating model state from creating data arrays. The main changes here are in Trainer and TextTrainer, and the rest of the changes are just consequences of that API change.

The point of this is that it gives us a simpler interface around getting data arrays, which is then easy to swap out for a data generator. In this design, we're still assuming that you can fit your whole dataset into memory, both in plain text and as indexed instances, but with a generator we no longer do padding over the whole dataset. The generator itself isn't implemented in this PR; that'll be the next one.
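To make the padding point concrete, here is a minimal sketch in plain Python. It is not the deep_qa API: `pad_batch`, `pad_whole_dataset`, and `batch_generator` are hypothetical names, and an "indexed instance" is assumed to be just a list of word ids. It only illustrates why padding per batch inside a generator can produce narrower arrays than padding the whole dataset at once.

```python
from typing import Iterator, List, Sequence

# Hypothetical stand-in for an indexed instance: a list of word ids.
IndexedInstance = List[int]


def pad_batch(batch: Sequence[IndexedInstance], pad_id: int = 0) -> List[List[int]]:
    """Pad every instance to the longest length found in this batch only."""
    max_len = max(len(instance) for instance in batch)
    return [list(instance) + [pad_id] * (max_len - len(instance)) for instance in batch]


def pad_whole_dataset(instances: Sequence[IndexedInstance], pad_id: int = 0) -> List[List[int]]:
    """Non-generator path: one padding length for the entire dataset."""
    return pad_batch(instances, pad_id)


def batch_generator(instances: Sequence[IndexedInstance],
                    batch_size: int,
                    pad_id: int = 0) -> Iterator[List[List[int]]]:
    """Generator path: the indexed instances stay in memory, padding is per batch."""
    for start in range(0, len(instances), batch_size):
        yield pad_batch(instances[start:start + batch_size], pad_id)


# Toy data (made up for illustration): four instances of different lengths.
instances = [[4, 7], [3], [9, 2, 5, 8, 1, 6], [2, 2, 2]]

whole = pad_whole_dataset(instances)                      # every row padded to length 6
batches = list(batch_generator(instances, batch_size=2))  # first batch is only 2 wide
```

With this toy data, the whole-dataset path pads all four rows to length 6, while the generator's first batch is padded only to length 2. The indexed instances are still held in memory in both cases, which matches the assumption stated above.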