allenai / deep_qa

A deep NLP library, based on Keras / tf, focused on question answering (but useful for other NLP too)
Apache License 2.0
404 stars 133 forks source link

Maybe move the utility methods out of `TextTrainer`? #316

Open matt-gardner opened 7 years ago

matt-gardner commented 7 years ago

You could make the argument that the way data is handled and the way we build models are too tightly coupled, and should be decomposed. That would mean, basically, making a cleaner separation between the objects that read and process data and TextTrainer, and perhaps also splitting out the _embed_input, _get_encoder, and _get_seq2seq_encoder methods into a separate model utility class.

I'm not totally sold that this is necessary, though. In order to make the handling of word / word+character tokenizers transparent to the model class, you have to have a tight coupling between the data generator and the _embed_input method. I think it would be pretty difficult to make this work without the way that it's currently structured.