RTIInternational / gobbli

Deep learning with text doesn't have to be scary.
Apache License 2.0

Support reading input directly from files #3

Status: Open. jasonnance opened this issue 5 years ago.


Feature

Tasks that take input can accept either an in-memory list or a path to a file in some specified format, with optional validation.
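A minimal sketch of how this might look. `resolve_input_file` is a hypothetical helper, not part of gobbli's API, that normalizes either kind of input to a file path. The issue leaves the format unspecified, so the sketch assumes UTF-8 text with one document per line and treats blank lines as validation failures; both are illustrative choices.

```python
import tempfile
from pathlib import Path
from typing import List, Union

# Hypothetical input type: an in-memory list of documents or a path to a file.
TaskInput = Union[List[str], str, Path]


def resolve_input_file(X: TaskInput, validate: bool = True) -> Path:
    """Return a path to a file containing one document per line.

    Assumed (hypothetical) file format: UTF-8 text, one document per line,
    no embedded newlines, no blank lines.
    """
    if isinstance(X, list):
        # In-memory input: write it to a temporary file so it can be handed
        # to a model container the same way a user-supplied file would be.
        for i, doc in enumerate(X):
            if "\n" in doc:
                raise ValueError(f"Document {i} contains a newline")
        with tempfile.NamedTemporaryFile(
            "w", encoding="utf-8", suffix=".txt", delete=False
        ) as f:
            for doc in X:
                f.write(doc + "\n")
            return Path(f.name)

    path = Path(X)
    if not path.is_file():
        raise FileNotFoundError(f"Input file not found: {path}")
    if validate:
        # Stream the file line by line so validation never holds the whole
        # dataset in memory; bad UTF-8 raises UnicodeDecodeError here too.
        with path.open(encoding="utf-8") as f:
            for lineno, line in enumerate(f, start=1):
                if not line.strip():
                    raise ValueError(f"Empty document on line {lineno} of {path}")
    # File input is used as-is: no extra read/write copy is made.
    return path
```

Returning the user's own path untouched is what would let file input skip the redundant read/write when the data has to end up on disk for the Docker containers anyway; `validate=False` skips even the streaming scan.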

Motivation

This removes the requirement that users' datasets fit in memory on gobbli's side (although we can't easily control how some models read the data in, so this may make no difference for those models). It may also save time in some cases: we already have to write data to disk to make it available to model Docker containers, so if the data is on disk to begin with, the initial read/write is largely unnecessary.

Additional Details

Ideally we'd modify the model wrappers where we control data input (e.g., Transformer, MT-DNN) to optionally load data lazily from files, completely eliminating the possibility of exhausting RAM.
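A minimal sketch of what lazy loading could look like inside a wrapper we control, assuming the same one-document-per-line UTF-8 format. `iter_document_batches` is a hypothetical helper, not part of gobbli; a real Transformer or MT-DNN wrapper would pass these batches to its own tokenization and training loop.

```python
import itertools
from pathlib import Path
from typing import Iterator, List, Union


def iter_document_batches(
    path: Union[str, Path], batch_size: int = 32
) -> Iterator[List[str]]:
    """Lazily yield batches of documents from a one-document-per-line file.

    At most ``batch_size`` documents are held in memory at a time, so memory
    use depends on the batch size rather than on the size of the dataset.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    with Path(path).open(encoding="utf-8") as f:
        # Strip only the trailing newline so other whitespace is preserved.
        docs = (line.rstrip("\n") for line in f)
        while True:
            # islice pulls the next batch_size lines without reading ahead.
            batch = list(itertools.islice(docs, batch_size))
            if not batch:
                return
            yield batch
```

Because this is a generator, the file stays open only while batches are being consumed, and the `batch_size` check runs on the first `next()` call rather than at call time. A PyTorch-based wrapper could wrap the same idea in a `torch.utils.data.IterableDataset`.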