Clean multi-gpu training

DeNeutoy commented 7 years ago

FYI @matt-gardner - this gpu stuff works(as in 2 gpus is nearly twice as fast over the epoch) but there is some small indexing bug which i'll find tomorrow with 3. But nearly there!

DeNeutoy commented 7 years ago

Actually there wasn't a bug, I was just being dumb and running one of the experiments with dynamic padding by accident.

1 GPU: 1126 sec/epoch 2 GPU: 592 sec/epoch 3 GPU 415 sec/epoch

All with nearly full GPU utilisation (~90%). I think it should be almost trivial to get it working with dynamic batching as well, so i'll give that a crack tomorrow.

matt-peters commented 7 years ago

One high level design question/observation: You have split the multiple GPU stuff across several different classes with fairly different main objectives. The DeepQAModel class has low level details about how to generate batches in the multiple GPU regime in _fit_loop. However, the Trainer class also has low level details related to the specific multiple GPU implementation in _compile_parallel_model. This tightly bounds together the implementations and will make them much harder to separate or maintain. With the goal of building modular, reusable components I'd suggest you encapsulate the details of the multiple GPU implementation in one place, that could be used independent of the details of deep QA's Trainer class. One use case for this is where someone wants to use just use some pieces of deep_qa, but not all of it: e.g., the pieces that interact with Keras/tensorflow like the layers, models and multiple GPU implementation.

matt-gardner commented 7 years ago

The DeepQaModel class exists and needs to be modified because we're using Keras' training code, and this is our subclass of Keras' Model, so that we can modify Keras' loop. I'm totally fine with ditching Keras' training code entirely, if we can do it better because we're just focused on tensorflow. That would essentially make DeepQaModel go away, instead just returning input and output tensors in _build_model.

I would push back a bit on the "make this library re-usable in pieces" part, though. Yes, we should definitely make the library as modular as possible, as that's just good software design. But I don't think it should be our explicit goal to let someone just use parts of this library. That's not really the point, and trying to support that kind of use could lead us to poor design choices, just like Keras makes suboptimal decisions if all you want to do is use tensorflow. If it happens as a byproduct, like for the layers, and some of the data processing code, great. But I don't think it should be a top-level design goal. Instead, we should make the library good enough to be used directly, instead of just pulling out parts of it.

matt-peters commented 7 years ago

We should take this offline for discussion and out of this PR -- but in brief I disagree. We should have an explicit goal to make subcomponents useful on their own as I think this will naturally lead to more uptake in usage and a much more flexible / extendable framework. It also makes the code easier to maintain. Most researchers already have major components of their workflow and infrastructure established and will be more likely to try out deep_qa in bits and pieces if they can easily integrate them into their existing work, vs replacing and starting from scratch. Of course we can still provide the full functionality for those who want it.

allenai / deep_qa

Clean multi-gpu training #372