apache / mxnet

Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more
https://mxnet.apache.org
Apache License 2.0

The case for splitting model class #417

Closed. piiswrong closed this 8 years ago.

piiswrong commented 8 years ago

The current model class is specific to feedforward nets, but it contains some common functions that all models can use, like save/load. Plus, if we want per-parameter learning rate multipliers, it's better to implement them once in a base class rather than in every custom training loop. I propose the following changes:

  1. Split the model class into Model, FeedForwardModel (inherits Model), and Trainer (or TrainingPlan? Solver? Executor?) with the following responsibilities (see the sketch after this list):
     - Model: handles saving, loading, etc. Should hold the parameters.
     - FeedForwardModel: subclass of Model; handles the feedforward-specific parts.
     - Trainer: implements the training loop and calls the optimizer. Should have Trainer, ParallelTrainer, and DistributedTrainer variants. Learning rate multipliers should be handled here.
  2. Merge the training-loop part of the LSTM example into an LSTMModel.
  3. We can also have an autoencoder model and an RBM model, if anyone is still using those.
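
A minimal sketch of what that split could look like. Every class, method, and attribute name here is hypothetical, for illustration only; none of it is existing MXNet API:

```python
import pickle

class Model:
    """Base class: owns the parameters and generic save/load."""
    def __init__(self):
        self.params = {}  # parameter name -> array

    def save(self, path):
        # Serialize parameters keyed by name so any subclass can reuse this.
        with open(path, 'wb') as f:
            pickle.dump(self.params, f)

    def load(self, path):
        with open(path, 'rb') as f:
            self.params = pickle.load(f)

class FeedForwardModel(Model):
    """Feedforward-specific parts: forward pass, prediction."""
    def predict(self, data):
        raise NotImplementedError

class Trainer:
    """Owns the training loop and per-parameter learning rate multipliers.
    ParallelTrainer / DistributedTrainer would subclass this."""
    def __init__(self, model, base_lr, lr_mult=None):
        self.model = model
        self.base_lr = base_lr
        self.lr_mult = lr_mult or {}  # parameter name -> multiplier

    def step(self, grads):
        # Plain SGD scaled per parameter; real code would call an optimizer.
        for name, grad in grads.items():
            lr = self.base_lr * self.lr_mult.get(name, 1.0)
            self.model.params[name] -= lr * grad
```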
futurely commented 8 years ago

The corresponding concepts in Caffe are the net, the solver, and the per-device solvers used for parallel training. Caffe has merged the parallel features in https://github.com/BVLC/caffe/pull/2870 and https://github.com/BVLC/caffe/pull/2903. Distributed training was published by @cque7 from Intel yesterday.

antinucleon commented 8 years ago

I think an explicit solver is unnecessary. In my view, the model class is for people who don't have any customization requirements; the symbolic interface is better for customized needs. Also, we are not making another Caffe; if there is a performance concern, we welcome benchmarks.
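
For reference, a custom loop on the symbolic interface looks roughly like this (a sketch against the early MXNet executor API; the network, shapes, and `data_iter` are illustrative assumptions):

```python
import mxnet as mx

# Build a small symbolic network.
data = mx.sym.Variable('data')
fc = mx.sym.FullyConnected(data=data, num_hidden=10, name='fc')
net = mx.sym.SoftmaxOutput(data=fc, name='softmax')

# Bind with concrete shapes to get an executor.
exe = net.simple_bind(ctx=mx.cpu(), data=(32, 784))

for batch, label in data_iter:  # data_iter: any iterable of (data, label) batches
    exe.arg_dict['data'][:] = batch
    exe.arg_dict['softmax_label'][:] = label
    exe.forward(is_train=True)
    exe.backward()
    # Hand-written SGD update: full control, no Trainer class needed.
    for name, w, g in zip(net.list_arguments(), exe.arg_arrays, exe.grad_arrays):
        if name not in ('data', 'softmax_label') and g is not None:
            w[:] -= 0.01 * g
```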

tqchen commented 8 years ago

I guess @piiswrong is talking about potential reuse of some components of the model class for other purposes. That requirement is reasonable. I can see two possible ways to do this:

antinucleon commented 8 years ago

@piiswrong I think the painful part of an RNN model is the hidden states. I will make an attention model tomorrow or the day after; then I believe we will have a better sense of how to reuse the current code.

piiswrong commented 8 years ago

@antinucleon We can debate whether solvers are necessary, but I think the FeedForwardModel/Model separation is a good idea. Without a base model class you need to write save/load for every custom model, to say the least. There is also functionality like listing and allocating parameters that can be reused.

This is also going to be bad for creating a pretrained-model repo in the future, since you would have to ship weights along with the code needed to load them.
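
Concretely, if the base class always serializes parameters keyed by name, a pretrained-model repo only needs to host weight files, and any compatible model can restore them without bundled loader code. Continuing the hypothetical sketch above:

```python
# Hypothetical, continuing the Model/FeedForwardModel sketch above.
model = FeedForwardModel()
model.load('pretrained_weights.pkl')  # file name illustrative

# Listing parameters is generic base-class functionality too.
for name, arr in model.params.items():
    print(name, getattr(arr, 'shape', None))
```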

pluskid commented 8 years ago

Some code from FeedForward should definitely be refactored into a superclass once we have another model type. But I vote against a separate trainer, because different models need different training logic. For example, RNN models need to maintain hidden-state transitions and gradient cut-off (truncation) over time.
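
To make that concrete, here is a self-contained toy illustration in plain NumPy (nothing MXNet-specific; the task and all numbers are illustrative) of a truncated-BPTT loop. The hidden state is carried across segments while the gradient is cut at each segment boundary, which is exactly the logic a generic feedforward trainer would not provide:

```python
import numpy as np

# Toy vanilla RNN trained with truncated BPTT; the point is the loop
# structure, not the task (predicting a scaled running sum of the input).
rng = np.random.default_rng(0)
H, T, BPTT, LR = 8, 40, 10, 0.05
Wx = rng.normal(0, 0.1, (1, H))   # input -> hidden
Wh = rng.normal(0, 0.1, (H, H))   # hidden -> hidden
Wy = rng.normal(0, 0.1, (H, 1))   # hidden -> output

xs = rng.normal(size=(T, 1))
ys = 0.01 * np.cumsum(xs).reshape(T, 1)

h = np.zeros((1, H))  # hidden state carried ACROSS segments
for start in range(0, T, BPTT):
    seg_x, seg_y = xs[start:start + BPTT], ys[start:start + BPTT]

    # Forward through one segment, caching states for backprop.
    hs = [h]
    for x in seg_x:
        hs.append(np.tanh(x.reshape(1, 1) @ Wx + hs[-1] @ Wh))
    preds = [s @ Wy for s in hs[1:]]

    # Backward WITHIN the segment only: this is the truncation.
    dWx, dWh, dWy = np.zeros_like(Wx), np.zeros_like(Wh), np.zeros_like(Wy)
    dh = np.zeros((1, H))
    for t in reversed(range(len(seg_x))):
        dy = preds[t] - seg_y[t].reshape(1, 1)    # d(squared error)/d(pred)
        dWy += hs[t + 1].T @ dy
        dh = dh + dy @ Wy.T
        draw = dh * (1 - hs[t + 1] ** 2)          # through tanh
        dWx += seg_x[t].reshape(1, 1).T @ draw
        dWh += hs[t].T @ draw
        dh = draw @ Wh.T
    for W, dW in ((Wx, dWx), (Wh, dWh), (Wy, dWy)):
        W -= LR * dW

    # Carry the final hidden state into the next segment, but NOT its gradient.
    h = hs[-1]
```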

tqchen commented 8 years ago

Closing due to inactivity.