Closed: piiswrong closed this issue 8 years ago
The corresponding concepts in Caffe are net, solver and solvers. Caffe has merged the parallel features in https://github.com/BVLC/caffe/pull/2870 and https://github.com/BVLC/caffe/pull/2903. Distributed training was published by @cque7 from Intel yesterday.
I think an explicit solver is unnecessary. In my view, model is for people who don't have any customization requirements; the symbolic interface is better for customized needs. Also, we are not making another Caffe. If there is a performance concern, we welcome benchmarks.
I guess @piiswrong is talking about potentially reusing some components of model for other purposes. This requirement is reasonable. I can see two possible ways to do this:
@piiswrong I think the headache for RNN models is the hidden states. I will build an attention model tomorrow or the day after; then I believe we will have a better sense of how to reuse the current code.
@antinucleon we can debate whether solvers are necessary, but I think the FeedForwardModel/Model separation is a good idea. Without a base model class, you need to write save/load for every custom model, to say the least. There is also reusable logic such as listing/allocating parameters.
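To make the argument concrete, here is a minimal sketch of what such a separation could look like. All class and attribute names here are hypothetical, not actual MXNet API; the point is only that save/load and parameter listing are written once in the base class:

```python
import pickle

class BaseModel:
    """Hypothetical base class: save/load and parameter listing live
    here once, instead of being rewritten for every custom model."""

    def __init__(self):
        self.params = {}  # parameter name -> parameter values

    def list_params(self):
        # Shared parameter-listing logic, reused by all subclasses.
        return sorted(self.params)

    def save(self, path):
        # Shared serialization; subclasses inherit it for free.
        with open(path, "wb") as f:
            pickle.dump(self.params, f)

    def load(self, path):
        with open(path, "rb") as f:
            self.params = pickle.load(f)

class FeedForwardModel(BaseModel):
    """Only the feedforward-specific parts live in the subclass."""

    def __init__(self):
        super().__init__()
        self.params = {"fc1_weight": [0.1, 0.2], "fc1_bias": [0.0]}
```

With this layout a pretrained-model repo only needs the saved parameter file plus whichever subclass produced it, since the loading code is shared.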
This is also going to be bad for creating a pretrained model repo in the future, since you would have to ship code along with the weights just to load them.
Some code from FeedForward should definitely be refactored into a superclass once we have another model type. I vote for not having a separate trainer, because different models will have different training logic. For example, RNN models need to maintain state transitions and gradient cut-off over time.
Closing due to inactivity.
The current model class is specific to the feedforward net, but it contains some common functions that all models can use, like save/load. Plus, if we want learning-rate multipliers for each parameter, it's better to do it once in the base class than in every custom training loop. I propose the following changes:
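The learning-rate-multiplier point can be sketched in a few lines. This is a hypothetical plain-SGD update, not MXNet's actual optimizer API; `lr_mult` maps a parameter name to its per-parameter multiplier, defaulting to 1.0:

```python
def sgd_update(params, grads, lr, lr_mult=None):
    """Sketch of per-parameter learning-rate multipliers applied in
    one shared update routine, so every custom training loop does
    not have to reimplement the scaling. All names are hypothetical."""
    lr_mult = lr_mult or {}
    for name, p in params.items():
        # Effective step size = base lr * this parameter's multiplier.
        scale = lr * lr_mult.get(name, 1.0)
        params[name] = [w - scale * g for w, g in zip(p, grads[name])]
    return params
```

If this lives in the base class, a subclass only declares its multipliers (e.g. a smaller multiplier for pretrained layers) and never touches the update arithmetic.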