deeplearning4j / deeplearning4j

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learn...
http://deeplearning4j.konduit.ai
Apache License 2.0

Request for documentation for combining pretraining and training in a single MultiLayerConfiguration #3667

Closed daanvdn closed 3 years ago

daanvdn commented 7 years ago

I have a request for documentation relating to the combined use of unsupervised pretraining and supervised training.
The online documentation, the dl4j book, and the dl4j-examples repo all give good information on how to configure a standalone deep neural net that does unsupervised training, for instance using stacked denoising autoencoders. All of this documentation also explains what neural nets of this type can be used for, e.g. learning better feature representations, which can then be used as inputs when training a classifier in a supervised fashion.
What I am missing, though, are guidelines and code samples that show how to set up a single MultiLayerConfiguration that combines both pretraining and supervised training layers.
Looking at the implementation of org.deeplearning4j.nn.multilayer.MultiLayerNetwork#fit(org.nd4j.linalg.dataset.api.iterator.DataSetIterator), I've tried to figure out what such a configuration might look like. This is what I've come up with:

        MultiLayerConfiguration config = new NeuralNetConfiguration.Builder().seed(1230L)
                .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
                .gradientNormalizationThreshold(1.0)
                .iterations(1)
                .momentum(0.5)
                .momentumAfter(Collections.singletonMap(3, 0.9))
                .optimizationAlgo(OptimizationAlgorithm.CONJUGATE_GRADIENT)
                .rmsDecay(0.95)
                .regularization(true)
                .updater(Updater.RMSPROP)
                .l2(0.001)
                .list()
                .layer(0, new AutoEncoder.Builder().nIn(7000).nOut(3000).weightInit(WeightInit.XAVIER).lossFunction(
                        LossFunctions.LossFunction.RMSE_XENT).corruptionLevel(0.3).build())
                .layer(1, new AutoEncoder.Builder().nIn(3000).nOut(1500).weightInit(WeightInit.XAVIER).lossFunction(
                        LossFunctions.LossFunction.RMSE_XENT).corruptionLevel(0.3).build())
                .layer(2, new AutoEncoder.Builder().nIn(1500).nOut(800).weightInit(WeightInit.XAVIER).lossFunction(
                        LossFunctions.LossFunction.RMSE_XENT).corruptionLevel(0.3).build())
                .layer(3,
                        new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD).activation("softmax")
                                .nIn(400)
                                .nOut(300)
                                .build())
                .layer(4, new DenseLayer.Builder().nIn(300).nOut(200).activation("tanh").build())
                .layer(5, new GravesLSTM.Builder().nIn(200).nOut(100).activation("tanh").build())
                .layer(6,
                        new RnnOutputLayer.Builder(LossFunctions.LossFunction.XENT).activation("sigmoid")
                                .nIn(100)
                                .nOut(40)
                                .build())
                .backpropType(BackpropType.Standard)
                .pretrain(true)
                .backprop(true)
                .setInputType(InputType.recurrent(7000))
                .build();

This config would pass the input (which is time-series data) through a stack of denoising autoencoders, reducing the 7000 input features to 300, and would then feed that output into a mixed feedforward + LSTM network to train a multi-label classifier.

Questions: if I train the network for multiple epochs like this:

                DataSetIterator trainingData = ...;
                MultiLayerNetwork multiLayerNetwork = new MultiLayerNetwork(config);
                for (int epoch = 0; epoch < 10; epoch++) {
                    multiLayerNetwork.fit(trainingData);
                    trainingData.reset();
                }

I'm not sure how the epochs interact with the pretraining here: will the layer-wise pretraining be repeated on every fit() call? This leads me to wonder whether the solution would be to train two separate MultiLayerNetworks (one for pretraining and one for the actual supervised training, 10 epochs each), which would obviously not be ideal.
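
To make that concrete, this is roughly the kind of two-network setup I have in mind (just a sketch, ignoring the time-series/recurrent aspect completely; pretrainConfig, classifierConfig, unlabelledData and labelledData are placeholders, not code from the docs):

        // network 1: autoencoder layers only, pretrain(true)/backprop(false), trained unsupervised
        DataSetIterator unlabelledData = ...;
        MultiLayerConfiguration pretrainConfig = new NeuralNetConfiguration.Builder().seed(1230L)
                .list()
                .layer(0, new AutoEncoder.Builder().nIn(7000).nOut(3000).weightInit(WeightInit.XAVIER)
                        .lossFunction(LossFunctions.LossFunction.RMSE_XENT).corruptionLevel(0.3).build())
                .layer(1, new AutoEncoder.Builder().nIn(3000).nOut(1500).weightInit(WeightInit.XAVIER)
                        .lossFunction(LossFunctions.LossFunction.RMSE_XENT).corruptionLevel(0.3).build())
                .pretrain(true)
                .backprop(false)
                .build();
        MultiLayerNetwork pretrainNet = new MultiLayerNetwork(pretrainConfig);
        pretrainNet.init();
        for (int epoch = 0; epoch < 10; epoch++) {
            pretrainNet.fit(unlabelledData);   // layer-wise unsupervised pretraining only
            unlabelledData.reset();
        }

        // network 2: supervised classifier trained on the encoded features produced by network 1
        DataSetIterator labelledData = ...;
        MultiLayerConfiguration classifierConfig = ...;   // dense/LSTM/output layers, nIn = 1500
        MultiLayerNetwork classifier = new MultiLayerNetwork(classifierConfig);
        classifier.init();
        for (int epoch = 0; epoch < 10; epoch++) {
            while (labelledData.hasNext()) {
                DataSet ds = labelledData.next();
                INDArray encoded = pretrainNet.output(ds.getFeatureMatrix());   // activations of the top autoencoder
                classifier.fit(encoded, ds.getLabels());
            }
            labelledData.reset();
        }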

Still related to the epoch question: Can the EarlyStoppingTrainer be used with a config that combines pretraining and training?
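
For plain supervised training I would normally set up early stopping along these lines (just a sketch; validationData is a placeholder for a held-out iterator), but I don't see where the pretraining phase would fit in:

        EarlyStoppingConfiguration<MultiLayerNetwork> esConf =
                new EarlyStoppingConfiguration.Builder<MultiLayerNetwork>()
                        .epochTerminationConditions(new MaxEpochsTerminationCondition(30))
                        .scoreCalculator(new DataSetLossCalculator(validationData, true))   // score on held-out data
                        .evaluateEveryNEpochs(1)
                        .modelSaver(new InMemoryModelSaver<MultiLayerNetwork>())
                        .build();

        EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf, config, trainingData);
        EarlyStoppingResult<MultiLayerNetwork> result = trainer.fit();   // drives the supervised (backprop) training
        MultiLayerNetwork bestModel = result.getBestModel();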

Thanks in advance!!!

tomthetrainer commented 7 years ago

@daanvdn Thanks for the issue, I agree we could use some documentation on this topic.

The more feedback and input we get from you, the better. Have you found any of our documentation pages to be helpful or lacking? If so, which pages?

daanvdn commented 7 years ago

Hi @tomthetrainer, a while ago @AlexDBlack pointed out to me that my configuration above can't be right because I have an output layer in the middle of the network. Are there any snippets in the examples repo that show how it should be done? Thanks
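
In case it is useful as a starting point for an example, this is my current guess at how the layers could be rearranged so that there is only one output layer, at the very end (just a sketch based on my config above; the layer sizes and the placement of the dense/LSTM part are guesses on my part, and I don't know whether this is the recommended approach):

        MultiLayerConfiguration config = new NeuralNetConfiguration.Builder().seed(1230L)
                .optimizationAlgo(OptimizationAlgorithm.CONJUGATE_GRADIENT)
                .updater(Updater.RMSPROP)
                .rmsDecay(0.95)
                .regularization(true)
                .l2(0.001)
                .list()
                // unsupervised layers, trained layer-wise during the pretraining phase
                .layer(0, new AutoEncoder.Builder().nIn(7000).nOut(3000).weightInit(WeightInit.XAVIER)
                        .lossFunction(LossFunctions.LossFunction.RMSE_XENT).corruptionLevel(0.3).build())
                .layer(1, new AutoEncoder.Builder().nIn(3000).nOut(1500).weightInit(WeightInit.XAVIER)
                        .lossFunction(LossFunctions.LossFunction.RMSE_XENT).corruptionLevel(0.3).build())
                .layer(2, new AutoEncoder.Builder().nIn(1500).nOut(800).weightInit(WeightInit.XAVIER)
                        .lossFunction(LossFunctions.LossFunction.RMSE_XENT).corruptionLevel(0.3).build())
                // supervised layers, trained during backprop; the only output layer is the last one
                .layer(3, new DenseLayer.Builder().nIn(800).nOut(300).activation("tanh").build())
                .layer(4, new GravesLSTM.Builder().nIn(300).nOut(100).activation("tanh").build())
                .layer(5, new RnnOutputLayer.Builder(LossFunctions.LossFunction.XENT).activation("sigmoid")
                        .nIn(100).nOut(40).build())
                .pretrain(true)
                .backprop(true)
                .setInputType(InputType.recurrent(7000))
                .build();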

tomthetrainer commented 7 years ago

Hi @daanvdn, I could not find any example code. Perhaps @eraly or @turambar have some insight.