deeplearning4j / deeplearning4j

Suite of tools for deploying and training deep learning models using the JVM. Highlights include model import for keras, tensorflow, and onnx/pytorch, a modular and tiny c++ library for running math code and a java based math library on top of the core c++ library. Also includes samediff: a pytorch/tensorflow like library for running deep learn...
http://deeplearning4j.konduit.ai
Apache License 2.0

Request for documentation for combining pretraining and training in a single MultiLayerConfiguration #3667

Closed daanvdn closed 3 years ago

daanvdn commented 7 years ago

I have a request for documentation relating to the combined use of unsupervised pretraining and supervised training.
The online documentation, the dl4j book, and the dl4j-examples repo all give good information on how to configure a standalone deep neural net that does unsupervised training, for instance using stacked denoising autoencoders. All of this documentation also explains what neural nets of this type can be used for, e.g. learning better feature representations, which can then be used as inputs when training a classifier in a supervised fashion.
What I am missing, though, are guidelines and code samples that show how to set up a single MultiLayerConfiguration that combines both pretraining and supervised training layers.
Looking at the implementation of org.deeplearning4j.nn.multilayer.MultiLayerNetwork#fit(org.nd4j.linalg.dataset.api.iterator.DataSetIterator), I've tried to figure out what such a configuration might look like. This is what I've come up with:

        MultiLayerConfiguration config = new NeuralNetConfiguration.Builder().seed(1230L)
                .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
                .gradientNormalizationThreshold(1.0)
                .iterations(1)
                .momentum(0.5)
                .momentumAfter(Collections.singletonMap(3, 0.9))
                .optimizationAlgo(OptimizationAlgorithm.CONJUGATE_GRADIENT)
                .rmsDecay(0.95)
                .regularization(true)
                .updater(Updater.RMSPROP)
                .l2(0.001)
                .list()
                .layer(0, new AutoEncoder.Builder().nIn(7000).nOut(3000).weightInit(WeightInit.XAVIER).lossFunction(
                        LossFunctions.LossFunction.RMSE_XENT).corruptionLevel(0.3).build())
                .layer(1, new AutoEncoder.Builder().nIn(3000).nOut(1500).weightInit(WeightInit.XAVIER).lossFunction(
                        LossFunctions.LossFunction.RMSE_XENT).corruptionLevel(0.3).build())
                .layer(2, new AutoEncoder.Builder().nIn(1500).nOut(800).weightInit(WeightInit.XAVIER).lossFunction(
                        LossFunctions.LossFunction.RMSE_XENT).corruptionLevel(0.3).build())
                .layer(3,
                        new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD).activation("softmax")
                                .nIn(400)
                                .nOut(300)
                                .build())
                .layer(4, new DenseLayer.Builder().nIn(300).nOut(200).activation("tanh").build())
                .layer(5, new GravesLSTM.Builder().nIn(200).nOut(100).activation("tanh").build())
                .layer(6,
                        new RnnOutputLayer.Builder(LossFunctions.LossFunction.XENT).activation("sigmoid")
                                .nIn(100)
                                .nOut(40)
                                .build())
                .backpropType(BackpropType.Standard)
                .pretrain(true)
                .backprop(true)
                .setInputType(InputType.recurrent(7000))
                .build();

This config would pass the input (which is time-series data) through a stack of denoising autoencoders, reducing the 7000 input features to 300, and would then feed that output into a mixed feedforward + LSTM network to train a multi-label classifier.

Questions: if I train the network for multiple epochs like this:

                DataSetIterator trainingData = ...;
                MultiLayerNetwork multiLayerNetwork = new MultiLayerNetwork(config);
                for (int epoch = 0; epoch < 10; epoch++) {
                    multiLayerNetwork.fit(trainingData);
                    trainingData.reset();
                }

I'm not sure how the epochs interact with the pretraining here: will the layer-wise pretraining be repeated on every fit() call? This leads me to wonder whether the solution would be to train two separate MultiLayerNetworks (one for pretraining and one for the actual supervised training, 10 epochs each), which would obviously not be ideal.
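
To make that concrete, this is roughly the kind of two-network setup I have in mind (just a sketch, ignoring the time-series/recurrent aspect completely; pretrainConfig, classifierConfig, unlabelledData and labelledData are placeholders, not code from the docs):

        // network 1: autoencoder layers only, pretrain(true)/backprop(false), trained unsupervised
        DataSetIterator unlabelledData = ...;
        MultiLayerConfiguration pretrainConfig = new NeuralNetConfiguration.Builder().seed(1230L)
                .list()
                .layer(0, new AutoEncoder.Builder().nIn(7000).nOut(3000).weightInit(WeightInit.XAVIER)
                        .lossFunction(LossFunctions.LossFunction.RMSE_XENT).corruptionLevel(0.3).build())
                .layer(1, new AutoEncoder.Builder().nIn(3000).nOut(1500).weightInit(WeightInit.XAVIER)
                        .lossFunction(LossFunctions.LossFunction.RMSE_XENT).corruptionLevel(0.3).build())
                .pretrain(true)
                .backprop(false)
                .build();
        MultiLayerNetwork pretrainNet = new MultiLayerNetwork(pretrainConfig);
        pretrainNet.init();
        for (int epoch = 0; epoch < 10; epoch++) {
            pretrainNet.fit(unlabelledData);   // layer-wise unsupervised pretraining only
            unlabelledData.reset();
        }

        // network 2: supervised classifier trained on the encoded features produced by network 1
        DataSetIterator labelledData = ...;
        MultiLayerConfiguration classifierConfig = ...;   // dense/LSTM/output layers, nIn = 1500
        MultiLayerNetwork classifier = new MultiLayerNetwork(classifierConfig);
        classifier.init();
        for (int epoch = 0; epoch < 10; epoch++) {
            while (labelledData.hasNext()) {
                DataSet ds = labelledData.next();
                INDArray encoded = pretrainNet.output(ds.getFeatureMatrix());   // activations of the top autoencoder
                classifier.fit(encoded, ds.getLabels());
            }
            labelledData.reset();
        }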

Still related to the epoch question: Can the EarlyStoppingTrainer be used with a config that combines pretraining and training?
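
For plain supervised training I would normally set up early stopping along these lines (just a sketch; validationData is a placeholder for a held-out iterator), but I don't see where the pretraining phase would fit in:

        EarlyStoppingConfiguration<MultiLayerNetwork> esConf =
                new EarlyStoppingConfiguration.Builder<MultiLayerNetwork>()
                        .epochTerminationConditions(new MaxEpochsTerminationCondition(30))
                        .scoreCalculator(new DataSetLossCalculator(validationData, true))   // score on held-out data
                        .evaluateEveryNEpochs(1)
                        .modelSaver(new InMemoryModelSaver<MultiLayerNetwork>())
                        .build();

        EarlyStoppingTrainer trainer = new EarlyStoppingTrainer(esConf, config, trainingData);
        EarlyStoppingResult<MultiLayerNetwork> result = trainer.fit();   // drives the supervised (backprop) training
        MultiLayerNetwork bestModel = result.getBestModel();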

Thanks in advance!!!

tomthetrainer commented 7 years ago

@daanvdn Thanks for the issue, I agree we could use some documentation on this topic.

The more feedback and input we get from you, the better. Have you found any of our documentation pages to be helpful or lacking? If so, which pages?

daanvdn commented 7 years ago

Hi @tomthetrainer, a while ago @AlexDBlack pointed out to me that my configuration above can't be right because I have an output layer in the middle of the network. Are there any snippets in the examples repo that show how it should be done? Thanks
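
In case it is useful as a starting point for an example, this is my current guess at how the layers could be rearranged so that there is only one output layer, at the very end (just a sketch based on my config above; the layer sizes and the placement of the dense/LSTM part are guesses on my part, and I don't know whether this is the recommended approach):

        MultiLayerConfiguration config = new NeuralNetConfiguration.Builder().seed(1230L)
                .optimizationAlgo(OptimizationAlgorithm.CONJUGATE_GRADIENT)
                .updater(Updater.RMSPROP)
                .rmsDecay(0.95)
                .regularization(true)
                .l2(0.001)
                .list()
                // unsupervised layers, trained layer-wise during the pretraining phase
                .layer(0, new AutoEncoder.Builder().nIn(7000).nOut(3000).weightInit(WeightInit.XAVIER)
                        .lossFunction(LossFunctions.LossFunction.RMSE_XENT).corruptionLevel(0.3).build())
                .layer(1, new AutoEncoder.Builder().nIn(3000).nOut(1500).weightInit(WeightInit.XAVIER)
                        .lossFunction(LossFunctions.LossFunction.RMSE_XENT).corruptionLevel(0.3).build())
                .layer(2, new AutoEncoder.Builder().nIn(1500).nOut(800).weightInit(WeightInit.XAVIER)
                        .lossFunction(LossFunctions.LossFunction.RMSE_XENT).corruptionLevel(0.3).build())
                // supervised layers, trained during backprop; the only output layer is the last one
                .layer(3, new DenseLayer.Builder().nIn(800).nOut(300).activation("tanh").build())
                .layer(4, new GravesLSTM.Builder().nIn(300).nOut(100).activation("tanh").build())
                .layer(5, new RnnOutputLayer.Builder(LossFunctions.LossFunction.XENT).activation("sigmoid")
                        .nIn(100).nOut(40).build())
                .pretrain(true)
                .backprop(true)
                .setInputType(InputType.recurrent(7000))
                .build();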

tomthetrainer commented 7 years ago

Hi @daanvdn, I could not find any example code. Perhaps @eraly or @turambar have some insight.