gabrieleangeletti / Deep-Learning-TensorFlow

Ready to use implementations of various Deep Learning algorithms using TensorFlow.
http://blackecho.github.io
MIT License

DBN finetuning issue #17

Open mathurakhil opened 8 years ago

mathurakhil commented 8 years ago

Hi, I started exploring the package yesterday. I began with the command-line DBN example (same network and hyperparameters) given in the documentation:

python command_line/run_dbn.py --dataset mnist --main_dir dbn-models --model_name my-deeper-dbn  --verbose 1 --rbm_layers 512,256 --rbm_learning_rate 0.005 --rbm_num_epochs 15 --rbm_batch_size 25 --finetune_batch_size 25 --finetune_learning_rate 0.001 --finetune_num_epochs 10 --finetune_loss_func softmax_cross_entropy --finetune_dropout 0.7 --finetune_act_func relu

Pretraining worked well, in that the reconstruction error decreased over the epochs.

Extracting MNIST_data/train-images-idx3-ubyte.gz
Extracting MNIST_data/train-labels-idx1-ubyte.gz
Extracting MNIST_data/t10k-images-idx3-ubyte.gz
Extracting MNIST_data/t10k-labels-idx1-ubyte.gz
Creating stored_models/dbn-models directory to save/restore models
Creating data/dbn-models directory to save model generated data
Creating logs/dbn-models directory to save tensorboard logs
Creating stored_models/dbn-models/rbm-1 directory to save/restore models
Creating data/dbn-models/rbm-1 directory to save model generated data
Creating logs/dbn-models/rbm-1 directory to save tensorboard logs
Creating stored_models/dbn-models/rbm-2 directory to save/restore models
Creating data/dbn-models/rbm-2 directory to save model generated data
Creating logs/dbn-models/rbm-2 directory to save tensorboard logs
Training layer 1...
Tensorboard logs dir for this run is logs/dbn-models/rbm-1/run6
Reconstruction loss at step 0: 0.121678
Reconstruction loss at step 1: 0.103993
Reconstruction loss at step 2: 0.094516
Reconstruction loss at step 3: 0.0880938
Reconstruction loss at step 4: 0.0832195
Reconstruction loss at step 5: 0.0794863
Reconstruction loss at step 6: 0.0764901
Reconstruction loss at step 7: 0.0738373
Reconstruction loss at step 8: 0.0716205
Reconstruction loss at step 9: 0.0696231
Reconstruction loss at step 10: 0.0680134
Reconstruction loss at step 11: 0.0665949
Reconstruction loss at step 12: 0.0651861
Reconstruction loss at step 13: 0.0640741
Reconstruction loss at step 14: 0.0630116
Training layer 2...
Tensorboard logs dir for this run is logs/dbn-models/rbm-2/run5
Reconstruction loss at step 0: 0.172686
Reconstruction loss at step 1: 0.144182
Reconstruction loss at step 2: 0.129113
Reconstruction loss at step 3: 0.119446
Reconstruction loss at step 4: 0.112444
Reconstruction loss at step 5: 0.107379
Reconstruction loss at step 6: 0.103563
Reconstruction loss at step 7: 0.100636
Reconstruction loss at step 8: 0.0982092
Reconstruction loss at step 9: 0.0960361
Reconstruction loss at step 10: 0.0942256
Reconstruction loss at step 11: 0.0926428
Reconstruction loss at step 12: 0.0913576
Reconstruction loss at step 13: 0.0902729
Reconstruction loss at step 14: 0.0891995

But when it came to fine-tuning, the accuracy remained the same across epochs (i.e. the network was not training).

Start deep belief net finetuning...
Tensorboard logs dir for this run is logs/dbn-models/run5
Accuracy at step 0: 0.0958
Accuracy at step 1: 0.0958
Accuracy at step 2: 0.0958
Accuracy at step 3: 0.0958
Accuracy at step 4: 0.0958
Accuracy at step 5: 0.0958
Accuracy at step 6: 0.0958
Accuracy at step 7: 0.0958
....
....
....

I also tried other parameter configurations and datasets, but had the same problem. Then I tried without pretraining, and this time the test accuracy improved as expected.

python command_line/run_dbn.py --dataset mnist --main_dir dbn-models --model_name my-deeper-dbn --verbose 1 --rbm_layers 512,256 --rbm_learning_rate 0.005 --rbm_num_epochs 15 --rbm_batch_size 25 --finetune_batch_size 25 --finetune_learning_rate 0.001 --finetune_num_epochs 10 --finetune_loss_func softmax_cross_entropy --finetune_dropout 0.7 --finetune_act_func relu --do_pretrain False
Accuracy at step 0: 0.5808
Accuracy at step 1: 0.7334
Accuracy at step 2: 0.7884
Accuracy at step 3: 0.8222
Accuracy at step 4: 0.8392
Accuracy at step 5: 0.8544
Accuracy at step 6: 0.8642
Accuracy at step 7: 0.8722
Accuracy at step 8: 0.8782
....
....
....

Did anybody else get the DBN with pretraining to work from the command line? I know the parameters in the documentation are not the best ones, but I would still expect the network to train. I will now look into the code to find the issue, but any pointers would be appreciated. Thanks!

gabrieleangeletti commented 8 years ago

Hi @PrefMiner, I found the same issue. With pretraining, the RBM training is fine but finetuning doesn't work. Without pretraining, it works well. I don't yet know the reason for this issue; any help would be highly appreciated. Thanks!

twovillage commented 8 years ago

Hi, I think the argument '--finetune_act_func' should be 'sigmoid', not 'relu', because the hidden-layer nodes define conditional probabilities. The accuracy does grow with this setting, but it grows very slowly and I don't know how to speed it up.
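For reference, in a binary RBM the hidden units are Bernoulli variables whose activation probability is the logistic sigmoid (the standard textbook formula, independent of this repo):

$P(h_j = 1 \mid v) = \sigma\big(c_j + \textstyle\sum_i W_{ij} v_i\big), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}$

so using sigmoid during finetuning keeps the MLP layers consistent with the probabilities the pretrained RBMs were modelling.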

asilvino commented 8 years ago

@twovillage I guess you are right, the sigmoid function should be used in that layer... which would explain why the accuracy was not growing...

gabrieleangeletti commented 8 years ago

Just changed the default activation function to sigmoid. The accuracy is growing as expected :+1:

ifishlin commented 8 years ago

Hi @blackecho @twovillage

I replaced 'relu' with 'sigmoid' as you suggested, but the accuracy is still bad. Can you tell me which arguments are incorrect? Thanks.

python run_dbn.py --dataset mnist --main_dir dbn-models --model_name my-deeper-dbn --verbose 1 --rbm_layers 512,256 --rbm_learning_rate 0.005 --rbm_num_epochs 15 --rbm_batch_size 25 --finetune_batch_size 25 --finetune_learning_rate 0.001 --finetune_num_epochs 10 --finetune_loss_func softmax_cross_entropy --finetune_dropout 0.7 --finetune_act_func sigmoid

Start deep belief net finetuning...
Tensorboard logs dir for this run is logs/dbn-models/run19
Accuracy at step 0: 0.071
Accuracy at step 1: 0.0662
Accuracy at step 2: 0.0682
Accuracy at step 3: 0.0706
Accuracy at step 4: 0.0748
Accuracy at step 5: 0.077
Accuracy at step 6: 0.0792
Accuracy at step 7: 0.0838
Accuracy at step 8: 0.084
Accuracy at step 9: 0.0872
Test set accuracy: 0.0808999985456

Even after increasing --finetune_num_epochs to 1000, the accuracy is still low.

Accuracy at step 990: 0.1132
Accuracy at step 991: 0.116
Accuracy at step 992: 0.1094
Accuracy at step 993: 0.1092
Accuracy at step 994: 0.1014
Accuracy at step 995: 0.1098
Accuracy at step 996: 0.1126
Accuracy at step 997: 0.1144
Accuracy at step 998: 0.112
Accuracy at step 999: 0.1132
Test set accuracy: 0.109600000083

rahulmitra-kgp commented 8 years ago

@blackecho Any idea why the accuracy is not good? I have set the finetune activation function to "sigmoid", but the test accuracy is still 0.106899999082.

fanyike commented 7 years ago

Hi! I didn't use the model from the command line, I imported it. When I use the DBN with pretraining, should I call the method 'pretrain' before 'fit'? Here is my code:

import tensorflow as tf
from yadlt.models.boltzmann import dbn
from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets("data/MNIST_data/", one_hot=True)
trX, trY, teX, teY = mnist.train.images, mnist.train.labels, \
    mnist.test.images, mnist.test.labels

with tf.Session() as sess:
    # use a name other than `dbn` so the imported module is not shadowed
    model = dbn.DeepBeliefNetwork(
        rbm_layers=[512, 256], name='dbn', do_pretrain=True,
        rbm_num_epochs=[10], rbm_gibbs_k=[10],
        rbm_gauss_visible=False, rbm_stddev=0.1,
        rbm_batch_size=[10], rbm_learning_rate=[0.01],
        finetune_dropout=1, finetune_loss_func='softmax_cross_entropy',
        finetune_act_func=tf.nn.relu, finetune_opt='sgd',
        finetune_learning_rate=0.001, finetune_num_epochs=10,
        finetune_batch_size=20, momentum=0.5)

    # print(data_train)  # NOTE: data_train is not defined above

    # greedy layer-wise pretraining of the RBM stack
    model.pretrain(trX)

    # supervised finetuning of the unrolled network
    model.fit(trX, trY, teX, teY)

gabrieleangeletti commented 7 years ago

If you set do_pretrain=False and you don't call pretrain(), then what you'll have is a standard MLP. With do_pretrain=True and a call to pretrain(), you'll perform greedy unsupervised learning of a stack of RBMs before unrolling them into an MLP.
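For intuition, here is a minimal self-contained numpy sketch of what greedy layer-wise pretraining does conceptually (a toy mean-field CD-1 illustration of the general idea, not the actual yadlt implementation):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, lr=0.05, epochs=5, rng=np.random):
    # Toy mean-field CD-1 training of a single binary RBM.
    n_visible = data.shape[1]
    W = 0.01 * rng.randn(n_visible, n_hidden)
    bv, bh = np.zeros(n_visible), np.zeros(n_hidden)
    for _ in range(epochs):
        h0 = sigmoid(data @ W + bh)                       # positive phase
        v1 = sigmoid(h0 @ W.T + bv)                       # reconstruction
        h1 = sigmoid(v1 @ W + bh)                         # negative phase
        W += lr * (data.T @ h0 - v1.T @ h1) / len(data)   # CD-1 update
        bv += lr * (data - v1).mean(axis=0)
        bh += lr * (h0 - h1).mean(axis=0)
    return W, bh

def greedy_pretrain(data, layer_sizes):
    # Train the RBMs one at a time; each layer's hidden activations
    # become the input of the next RBM. The returned (W, bh) pairs are
    # then used to initialize the dense layers of the MLP ("unrolling"),
    # with a randomly initialized softmax layer stacked on top.
    params, layer_input = [], data
    for n_hidden in layer_sizes:                          # e.g. [512, 256]
        W, bh = train_rbm(layer_input, n_hidden)
        params.append((W, bh))
        layer_input = sigmoid(layer_input @ W + bh)
    return params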

fanyike commented 7 years ago

@blackecho Many thanks! Excellent work.

fanyike commented 7 years ago

@blackecho Another question: when I use an RBM or a DBN, should I normalize my input data to zero mean and unit variance? It really confuses me.

gabrieleangeletti commented 7 years ago

Yes, you can either simply normalize to [0, 1] or normalize to zero mean and unit variance. I would go with the latter.
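For reference, both options in numpy (a minimal sketch; X here is just hypothetical random data standing in for your inputs):

import numpy as np

X = np.random.rand(1000, 784).astype(np.float32)  # hypothetical raw inputs

# Option 1: scale each feature to [0, 1]
X01 = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0) + 1e-8)

# Option 2: standardize each feature to zero mean and unit variance
Xstd = (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

# Compute the statistics on the training set only and reuse them
# to transform the validation/test sets.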

fanyike commented 7 years ago

@blackecho Thank you!

fanyike commented 7 years ago

@blackecho Without pretraining, the accuracy on MNIST is 0.92; however, with pretraining, the accuracy is 0.087. Any idea? Thanks

gabrieleangeletti commented 7 years ago

Nope, unfortunately I haven't found the cause of this issue yet.

fanyike commented 7 years ago

@blackecho I found another implementation here, and it works well. However, I cannot find the difference: https://github.com/xiaohu2015/DeepLearning_tutorials

fanyike commented 7 years ago

@blackecho I guess the problem is the appearance of 'nan'. Here is the output when I use that other code; it has the same problem. Maybe it gives some clue:

Starting pretraining...

Pretraing layer 0 Epoch 0 cost: 155.12645916331894
Pretraing layer 0 Epoch 1 cost: 143.80763538707393
Pretraing layer 0 Epoch 2 cost: 141.32982020984997
Pretraing layer 0 Epoch 3 cost: 51.984577359286234
Pretraing layer 0 Epoch 4 cost: 0.0
Pretraing layer 0 Epoch 5 cost: 0.0
Pretraing layer 0 Epoch 6 cost: 0.0
Pretraing layer 0 Epoch 7 cost: 0.0
Pretraing layer 0 Epoch 8 cost: 0.0
Pretraing layer 0 Epoch 9 cost: 0.0
Pretraing layer 1 Epoch 0 cost: nan
Pretraing layer 1 Epoch 1 cost: nan
Pretraing layer 1 Epoch 2 cost: nan
Pretraing layer 1 Epoch 3 cost: nan
Pretraing layer 1 Epoch 4 cost: nan
Pretraing layer 1 Epoch 5 cost: nan
Pretraing layer 1 Epoch 6 cost: nan
Pretraing layer 1 Epoch 7 cost: nan
Pretraing layer 1 Epoch 8 cost: nan
Pretraing layer 1 Epoch 9 cost: nan
Pretraing layer 2 Epoch 0 cost: nan
Pretraing layer 2 Epoch 1 cost: nan
Pretraing layer 2 Epoch 2 cost: nan
Pretraing layer 2 Epoch 3 cost: nan
Pretraing layer 2 Epoch 4 cost: nan
Pretraing layer 2 Epoch 5 cost: nan
Pretraing layer 2 Epoch 6 cost: nan
Pretraing layer 2 Epoch 7 cost: nan
Pretraing layer 2 Epoch 8 cost: nan
Pretraing layer 2 Epoch 9 cost: nan

The pretraining process ran for 3.4709213713039855 minutes

Start finetuning...

Epoch 0 cost: nan, validation accuacy: 0.0957999974489212
Epoch 1 cost: nan, validation accuacy: 0.0957999974489212
Epoch 2 cost: nan, validation accuacy: 0.0957999974489212
Epoch 3 cost: nan, validation accuacy: 0.0957999974489212
Epoch 4 cost: nan, validation accuacy: 0.0957999974489212
Epoch 5 cost: nan, validation accuacy: 0.0957999974489212
Epoch 6 cost: nan, validation accuacy: 0.0957999974489212
Epoch 7 cost: nan, validation accuacy: 0.0957999974489212
Epoch 8 cost: nan, validation accuacy: 0.0957999974489212
Epoch 9 cost: nan, validation accuacy: 0.0957999974489212

xiaohu2015 commented 7 years ago

@blackecho There might be some mistakes in the implementation here: https://github.com/xiaohu2015/DeepLearning_tutorials It works well for the BBRBM but not for the GBRBM.
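For context, the textbook difference between the two visible-unit types (a generic numpy sketch of the standard formulation, not the code in either repository):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Visible-unit reconstruction given hidden activations h,
# weights W of shape (n_visible, n_hidden) and visible bias bv.
def bbrbm_visible_probs(h, W, bv):
    # Bernoulli visible units: values squashed into (0, 1)
    return sigmoid(h @ W.T + bv)

def gbrbm_visible_mean(h, W, bv, sigma=1.0):
    # Gaussian visible units: unbounded real-valued mean; assumes
    # standardized inputs and is much more sensitive to the learning
    # rate, which makes exploding values and NaNs more likely.
    return bv + sigma * (h @ W.T)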

gabrieleangeletti commented 7 years ago

@fanyike Yes, the problem is probably the appearance of those NaNs, thanks for pointing it out! @xiaohu2015 I will take a look at your implementation and see if I can solve the issue, thanks!

fanyike commented 7 years ago

@blackecho Maybe it is the learning_rate, which is too big, making the pre-activation a negative number.
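One common guard against this kind of NaN (a generic TensorFlow 1.x sketch, not a patch to this repository's code) is to lower the learning rate and clip the values that go into the log of the cross-entropy:

import tensorflow as tf

# Hypothetical placeholders standing in for the network output and targets.
y_pred = tf.placeholder(tf.float32, [None, 10])  # softmax outputs
y_true = tf.placeholder(tf.float32, [None, 10])  # one-hot labels

# Clipping keeps log() away from exact 0 and 1, where NaNs typically start.
eps = 1e-7
clipped = tf.clip_by_value(y_pred, eps, 1.0 - eps)
cross_entropy = -tf.reduce_mean(tf.reduce_sum(y_true * tf.log(clipped), axis=1))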

nish21 commented 7 years ago

@blackecho I need a clarification. When training prints 'Accuracy: 0.93', does that mean 7% misclassification or 93% misclassification? I ask because run_summaries in tf_utils.py says the mean error is returned.
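For comparison, a typical accuracy node in TensorFlow 1.x is built like this (a generic sketch, not necessarily what tf_utils.py computes):

import tensorflow as tf

logits = tf.placeholder(tf.float32, [None, 10])  # hypothetical network output
labels = tf.placeholder(tf.float32, [None, 10])  # one-hot targets

# Fraction of samples whose predicted class matches the true class;
# for this node, 0.93 would mean 93% correct, i.e. 7% misclassified.
correct = tf.equal(tf.argmax(logits, 1), tf.argmax(labels, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))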

xiaoruishan commented 5 years ago

@blackecho Without pretraining, the accuracy on MNIST is 0.92; however, with pretraining, the accuracy is 0.087. Any idea? Thanks

Hi. Has this problem been solved? I have encountered the same problem now.