CuriousAI / ladder

Ladder network is a deep learning algorithm that combines supervised and unsupervised learning
MIT License
516 stars 142 forks

Theano error: An update must have the same type as the original shared variable #7

Open TeslaH2O opened 9 years ago

TeslaH2O commented 9 years ago

Running the following command for the MNIST dataset:

./run.py train --encoder-layers 1000-500-250-250-250-10 --decoder-spec sig --denoising-cost-x 1000,10,0.1,0.1,0.1,0.1,0.1 --labeled-samples 100 --unlabeled-samples 60000 --seed 1 -- mnist_100_full

I get this error:

ERROR:blocks.main_loop:Error occured during training.

Blocks will attempt to run on_error extensions, potentially saving data, before exiting and reraising the error. Note that the usual after_training extensions will not be run. The original error will be re-raised and also stored in the training log. Press CTRL + C to halt Blocks immediately.

Traceback (most recent call last):
  File "./run.py", line 649, in <module>
    if train(d) is None:
  File "./run.py", line 500, in train
    main_loop.run()
  File "/home/teslah2o/ladder/venv/local/lib/python2.7/site-packages/blocks/main_loop.py", line 188, in run
    reraise_as(e)
  File "/home/teslah2o/ladder/venv/local/lib/python2.7/site-packages/blocks/utils/__init__.py", line 225, in reraise_as
    six.reraise(type(new_exc), new_exc, orig_exc_traceback)
  File "/home/teslah2o/ladder/venv/local/lib/python2.7/site-packages/blocks/main_loop.py", line 164, in run
    self.algorithm.initialize()
  File "/home/teslah2o/ladder/venv/local/lib/python2.7/site-packages/blocks/algorithms/__init__.py", line 224, in initialize
    self._function = theano.function(self.inputs, [], updates=all_updates)
  File "/home/teslah2o/ladder/venv/local/lib/python2.7/site-packages/theano/compile/function.py", line 300, in function
    output_keys=output_keys)
  File "/home/teslah2o/ladder/venv/local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 488, in pfunc
    no_default_updates=no_default_updates)
  File "/home/teslah2o/ladder/venv/local/lib/python2.7/site-packages/theano/compile/pfunc.py", line 216, in rebuild_collect_shared
    raise TypeError(err_msg, err_sug)
TypeError: An update must have the same type as the original shared variable (shared_var=f_5_b, shared_var.type=TensorType(float32, vector), update_val=Elemwise{sub,no_inplace}.0, update_val.type=TensorType(float64, vector)). If the difference is related to the broadcast pattern, you can call the tensor.unbroadcast(var, axis_to_unbroadcast[, ...]) function to remove broadcastable dimensions.
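The message points at a dtype mismatch: the shared parameter f_5_b is stored as float32, while the update expression computed for it comes out as float64 (which is what happens when Theano's floatX is left at its float64 default). Plain NumPy follows the same type-promotion rule, so the mismatch can be reproduced outside Theano; a minimal sketch with illustrative variable names, not the project's code:

```python
import numpy as np

# A float32 parameter, analogous to the shared variable f_5_b.
b = np.zeros(3, dtype=np.float32)

# A gradient computed in float64, as happens when floatX=float64.
grad = np.ones(3, dtype=np.float64)

# float32 minus float64 is promoted to float64: this mirrors the
# update_val.type=float64 that Theano refuses to assign back to
# the float32 shared variable.
update = b - 0.1 * grad
print(update.dtype)  # float64

# Keeping everything in float32 avoids the promotion.
update32 = b - np.float32(0.1) * grad.astype(np.float32)
print(update32.dtype)  # float32
```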

Do you know how to fix it?

parry2403 commented 9 years ago

I am also facing the same issue.

arasmus commented 9 years ago

No idea how to fix it. Several people have been running the code successfully, so it probably has something to do with the library configuration. Perhaps Theano versions behave differently, e.g. 0.7.0 vs. bleeding edge. Does this help?

fedor-chervinskii commented 8 years ago

I've got the same error, and this solved the problem:

THEANO_FLAGS='floatX=float32' python run.py train ...

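If the flag fixes it, the setting can also be made permanent through Theano's standard config file, ~/.theanorc; a sketch equivalent to the flag above:

```ini
# ~/.theanorc -- same effect as THEANO_FLAGS='floatX=float32'
[global]
floatX = float32
```
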
hotloo commented 8 years ago

Can you update your local master and try out the new version? Could you also update your Python environment according to environment.yml and then re-run the experiments you had issues with?

kleiba commented 8 years ago

Is it necessary to have the exact versions of the dependencies installed, or are newer versions ok as well?

hotloo commented 8 years ago

@kleiba Hi! I would assume that newer versions of the dependencies should work, provided no breaking changes were introduced. However, given the nature of the compiled code from Theano and the other libraries, I would recommend trying the exact versions.

Could you please also, if possible, share with us your updated environment file if you get it working? Cheers!

kleiba commented 8 years ago

@hotloo Hi! I'm not using conda, but the following are the version numbers of the packages in my local installation. I ran the "MNIST 1000 labels -- Full" example from the README as a test and as far as I can tell, the training went through without any issues (however, interestingly, the test error is even lower than the one reported in one of your papers, 0.75).

dependencies:

hotloo commented 8 years ago

@kleiba Ha! Glad to hear that you got it working!

Indeed, if I remember correctly, the paper reports the average of 5 runs with different seeds. I am closing this ticket now since you have reproduced the results!

kleiba commented 8 years ago

But isn't it a bit strange that the test error varies that much?

hotloo commented 8 years ago

@kleiba True. Would it be possible to do a 10-seed run, from 1 to 10? That should tell us how valid the results are.

kleiba commented 8 years ago

Sure thing. So, you want me to use 1,2,...,10 as seed values, or 10 random seeds?

hotloo commented 8 years ago

Yeah, 1-10 would be nice. :)

kleiba commented 8 years ago

Ah, shoot -- I was a bit too fast then. I've started 10 jobs with the following seeds:

1336129658, 2139292564, 1024194972, 1015193191, 755118383, 1238574728, 1490285678, 902708816, 1963117705, 1043170902

I hope that's okay for repeatability, or else I can also do 1,...,10.

hotloo commented 8 years ago

I think it should do its job! Thanks for all the time and effort that you put into this! Cheers!

kleiba commented 8 years ago

No way, thank you for your support!

kleiba commented 8 years ago

Okay, so the results are in. As I wrote above, I ran the "MNIST all labels / Full" example from the README file. For the 10 seed runs, I hence used the following command:

run.py train --encoder-layers 1000-500-250-250-250-10 --decoder-spec gauss --denoising-cost-x 1000,1,0.01,0.01,0.01,0.01,0.01 --labeled-samples 60000 --unlabeled-samples 60000 --seed <seed> -- mnist_all_full

where <seed> was one of the ten values posted above. Running run.py evaluate results/xxx on each of the resulting model directories yielded the following test errors:

<seed> Test error
1336129658 0.640000
2139292564 0.670000
1024194972 0.650000
1015193191 0.700000
755118383 0.610000
1238574728 0.760000
1490285678 0.750000
902708816 0.560000
1963117705 0.690000
1043170902 0.710000

Averaged over all runs, this gives us a test error of 0.674.
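A quick check of that average in plain Python, with the values copied from the table above:

```python
# Test errors of the ten seed runs listed above.
errors = [0.64, 0.67, 0.65, 0.70, 0.61,
          0.76, 0.75, 0.56, 0.69, 0.71]

mean = sum(errors) / len(errors)
print("mean test error: %.3f" % mean)  # mean test error: 0.674
```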

From the "Semi-Supervised Learning with Ladder Networks" paper, I would have expected something closer to the 0.57 reported on page 10 in Table 1.

(I had previously thought that the numbers I got were better than reported (see comments above), but that was my mistake since I had compared my results to the wrong column in said paper.)

Any idea why my numbers are worse? Out of the 10 runs I did, only one comes close to your 0.57; all the others are substantially worse.

Thanks!

hotloo commented 8 years ago

@kleiba Interesting! Thanks for your effort here. Let me double check if we are experiencing some regressions somewhere. I will report back once I have some results.

Chromer163 commented 7 years ago

@kleiba @hotloo Hmm, I just don't know how to get the test error. I have run MNIST with 100 labeled samples, but I am not sure about the meaning of the parameters in the results, such as:

Saving to results/mnist_100_full1/trained_params
e 150, i 75000: V_C_class 0.0954, V_E 1.43, V_C_de 0.00561 0.0635 0.927 0.365 0.164 0.0549 0.0352, T_C_de 0.00544 0.0608 0.926 0.362 0.162 0.0519 0.0316, T_C_class 0.000141, VF_C_class 0.0944, VF_E 1.41, VF_C_de 0.00561 0.0636 0.927 0.365 0.163 0.0531 0.033
valid_final_error_rate_clean 1.41
Took 55.3 minutes

thanks.