HUJI-Deep / simnets-tf

SimNets implementation in TensorFlow

Failure to reproduce results of prior experiments #8

Closed: orsharir closed this issue 7 years ago

orsharir commented 7 years ago

I've tried to reproduce our previous results with networks trained in Caffe, but I cannot get them to converge during training -- the loss function is either stuck or increasing. This points to some sort of bug in the current implementation, but it's hard to pin down where the fault is, given that the tests pass.

I'll upload my code later on, while I try to narrow down the source of this issue.

elhanan7 commented 7 years ago

Did you rely on unsupervised initialization?

orsharir commented 7 years ago

I've tried both with and without it. We can discuss it in more detail in our meeting. Are you coming today?

orsharir commented 7 years ago

Example in Keras: basic_net_with_keras.py.txt

orsharir commented 7 years ago

This is an updated version of the test script: basic_net_with_keras.py.txt

I've made minor modifications relative to the previous one, so that the exact same configuration (same network, same initialization and same optimization algorithm) works fine under Caffe.

I've also found one bug which might have been a contributing factor (though it doesn't solve the issue): in your Dirichlet initialization code, you forgot to take the log at the end (each set of parameters is a probability vector in log-space). I've added my corrected version in this file.
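For concreteness, a minimal sketch of what the fix looks like (numpy only; the shape handling and the alpha value here are placeholders, not the exact code in the attached script):

```python
import numpy as np

def dirichlet_log_init(num_vectors, dim, alpha=1.0, rng=np.random):
    # Each row is drawn from a Dirichlet distribution, so it is a valid
    # probability vector (non-negative entries summing to 1).
    samples = rng.dirichlet(np.full(dim, alpha), size=num_vectors)
    # The step that was missing: the parameters live in log-space, so the
    # initializer must return log-probabilities, not probabilities.
    return np.log(samples)
```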

orsharir commented 7 years ago

I've also found another issue with unshared regions behavior. I've fixed this locally in the above script, but it should be fixed. See issue #11 for details.

orsharir commented 7 years ago

This is the weights file of a network of the same structure that was trained in Caffe: ght_model_train_1_iter_250.caffemodel.zip

orsharir commented 7 years ago

These are the same weights but in numpy format, each saved in its own file: weights.zip

orsharir commented 7 years ago

I've trained the above network for 250 iterations with batch size 100. At the end of training the loss was on the order of 0.1~0.3, so expect values of that order (it's not precise because I forgot to test the network at the end).

elhanan7 commented 7 years ago

After initializing with your weights, the result is the same: no learning. I passed a single example through the network and printed out the mean activations:

2017-06-07 21:06:41.899961: I tensorflow/core/kernels/logging_ops.cc:79] Mean Sim[-37.658592]
2017-06-07 21:06:41.900483: I tensorflow/core/kernels/logging_ops.cc:79] Mean Mex[-15.049721]
2017-06-07 21:06:41.901129: I tensorflow/core/kernels/logging_ops.cc:79] Mean Mex[-66.746323]
2017-06-07 21:06:41.902419: I tensorflow/core/kernels/logging_ops.cc:79] Mean Mex[-275.81625]
2017-06-07 21:06:41.905782: I tensorflow/core/kernels/logging_ops.cc:79] Mean Mex[-1113.9332]
2017-06-07 21:06:41.906367: I tensorflow/core/kernels/logging_ops.cc:79] Mean Mex[-4464.123]

Is this normal?
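For reference, log lines like the above come from tf.Print (the TF 1.x logging_ops); a minimal sketch of the kind of hook used, with a random tensor standing in for the real layer output:

```python
import tensorflow as tf

# Stand-in tensor; in the real network this would be the similarity/MEX output.
layer_out = tf.random_normal([1, 16, 16, 32])
layer_out = tf.Print(layer_out, [tf.reduce_mean(layer_out)], message="Mean Sim")

with tf.Session() as sess:
    sess.run(layer_out)  # the mean is written to the log by logging_ops
```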

orsharir commented 7 years ago

The point of initializing with those weights is not to train the network from that point, but simply to test that the forward pass is correct. That is, set the weights and then evaluate the network (no training!) on the dataset (a small subset is fine, of course) to make sure the loss is around the levels I've written above.
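Roughly, something like this sketch (the argument names are placeholders; it assumes the model is compiled with accuracy as its only metric and that the weight arrays from weights.zip are loaded in layer order):

```python
def evaluate_with_caffe_weights(model, caffe_weights, x, y, batch_size=100):
    # Set the Caffe-trained weights and evaluate without any training step.
    model.set_weights(caffe_weights)
    loss, acc = model.evaluate(x, y, batch_size=batch_size, verbose=0)
    print("loss=%.3f  accuracy=%.3f" % (loss, acc))
    return loss, acc
```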

Try to do that and update me on the results.

orsharir commented 7 years ago

And regarding the activations, it is normal for them to grow to very large numbers -- this is because of the sum pooling. For example, take the mean similarity activation (about -37.7) and its 16x16 spatial extent: had we used global sum pooling at that point, we'd get roughly -9600, which is the same order of magnitude as what you get at the last MEX layer (-4464).
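A quick check of that arithmetic:

```python
mean_sim = -37.658592        # mean similarity activation from the log above
spatial_positions = 16 * 16  # 16x16 spatial extent of the similarity layer
print(mean_sim * spatial_positions)  # about -9641: same order of magnitude as
                                     # the last MEX mean (-4464) in the log
```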

elhanan7 commented 7 years ago

I evaluated the model with the weights, and it gave really bad results. It turns out that the weights expect the data to be in the range [-1, 1], while what was given was in [0, 1]. After fixing that we get:

loss = 0.642
accuracy = 0.83  

And still no learning.

BTW, did you use gradient clipping in Caffe?

orsharir commented 7 years ago

Actually, the data should be in the [-0.5, 0.5] range (I thought I handled that in the script I sent you). Also, are these results on the training set or the test set?
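For example, a sketch of the intended preprocessing (assuming the input arrives as raw uint8 pixels in [0, 255]; the exact preprocessing in the attached script may differ):

```python
import numpy as np

def rescale_for_caffe_weights(x):
    # Map [0, 255] pixel values to the [-0.5, 0.5] range the weights expect.
    return x.astype(np.float32) / 255.0 - 0.5
```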

And no, I didn't use gradient clipping in Caffe.

Given the above results, I'd assume the issue is with the gradients. Try outputting more detailed statistics on them, e.g. min, max, mean, std. Compute these statistics starting from the weights I gave you, without modifying them, and average the results over a few mini-batches.
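Something along these lines should do, assuming a compiled Keras model on the TF backend (this uses the 2017-era model.total_loss / model.targets attributes; treat it as a sketch, not the exact code to run):

```python
import numpy as np
from keras import backend as K

def print_gradient_stats(model, x_batch, y_batch):
    # Symbolic gradients of the total loss w.r.t. every trainable weight.
    grads = K.gradients(model.total_loss, model.trainable_weights)
    get_grads = K.function(
        [model.inputs[0], model.targets[0], model.sample_weights[0],
         K.learning_phase()],
        grads)
    # Evaluate the gradients for one mini-batch (learning phase 0 = test mode).
    grad_values = get_grads([x_batch, y_batch, np.ones(len(x_batch)), 0])
    for weight, g in zip(model.trainable_weights, grad_values):
        print("%-40s min=%+.3e max=%+.3e mean=%+.3e std=%.3e"
              % (weight.name, g.min(), g.max(), g.mean(), g.std()))
```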

orsharir commented 7 years ago

Hi @elhanan7, do you have any updates regarding this issue?

elhanan7 commented 7 years ago

I did the numeric vs. computed comparison for the gradients of the Keras network w.r.t. the offsets (of the first MEX layer). They are different, and a subset of the computed gradients is very large. The next steps are to find the specific gradient that is the culprit, and to understand why the tests didn't catch this behaviour.

I just saw that you offered to meet on Thursday. Do you still want to, maybe in the morning?

gradients.zip

orsharir commented 7 years ago

Thanks for the update. Let's discuss this tomorrow (Thursday) in more detail. Can you meet at 10:30?

elhanan7 commented 7 years ago

Yes, that works

orsharir commented 7 years ago

I've tried to open the gradient files you attached, but something seems wrong. First, the shapes are (256, 1), whereas I expected them to match the parameter shapes in the network. Second, the numeric gradients are simply 0, which looks like a mistake.

elhanan7 commented 7 years ago

About the different size: that is because I removed the similarity layer to make the numeric gradient computation tractable. About the zeroes: maybe I did something wrong when computing the gradients (it is not clear how to do this for a Keras model).

orsharir commented 7 years ago

What sharing pattern do you use, and how many instances? Regarding getting 0, it could be that you are not computing the numeric gradients correctly. I suggest you follow Caffe's gradient-checking code -- see test_gradient_check_util.hpp for details.

Also, don't forget to check the gradient w.r.t. the input to the layer, and not just w.r.t. the parameters.
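For the TF side, a minimal central-difference checker in the spirit of Caffe's GradientChecker could look like this sketch (scalar loss, one parameter tensor at a time; only practical on small layers, which is another reason removing the similarity layer was a good idea):

```python
import numpy as np
import tensorflow as tf

def numeric_gradient(sess, loss, var, feed_dict, epsilon=1e-2):
    # Central differences: perturb one entry at a time by +/- epsilon and
    # measure the change in the scalar loss, like Caffe's GradientChecker.
    base = sess.run(var)
    grad = np.zeros_like(base)
    it = np.nditer(base, flags=["multi_index"])
    while not it.finished:
        idx = it.multi_index
        perturbed = base.copy()
        perturbed[idx] = base[idx] + epsilon
        sess.run(var.assign(perturbed))
        loss_plus = sess.run(loss, feed_dict=feed_dict)
        perturbed[idx] = base[idx] - epsilon
        sess.run(var.assign(perturbed))
        loss_minus = sess.run(loss, feed_dict=feed_dict)
        grad[idx] = (loss_plus - loss_minus) / (2.0 * epsilon)
        it.iternext()
    sess.run(var.assign(base))  # restore the original parameters
    return grad

# Compare entry by entry against the analytic gradient, e.g.:
#   analytic = sess.run(tf.gradients(loss, var)[0], feed_dict=feed_dict)
#   np.max(np.abs(analytic - numeric_gradient(sess, loss, var, feed_dict)))
```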

Hope this helps.

orsharir commented 7 years ago

Extract this zip to the following directory Generative-ConvACs/exp/mnist/ght_model/train inside HUJI-Deep/Generative-ConvACs.

elhanan7 commented 7 years ago

I compiled the Generative-ConvACs code and ran the run.py file in exp/mnist/ght_model/train. The result:

All required datasets are present.
Generating pre-trained model:

All required datasets are present.
Invalid train plan!

Try `python hyper_train.py --help` for more information
Error calling hyper_train script
=============== DONE ===============
Invalid train plan!

Try `python hyper_train.py --help` for more information
Error calling hyper_train script

orsharir commented 7 years ago

I have just tried cloning, compiling, and unzipping training_files.zip myself, and it worked fine. Are you sure you have followed all of the steps (cloning with --recursive etc.)? Just in case it makes a difference, here are my Makefile.config and my .cshrc files.

orsharir commented 7 years ago

Also, have you tried it on one of the school's computers (e.g. gsm)?

elhanan7 commented 7 years ago

It seems that the bug was that I didn't pass the block parameter into the gradient computation. The tests didn't catch this because I didn't pass the block parameter in the tests either, so all tests ran with the default [1,1,1] blocks. Now the ght model is able to learn:

(loss and accuracy plots attached)

orsharir commented 7 years ago

That's great news! However, given that this is the second time there's been an issue with passing the correct parameters, I suggest you go through all the parameters (for both MEX and Similarity) and double check that they are all passed correctly.
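As one way to do that, here is a hypothetical sketch of the kind of sweep I mean; build_mex and its keyword arguments are placeholders for the real op constructor and its parameters, not the actual simnets-tf API:

```python
import itertools
import numpy as np
import tensorflow as tf

def sweep_mex_gradients(build_mex, blocks_options, strides_options):
    # Run TF's built-in gradient check for every non-default parameter combo,
    # so a forgotten parameter (like the blocks one) can't hide behind defaults.
    # This checks the gradient w.r.t. the input; repeat for parameter tensors.
    for blocks, strides in itertools.product(blocks_options, strides_options):
        with tf.Graph().as_default(), tf.Session() as sess:
            x = tf.constant(np.random.randn(1, 4, 8, 8), dtype=tf.float64)
            y = build_mex(x, blocks=blocks, strides=strides)
            sess.run(tf.global_variables_initializer())
            err = tf.test.compute_gradient_error(
                x, x.get_shape().as_list(), y, y.get_shape().as_list())
            print("blocks=%s strides=%s max gradient error=%.2e"
                  % (blocks, strides, err))
```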

I'll try to run a few more tests myself, and if it all goes well, then I'll notify Nadav that he can start "beta testing" the new framework.

orsharir commented 7 years ago

I've added #15 to help prevent similar kinds of issues in the future, and possibly detect other cases that we are not currently aware of.

I'm currently assuming this issue is fixed, so I'm closing it.