christianbuck opened 11 years ago
Is the objective function of a NN with 2 hidden nodes and a tanh activation function convex for xor training data?
I don't know whether the XOR objective is convex.
I tried initializing the NN in a minimum, and indeed it stays there.
I looked at a lot of gradients for the cases where the XOR isn't learnt properly: the gradients of all batches within one iteration seem to cancel out, so the iteration is useless. The learning rate also ends up extremely small. Ideas?
Considering bpnn.py consistently finds a correct solution after about 1000 iterations, I guess miniNN should be able to do that as well. :) bpnn finds different solutions by the way, so I guess it's not convex. Also, when I think about it, flipping the signs of the right weights keeps the output of the net invariant (tanh is odd, so negating a hidden unit's incoming weights, bias, and outgoing weight changes nothing), which means there are several equivalent minima.
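The sign-flip symmetry is easy to check numerically. Here is a minimal sketch (not the miniNN or bpnn code, just an assumed 2-hidden-unit tanh net of the form y = v · tanh(Wx + b) + c):

```python
import numpy as np

rng = np.random.default_rng(0)

# random 2-hidden-unit tanh network: y = v . tanh(W x + b) + c
W = rng.standard_normal((2, 2))
b = rng.standard_normal(2)
v = rng.standard_normal(2)
c = rng.standard_normal()

def forward(W, b, v, c, x):
    return v @ np.tanh(W @ x + b) + c

# negate hidden unit 0's incoming weights, bias, AND outgoing weight;
# since tanh(-z) = -tanh(z), the two sign flips cancel in the output
W2, b2, v2 = W.copy(), b.copy(), v.copy()
W2[0] *= -1
b2[0] *= -1
v2[0] *= -1

x = rng.standard_normal(2)
print(np.allclose(forward(W, b, v, c, x), forward(W2, b2, v2, c, x)))  # True
```

Since this maps any minimum to a different but equally good weight vector, a strictly convex objective (which would have a unique minimum) is ruled out.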
Is it normal that convergence for learning XOR is so dependent on the initial weights?
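The dependence on initialization can be demonstrated with a self-contained toy experiment (again a sketch under my own assumptions, not the miniNN code): full-batch gradient descent on a 2-2-1 tanh net with ±1-encoded XOR, run from several random seeds. Typically most seeds reach a near-zero loss while some stall at a higher value.

```python
import numpy as np

# XOR with +/-1 encoding (works well with tanh outputs)
X = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]], dtype=float)
T = np.array([-1.0, 1.0, 1.0, -1.0])

def train(seed, iters=2000, lr=0.5):
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((2, 2)) * 0.5   # hidden weights
    b = np.zeros(2)                          # hidden biases
    v = rng.standard_normal(2) * 0.5         # output weights
    c = 0.0                                  # output bias
    for _ in range(iters):
        H = np.tanh(X @ W.T + b)             # (4, 2) hidden activations
        y = np.tanh(H @ v + c)               # (4,)  outputs
        dy = (y - T) * (1 - y ** 2)          # backprop through output tanh
        dv = H.T @ dy / len(X)
        dc = dy.mean()
        dH = np.outer(dy, v) * (1 - H ** 2)  # backprop through hidden tanh
        dW = dH.T @ X / len(X)
        db = dH.mean(axis=0)
        W -= lr * dW; b -= lr * db; v -= lr * dv; c -= lr * dc
    y = np.tanh(np.tanh(X @ W.T + b) @ v + c)
    return float(np.mean((y - T) ** 2))

losses = {s: train(s) for s in range(10)}
for s, l in losses.items():
    print(f"seed {s}: final MSE {l:.4f}")
```

With only two hidden units there is no spare capacity, so a bad initialization can land in a region where the per-pattern gradients nearly cancel, which matches the stalled runs described above.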