Closed: Fohlen closed this 10 months ago
@Fohlen In your Lightning code,
And in the raw PyTorch code, you are missing the test code:
```python
xor_network.eval()
with torch.no_grad():
    test_output = xor_network(Xs)
print(test_output.round())
```
To make both of them behave the same, the hyperparameters of course need to match. Can you try again? After these fixes I get the correct predictions (i.e. 0 1 1 0).
In addition, to make it fully deterministic you can set the seed:

```python
L.seed_everything(0)
```
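For context, `seed_everything` seeds Python's `random`, NumPy, and torch RNGs in one call, which is why two runs then produce identical weight initializations. The principle can be illustrated with the stdlib alone (the `noisy_init` helper below is purely illustrative, not part of the issue's code):

```python
import random

def noisy_init(n: int) -> list[float]:
    # Stand-in for random weight initialization in a model.
    return [random.uniform(-1.0, 1.0) for _ in range(n)]

random.seed(0)
run_a = noisy_init(4)
random.seed(0)
run_b = noisy_init(4)

# Identical seeds give identical "initial weights", so two
# training runs start from exactly the same point.
assert run_a == run_b
```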
Hi @awaelchli, this indeed produces the correct result. I can get the code to converge correctly within 100 epochs or less with pure Torch, any idea why that wouldn't be the case with lightning?
> I can get the code to converge correctly within 100 epochs or less with pure Torch
The code that you posted can't actually converge in 100 epochs. Please share what you changed to make that possible.
Sorry for the imprecise wording. After some experimentation with the number of epochs, I could produce the correct result at epoch=250 (not convergence). However, this appears to be extremely sensitive to the seed used when training, which I find interesting. According to the Deep Learning book, the correct weights should be learnable in a single pass of this network. In any case, this behaviour is not Lightning-specific. Thanks for your help, I will keep digging in torch to find the reason for this behaviour 👍
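The seed sensitivity described above is easy to reproduce even without torch. Below is a stdlib-only sketch of a 2-2-1 sigmoid network trained on XOR with plain SGD and manual backprop; all names, the architecture, and the hyperparameters are illustrative, not the issue's actual code. Running it with different seeds shows that some initializations converge quickly while others get stuck in a local minimum:

```python
import math
import random

def train_xor(seed: int, epochs: int = 5000, lr: float = 0.5) -> list[int]:
    """Train a tiny 2-2-1 sigmoid MLP on XOR and return rounded predictions
    for the inputs (0,0), (0,1), (1,0), (1,1)."""
    rng = random.Random(seed)
    # Hidden layer: 2 neurons, each with 2 weights and a bias.
    w1 = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
    b1 = [rng.uniform(-1, 1) for _ in range(2)]
    # Output layer: 1 neuron with 2 weights and a bias.
    w2 = [rng.uniform(-1, 1) for _ in range(2)]
    b2 = rng.uniform(-1, 1)

    sig = lambda z: 1.0 / (1.0 + math.exp(-z))
    data = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

    for _ in range(epochs):
        for (x0, x1), t in data:
            # Forward pass.
            h = [sig(w1[j][0] * x0 + w1[j][1] * x1 + b1[j]) for j in range(2)]
            y = sig(w2[0] * h[0] + w2[1] * h[1] + b2)
            # Backward pass for MSE loss with sigmoid activations.
            dy = (y - t) * y * (1 - y)
            for j in range(2):
                dh = dy * w2[j] * h[j] * (1 - h[j])  # before updating w2[j]
                w2[j] -= lr * dy * h[j]
                w1[j][0] -= lr * dh * x0
                w1[j][1] -= lr * dh * x1
                b1[j] -= lr * dh
            b2 -= lr * dy

    preds = []
    for (x0, x1), _ in data:
        h = [sig(w1[j][0] * x0 + w1[j][1] * x1 + b1[j]) for j in range(2)]
        preds.append(round(sig(w2[0] * h[0] + w2[1] * h[1] + b2)))
    return preds

# Try a few seeds: not all of them reach the target [0, 1, 1, 0],
# mirroring the seed sensitivity discussed in this thread.
for s in range(3):
    print(f"seed {s}: predictions {train_xor(seed=s)}")
```

The same effect shows up in the torch and Lightning versions; the tiny parameter count of this network makes the loss landscape's bad local minima much easier to fall into than in larger models.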
Bug description
Hi, I am trying to train a simple DNN to solve the XOR problem. This can be solved trivially with a pure torch implementation, but I cannot replicate the same simple model in Lightning. Instead, the trained model oscillates between different states and never manages to correctly produce XOR.
What version are you seeing the problem on?
v2.1
How to reproduce the bug
I tried to use Lightning to simplify away the boilerplate code like so: