Closed loreloc closed 1 year ago
Any idea why I get NaN as output here? (see the Pytest report)
Strange, it seems to happen by chance. Maybe take a look at reproducibility issues?
How's that possible? We set a seed for the tests. Are we using torch.compile somewhere yet?
See the benchmark code: seeding alone cannot guarantee full reproducibility. However, I'm not sure what caused the problem in this case. Maybe inspect it with PyTorch's anomaly detection?
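For reference, a minimal sketch (not this project's code) of how PyTorch's anomaly detection can localize the operation that first produces a NaN during the backward pass; the `0/0` here is just a stand-in for whatever op is misbehaving:

```python
import torch

def first_nan_message():
    """Trigger a NaN in backward and capture anomaly detection's report."""
    torch.autograd.set_detect_anomaly(True)
    try:
        x = torch.zeros(1, requires_grad=True)
        y = x / x  # 0/0 -> NaN in the forward pass
        try:
            y.sum().backward()
            return None  # no anomaly was raised
        except RuntimeError as e:
            return str(e)  # names the backward function that emitted NaN
    finally:
        torch.autograd.set_detect_anomaly(False)
```

With anomaly detection enabled, the `RuntimeError` message identifies the offending backward function, which narrows the search considerably compared to a bare NaN in the final output.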
We should use deterministic mode in tests, and perhaps run them in no-grad mode when we are not testing gradients. Still, NaNs should not appear even in non-deterministic torch mode.
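A sketch of what such a test setup could look like (the helper name `run_deterministic` is hypothetical): seed the RNG, enable deterministic algorithms, and evaluate under `no_grad` when gradients are not under test.

```python
import torch

def run_deterministic(fn, seed=42):
    """Run fn with a fixed seed, deterministic kernels, and no autograd."""
    torch.manual_seed(seed)
    torch.use_deterministic_algorithms(True)
    try:
        with torch.no_grad():
            return fn()
    finally:
        torch.use_deterministic_algorithms(False)

# Same seed + deterministic mode -> identical results on the same machine.
a = run_deterministic(lambda: torch.randn(3) @ torch.randn(3))
b = run_deterministic(lambda: torch.randn(3) @ torch.randn(3))
```

Note that `torch.use_deterministic_algorithms(True)` raises an error when an op has no deterministic implementation, so surprises surface loudly instead of silently varying between runs.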
I still get NaN when using deterministic mode.
HEAD detached at 4b355a5
(#64) It works.
HEAD detached at acbbce0
(#69) It works.
HEAD detached at f312558
(#74) It works.
HEAD detached at fb1ef41
(#78) It does not work (getting NaNs).
Removing the padding for the mixing layer (in which cases do we need it?) resolves the issue.
This padding breaks the bookkeeping by changing the shape of the output; this dependence was introduced in #78.
For now we can only comment out this padding. Please update the TODO to state that this change of shape breaks the bookkeeping and that we need to find another way to implement it.
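To illustrate the problem (hypothetical shapes, not the actual mixing layer): any downstream code that recorded the pre-padding shape stops matching the tensor it receives.

```python
import torch
import torch.nn.functional as F

# Bookkeeping records the expected (batch, units) shape of the layer output.
x = torch.randn(2, 5)
expected_shape = x.shape            # (2, 5) recorded before the layer runs

# Padding the last dimension (e.g. to a fixed width of 8) silently
# changes the output shape seen by everything downstream.
padded = F.pad(x, (0, 3))           # pad last dim on the right by 3
shape_mismatch = padded.shape != expected_shape
```

Here `shape_mismatch` is `True`: the recorded `(2, 5)` no longer describes the padded `(2, 8)` output, which is exactly the breakage described above.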
Closes #83.