It's probably a rare edge case, but I noticed that the normalization function can divide by zero when minV == maxV. In that case the returned loss becomes NaN, which will likely poison the rest of the model even if it happens only once.
So I'd propose adding an explicit safeguard for the minV == maxV case. (I tried adding a small epsilon to the denominator first, but that led to wrong results.)
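A minimal sketch of the kind of safeguard I have in mind, assuming a NumPy-style min-max normalization (the function and variable names here are placeholders, not the actual code in the repo):

```python
import numpy as np

def normalize(x: np.ndarray) -> np.ndarray:
    """Min-max normalize x to [0, 1], guarding against a zero range.

    When all values are equal (min_v == max_v), the usual formula
    (x - min_v) / (max_v - min_v) divides by zero and produces NaN.
    Returning zeros in that case keeps downstream losses finite.
    Hypothetical sketch; names are assumptions, not the project's API.
    """
    min_v = float(x.min())
    max_v = float(x.max())
    if min_v == max_v:
        # Constant input carries no range information; map it to 0.
        return np.zeros_like(x, dtype=float)
    return (x - min_v) / (max_v - min_v)
```

Handling the degenerate case explicitly avoids the problem I hit with an epsilon in the denominator: an epsilon doesn't just guard the zero-range case, it also skews every normalized value whenever the true range is small.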