Describe the bug
When running the pretraining phase, the UnsupervisedLoss calculation normalizes the reconstruction error per feature using batch statistics. If a feature's std is 0, the mean value is used as the normalizer instead.
In my training run, that mean ended up being small and negative, which made the loss extremely large in absolute value and negative. This pushed the network to exploit features with a generally negative mean: enlarging the reconstruction error on those features actually reduces the loss.
From my experience this is not what the training process intends. A negative loss is not a problem by definition, but once this behavior appears, a lower loss no longer means a better reconstruction - which defeats the purpose.
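Below is a minimal sketch of the sign flip I am describing. The tensor values and the normalization step are my own approximation of what I understand the loss to do, not the library source:

```python
import torch

# Minimal sketch (my approximation, not the library source): squared
# reconstruction errors are divided per feature by the batch std, and
# features with std == 0 fall back to the batch mean.
embedded_x = torch.tensor([[-0.01, 1.0],
                           [-0.01, 2.0]])   # feature 0 is constant with a small negative mean
y_pred = torch.tensor([[10.0, 1.1],
                       [10.0, 2.1]])        # large reconstruction error on feature 0

errors = (y_pred - embedded_x) ** 2         # squared reconstruction error per cell
batch_stds = torch.std(embedded_x, dim=0)   # std of feature 0 is exactly 0
batch_means = torch.mean(embedded_x, dim=0)

# the fallback: where std == 0, the mean becomes the normalizer
normalizer = torch.where(batch_stds == 0, batch_means, batch_stds)

features_loss = errors / normalizer         # feature 0 is divided by -0.01
print(features_loss.sum())                  # large negative value -> "lower" loss with a worse reconstruction
```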
What is the current behavior?
The loss goes down to about -1e17 without producing good reconstructions.
If the current behavior is a bug, please provide the steps to reproduce.
Expected behavior
I don't think falling back to the mean value when the std is zero is the best choice.
At minimum, torch.abs could be applied to the mean values to keep every component of the loss positive, but maybe removing the normalization completely by using 1 for those features is better. I'm not sure; it seems worth discussing (see the sketch of both options below).
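As a rough sketch of what the two options could look like (my own helper, not a proposed library patch):

```python
import torch

def safe_feature_normalizer(embedded_x, eps=1e-9):
    """Sketch of the two options suggested above (hypothetical helper).

    Option A: keep the mean fallback but take torch.abs so every
              per-feature weight stays positive.
    Option B: drop the normalization for constant features by using 1.
    """
    batch_stds = torch.std(embedded_x, dim=0)
    batch_means = torch.mean(embedded_x, dim=0)

    # Option A: positive mean fallback
    normalizer_a = torch.where(batch_stds == 0,
                               torch.abs(batch_means) + eps,
                               batch_stds)

    # Option B: no normalization for zero-std features
    normalizer_b = torch.where(batch_stds == 0,
                               torch.ones_like(batch_stds),
                               batch_stds)
    return normalizer_a, normalizer_b
```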
Screenshots
Other relevant information:
poetry version:
python version:
Operating System:
Additional tools:
Additional context