The losses are not normalized over the batch. While this doesn't change the training results, anyone experimenting with different parameters should be aware that increasing the batch size will also increase the reported loss, since it is summed rather than averaged. I also think normalization is necessary to interpret the loss values correctly, because a per-example mean stays comparable across batch sizes.
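A minimal sketch of the difference, assuming a PyTorch-style cross-entropy loss (the tensor shapes here are illustrative, not taken from the repo):

```python
import torch
import torch.nn.functional as F

# Hypothetical batch: 8 examples, 10 classes (illustrative values only).
logits = torch.randn(8, 10)
targets = torch.randint(0, 10, (8,))

# Summed loss grows roughly linearly with batch size ...
loss_sum = F.cross_entropy(logits, targets, reduction="sum")

# ... while the per-example mean stays comparable across batch sizes.
loss_mean = F.cross_entropy(logits, targets, reduction="mean")

# Normalizing the summed loss by batch size recovers the mean.
assert torch.isclose(loss_mean, loss_sum / logits.size(0))
```

Dividing the accumulated loss by the batch size (or using `reduction="mean"`) would make the logged values directly comparable across runs with different batch sizes.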