Closed CarlosNacher closed 8 months ago
Thank you for your suggestion. The code you mentioned in lines 144-149 is only meant to observe how the loss varies, so either summation or averaging would work; it's just a matter of scaling.
Totally agree that it is a matter of scale. Usually the average is used to make it more representative of "the loss of the epoch". However, I agree that if all you want to do is look at the variation, it doesn't matter what the scale is.
Thank you very much for the clarification and for your huge contribution.
Hi @HuiZhang0812,
First of all, thanks a lot for your work and for sharing it here!
I have noticed something that I think could be a coding error. In the `train.py` file, lines 136-141, the training losses are added to the epoch's total loss. I suppose this was meant for later, in lines 144-149, to compute the mean of those epoch losses by dividing them by the number of steps in the epoch (`i`). But that step seems to have been forgotten, and the sum of the epoch's losses is being appended instead. I think the appended sums have to be replaced with the sums divided by the number of steps (to take advantage of the `i` variable; obviously there are alternative ways to achieve it).

In addition, these loss lists are not saved or logged anywhere afterwards (I guess this is a decision you are aware of).
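To illustrate the difference I mean, here is a minimal, self-contained sketch. It is not the code from `train.py`: the function name `run_epoch` and the loss values are hypothetical, and I am assuming the step counter comes from `enumerate` starting at 0, so the number of steps is `i + 1`.

```python
def run_epoch(step_losses):
    """Accumulate per-step losses and return (sum, mean) for one epoch.

    Hypothetical helper for illustration only; not taken from train.py.
    """
    epoch_total = 0.0
    num_steps = 0
    for i, loss in enumerate(step_losses):
        epoch_total += loss
        num_steps = i + 1  # enumerate starts at 0, so steps seen so far = i + 1
    # Guard against an empty epoch to avoid dividing by zero.
    epoch_mean = epoch_total / num_steps if num_steps else float("nan")
    return epoch_total, epoch_mean


# Made-up per-step losses for two epochs with different numbers of steps.
epoch_a = [1.0, 0.5, 0.25, 0.25]
epoch_b = [1.0, 0.5]

total_a, mean_a = run_epoch(epoch_a)
total_b, mean_b = run_epoch(epoch_b)

print(f"epoch A: sum={total_a}, mean={mean_a}")
print(f"epoch B: sum={total_b}, mean={mean_b}")
```

With a fixed number of steps per epoch, the sum and the mean differ only by a constant factor, so the shape of the curve is the same. The mean only becomes necessary if the number of steps changes between runs or epochs, since the sum then grows with the step count.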
Again, thank you very much for your great work in developing this SOTA algorithm and even more for sharing it with the community.