I have some questions about loss and learning rate.
In the region_layer.py of marvis/yolov2,
all losses (cls, conf, etc.) are summed, and in train.py the learning rate is divided by batch_size.
Just my guess, but this is equivalent to the following backpropagation update, where w' is the updated weight, w is the current weight, and lr is the learning rate:

origin : w' = w - lr * dw             (dw averaged over the batch)
marvis : w' = w - (lr / batch) * dw   (dw summed over the batch)
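For concreteness, here is a minimal PyTorch sketch checking that guess (this is not code from either repo; `B`, `x`, `y`, and `sgd_step` are made-up names): one plain SGD step on a summed loss with lr / batch_size produces the same weights as one step on an averaged loss with lr.

```python
import torch

torch.manual_seed(0)
B = 4                                  # batch size
x = torch.randn(B, 3)
y = torch.randn(B, 1)

def sgd_step(reduction, lr):
    w = torch.zeros(3, 1, requires_grad=True)
    per_sample = ((x @ w - y) ** 2).squeeze(1)   # one loss value per sample
    loss = per_sample.sum() if reduction == "sum" else per_sample.mean()
    loss.backward()
    with torch.no_grad():
        w -= lr * w.grad               # one plain SGD step
    return w.detach()

lr = 0.1
w_mean = sgd_step("mean", lr)          # "origin": averaged loss, lr
w_sum  = sgd_step("sum", lr / B)       # "marvis": summed loss, lr / batch_size
print(torch.allclose(w_mean, w_sum))   # True: the two updates are identical
```

This works because the gradient of the summed loss is exactly batch_size times the gradient of the averaged loss, so scaling lr by 1/batch_size cancels the difference.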
In your region_layer.py,
the losses are divided by nB, and the optimizer's lr is also divided by batch_size in train.py.
If possible, would you tell me the intention behind this?
lr/batch follows the decay schedule keyed to the number of samples seen. The losses are normalized by batch size (nB). lr/batch is not related to the loss sum; it is simply adjusted according to the number of training samples seen so far.
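For illustration, a hedged sketch of the kind of step-based schedule described above; the function name, the `steps`/`scales` milestones, and the default values are assumptions, not the repo's actual code. The key point is that the schedule advances with training progress, while the division by batch_size is a constant rescaling applied at the end.

```python
def adjust_learning_rate(optimizer, processed_batches,
                         base_lr=1e-3, batch_size=64,
                         steps=(100, 25000, 35000),
                         scales=(10.0, 0.1, 0.1)):
    """Step-style schedule: rescale lr at each milestone already passed."""
    lr = base_lr
    for step, scale in zip(steps, scales):
        if processed_batches >= step:
            lr = lr * scale            # warm up or decay past each milestone
        else:
            break
    for param_group in optimizer.param_groups:
        # dividing by batch_size compensates for the summed (not averaged) loss
        param_group['lr'] = lr / batch_size
    return lr
```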
I appreciate this great work!!