Hi,
Thanks for making this repo available. I've spent quite a bit of time implementing my own version of FasterRCNN in both PyTorch and TF2/Keras and have noticed that the latter is very difficult to get working with the same degree of performance and reliability as PyTorch.
I noticed that you train for a large number of epochs (50) with a small learning rate (1e-5). Was this determined empirically? The original paper uses about 16 epochs (12 at lr=1e-3 and 4 at lr=1e-4), but that was with Caffe. My PyTorch model converges to the same mean average precision as the paper using 10 + 4 epochs.
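For reference, the step schedule I mean is roughly the following (the epoch counts and rates are my reading of the paper's recipe, not anything taken from this repo):

```python
import tensorflow as tf

# Paper-style step schedule as I understand it: lr = 1e-3 for the first
# 12 epochs, then 1e-4 for the remaining 4 (values assumed, not from this repo).
def step_schedule(epoch, lr):
    return 1e-3 if epoch < 12 else 1e-4

lr_callback = tf.keras.callbacks.LearningRateScheduler(step_schedule)
# model.fit(train_data, epochs=16, callbacks=[lr_callback])
```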
However, with TF2/Keras, I've had to use aggressive gradient norm clipping to train at these learning rates and to increase the number of epochs drastically (I'm still trying to determine the optimal number, but it's at least 30).
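Concretely, by "aggressive gradient norm clipping" I mean something along these lines (a minimal sketch of my own setup, not this repo's code; the clipnorm value is illustrative, not tuned):

```python
import tensorflow as tf

# SGD with momentum as in the paper, plus gradient norm clipping.
# Without the clipnorm argument, training at lr=1e-3 diverges for me in TF2/Keras.
optimizer = tf.keras.optimizers.SGD(
    learning_rate=1e-3,
    momentum=0.9,
    clipnorm=1.0,  # clips the norm of each gradient tensor before the update
)
```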
Just wanted to confirm: did you see the same sort of training instability that I did when following the paper's recipe (lr=1e-3)?
Thank you,
Bart