TropComplique / FaceBoxes-tensorflow

A fast face detector
MIT License

Question about training #1

Open · 96imranahmed opened this issue 6 years ago

96imranahmed commented 6 years ago

Hi there! Thanks for this repo! I'm trying to reimplement everything from scratch in my own repository (https://github.com/96imranahmed/faceboxes_tf), but I've been running into some difficulties and was wondering whether you had come across these errors before.

Essentially, my main problem is that the model trains normally for the first ~2000 iterations (with loss magnitudes similar to the plot in the repo README). After a certain point, however, the regression losses explode: the model predicts very large values for the bounding-box offsets and heights/widths (values in the 1000s). This makes the total loss spiral out of control, and the model then needs another few thousand iterations to 'reset' back to the same point before the same problem recurs.

I was wondering whether you had experienced this while training your version of FaceBoxes? This issue is really stumping me, since the model does seem to be learning reasonably well until the failure occurs. It'd be great to hear if you've run into this before and how you got around it!

Thanks! :)

TropComplique commented 6 years ago

Hi. A couple of thoughts:

  1. You don't have any batch normalization. It is almost always easier to train with batch norm layers. I use batch norm after almost all my convolutions.
  2. I have trained SSD-like detectors a lot, and I have never encountered your problem. I believe that your problem is quite unusual.
  3. If your problem recurs periodically, then maybe your data shuffling is bad.
  4. That's a bit extreme: https://github.com/96imranahmed/faceboxes_tf/blob/master/data.py#L175. I believe it is not standard practice to do random rotation augmentation when training object detectors.

96imranahmed commented 6 years ago

Hi @TropComplique, many thanks for your reply! This was very helpful!

(1): Thanks for this, I will add batch normalization now! I currently only have a batch norm at every CReLU (as in the paper), but I will add more!
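For concreteness, the kind of block I have in mind is roughly the following (just a TF 1.x sketch; the helper name and arguments are illustrative, not my exact code):

```python
import tensorflow as tf

def conv_bn_crelu(x, filters, kernel_size, stride, is_training):
    # Convolution without a bias: the batch-norm shift takes its place.
    x = tf.layers.conv2d(x, filters, kernel_size, strides=stride,
                         padding='same', use_bias=False)
    x = tf.layers.batch_normalization(x, training=is_training)
    # CReLU: concatenate the features with their negation before the ReLU,
    # which doubles the number of output channels.
    return tf.nn.relu(tf.concat([x, -x], axis=-1))

# Note: with tf.layers.batch_normalization the moving averages are updated
# via tf.GraphKeys.UPDATE_OPS, so the train op must depend on those ops.
```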

(2): Thanks for the insight. I would have imagined so; you seem like an SSD guru! I'm still very confused as to why this happens. I'm currently testing a hypothesis that it has something to do with the Adam optimizer I'm using. This issue seems relevant and is pretty similar to what I'm experiencing: https://discuss.pytorch.org/t/loss-suddenly-increases-using-adam-optimizer/11338. If the second-moment estimate in Adam's denominator becomes very small, the updates can become huge, which would throw off training!

Interestingly, the TensorFlow documentation for AdamOptimizer states this: "The default value of 1e-8 for epsilon might not be a good default in general. For example, when training an Inception network on ImageNet a current good choice is 1.0 or 0.1. Note that since AdamOptimizer uses the formulation just before Section 2.1 of the Kingma and Ba paper rather than the formulation in Algorithm 1, the "epsilon" referred to here is "epsilon hat" in the paper." Perhaps this is contributing to the instability in my training scheme.
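If that is the cause, bumping epsilon is a one-line change (sketch only; the learning rate here is just a placeholder):

```python
import tensorflow as tf

# A larger epsilon keeps Adam's adaptive denominator away from zero,
# which should prevent occasional huge parameter updates.
optimizer = tf.train.AdamOptimizer(learning_rate=1e-4, epsilon=0.1)
# train_op = optimizer.minimize(total_loss, global_step=global_step)
```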

(3): I randomly sample from the dataset for every batch, so I'm not sure data shuffling is the issue.
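Roughly, the sampling looks like this (simplified numpy sketch; the array names are placeholders, not my exact code):

```python
import numpy as np

def sample_batch(images, boxes, batch_size):
    # Draw a fresh random subset of indices for every batch, so
    # consecutive batches are independent of the dataset ordering.
    idx = np.random.choice(len(images), size=batch_size, replace=False)
    return images[idx], [boxes[i] for i in idx]
```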

(4): I'm actually looking for in-plane rotation invariance (this is being trained for a particular use case). With this in mind, I keep the random rotation augmentation (but thank you for looking through my code!).
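For what it's worth, the box handling during rotation amounts to something like the sketch below (simplified numpy version, not my exact code; boxes are assumed to be [ymin, xmin, ymax, xmax] in pixels, the image itself is rotated separately, and the sign convention depends on how the image rotation is applied):

```python
import numpy as np

def rotate_boxes(boxes, angle_deg, image_size):
    # Rotate each box's corners around the image centre and return the
    # axis-aligned box that encloses the rotated corners.
    h, w = image_size
    cy, cx = h / 2.0, w / 2.0
    theta = np.deg2rad(angle_deg)
    cos, sin = np.cos(theta), np.sin(theta)

    rotated = []
    for ymin, xmin, ymax, xmax in boxes:
        corners = np.array([[ymin, xmin], [ymin, xmax],
                            [ymax, xmin], [ymax, xmax]], dtype=np.float32)
        dy, dx = corners[:, 0] - cy, corners[:, 1] - cx
        ry = cy + dy * cos - dx * sin
        rx = cx + dy * sin + dx * cos
        rotated.append([ry.min(), rx.min(), ry.max(), rx.max()])

    # Clip to the image bounds; boxes that end up nearly empty still
    # need to be filtered out afterwards.
    return np.clip(np.array(rotated, dtype=np.float32), 0.0, [h, w, h, w])
```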

96imranahmed commented 6 years ago

Just by way of an update: I've trained for ~45k iterations so far and haven't run into the same issue yet! I think the Adam optimizer might have been causing the problems. I will update you when things are fully trained!