forresti / SqueezeNet

SqueezeNet: AlexNet-level accuracy with 50x fewer parameters
BSD 2-Clause "Simplified" License

Fine-tuning SqueezeNet #9

Closed mtngld closed 8 years ago

mtngld commented 8 years ago

First, thank you for sharing this awesome work.

I am trying to fine-tune SqueezeNet on my own dataset (which is basically a subset of the ImageNet labels).

The changes I made in order to fine-tune, inspired by this:

  1. Changed the name of conv10 to conv10-new.
  2. Added a param block to conv10-new to increase the learning rate for this layer:

     ```
     param {
       lr_mult: 5
       decay_mult: 1
     }
     param {
       lr_mult: 10
       decay_mult: 0
     }
     ```

  3. Changed the conv10-new num_output to my own number of classes.
  4. Decreased the solver base_lr by a factor of 10, to 0.004.

(I tried several values; so far the above performed best. A sketch of the resulting layer definition is below.)
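To make the list above concrete, here is a minimal sketch of what the replacement classifier layer might look like in the SqueezeNet v1.0 train prototxt, where conv10 sits on top of drop9. The num_output value and the filler settings are placeholders for illustration, not values confirmed in this thread:

```
layer {
  name: "conv10-new"   # renamed so Caffe does NOT copy the pretrained conv10 weights
  type: "Convolution"
  bottom: "drop9"
  top: "conv10"        # keeping the original top blob name means no downstream edits
  param { lr_mult: 5  decay_mult: 1 }    # weights: learn faster than the pretrained layers
  param { lr_mult: 10 decay_mult: 0 }    # bias
  convolution_param {
    num_output: 20     # set to your own number of classes
    kernel_size: 1
    weight_filler { type: "gaussian" mean: 0.0 std: 0.01 }
  }
}
```

Because Caffe copies weights from the .caffemodel by layer name, the renamed conv10-new starts from its weight_filler while every other layer keeps the released SqueezeNet weights; lowering the solver's base_lr to 0.004 (step 4) then lets the pretrained layers adapt slowly while the new classifier, with its higher lr_mult, learns faster.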

While this worked for me with AlexNet, with SqueezeNet my accuracy is about 20% lower. Any tips for fine-tuning?

forresti commented 8 years ago

Your approach makes sense to me.

At a very high level, there are two main things that could cause this:

  1. A bug.
  2. Different applications may require different CNN architectures and solvers to achieve high accuracy (here's a study that suggests this is true). I fine-tuned SqueezeNet on the FlickrLogos-32 logo dataset, and it beat this AlexNet baseline by several percentage points.

toby5box commented 6 years ago

I got similar results on my dataset with SqueezeNet 1.0 (84% vs AlexNet's 100%). I stole the parameter settings above.

I only trained SqueezeNet for 1500 iterations, but something seems off: accuracy was exactly the same (on 90 examples) at 500, 1000, and 1500 iterations.

JesperChristensen89 commented 6 years ago

Were you able to solve these problems? I am seeing a similar issue trying to fine-tune on my own data as well.

toby5box commented 6 years ago

No.

toby5box commented 5 years ago

I have a possibly interesting follow-up to this. Somehow I missed it for a long time, but my fine-tuned SqueezeNet 1.0 classifier never predicts one of my four classes, even over my entire training and validation sets. This looks like a bug to me (I have no idea where, yet), but it could also explain the accuracy drop observed by @mtngld and myself.

forresti commented 5 years ago

@toby5box That's interesting. This is what I like to call a "ghost class" -- a category that is in the vocabulary but, for whatever reason, is never predicted. How balanced is your dataset? For example, in your training set, how many samples (and what percentage of the training samples) contain this ghost class?

toby5box commented 5 years ago

My dataset is very unbalanced, but the ghost class is only one of three weak ones. The training set (fine-tuning) distribution, as samples per class label, is:

```
class 0:   57
class 1:   63
class 2:   85
class 3: 1923
```

The results look like this (run over the whole training set):

```
1916 / 2128 = 90.0%
Pred by class:
[52, 117, 0, 1959]
Correct by class:
[26, 30, 0, 1860]
```

Run over an equalised 50/50/50/50 set:

```
91 / 200 = 45.5%
Pred by class:
[29, 53, 0, 118]
Correct by class:
[22, 22, 0, 47]
```

You can see that class 2 is never predicted.

I've been trying to google for leads on this problem, without much success. If you have any suggestions about how "ghost classes" can arise with a network like SqueezeNet 1.0, I'd love to hear them.

Clearly the imbalance is already a big problem, but I struggle to believe that zero predictions is a normal outcome even under these conditions.
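In case it helps anyone hitting the same thing, this is roughly how per-class behaviour can be watched during training rather than only after the fact. Newer Caffe builds let the Accuracy layer emit an optional second top blob containing per-class accuracy; the sketch below assumes such a build and the SqueezeNet v1.0 train prototxt, where the logits blob feeding SoftmaxWithLoss is pool10 (the layer and top names here are placeholders):

```
layer {
  name: "acc_per_class"
  type: "Accuracy"
  bottom: "pool10"       # logits blob, as used by the existing loss/accuracy layers
  bottom: "label"
  top: "acc_overall"     # single overall accuracy value
  top: "acc_per_class"   # optional second top: one accuracy value per class
  include { phase: TEST }
}
```

If one class's accuracy stays pinned at 0 across test passes, that is the ghost-class symptom above showing up during training instead of only at final evaluation.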