Closed — mtngld closed this issue 8 years ago
Your approach makes sense to me.
From a very high level, there are two main things that could cause this:
I got similar results on my dataset with SqueezeNet 1.0 (84% vs AlexNet's 100%). I stole the parameter settings above.
SqueezeNet only trained for 1500 iterations, but something seems off because accuracy was exactly the same (on 90 examples) for 500, 1000, and 1500 iters.
Were you able to solve these problems? I am seeing a similar issue trying to fine-tune on my own data as well.
No.
I have a possibly interesting followup to this. Somehow I missed it for a long time, but my finetuned Squeezenet 1.0 classifier never predicts one of my four classes—even over my entire training and validation sets. This looks like a bug to me (I have no idea where, yet), but could also explain the accuracy drop observed by @mtngld and myself.
@toby5box That's interesting. This is what I like to call a "ghost class" -- a category that is in the vocabulary but, for whatever reason, is never predicted. How balanced is your dataset? For example, in your training set, how many samples (and what percentage of the training set) contain this ghost class?
My dataset is very unbalanced, but the ghost class is only one of three weak ones. The training set (fine-tuning) class distribution is:

class 0:   57
class 1:   63
class 2:   85
class 3: 1923
The results look like this (run over the whole training set):

1916 / 2128 = 90.0%
Predictions by class: [52, 117, 0, 1959]
Correct by class:     [26, 30, 0, 1860]

Run over an equalised 50/50/50/50 set:

91 / 200 = 45.5%
Predictions by class: [29, 53, 0, 118]
Correct by class:     [22, 22, 0, 47]
You can see that class 2 is never predicted.
I've been trying to google for leads on this problem, without much success. If you have any more suggestions about how "ghost classes" can arise in a network like SqueezeNet 1.0, I'd love to hear them.
Clearly the imbalance is already a big problem, but I find it hard to believe that zero predictions is a normal outcome even under these conditions.
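To make this check easy to rerun after each training change, here is a small self-contained sketch that tallies per-class support, predictions, and correct hits, and flags any class that is never predicted. The function name and toy data are mine, not from any framework:

```python
from collections import Counter

def per_class_report(labels, preds, num_classes):
    """Tally support, predictions, and correct hits per class;
    return the rows plus a list of 'ghost' classes (never predicted)."""
    support = Counter(labels)
    predicted = Counter(preds)
    correct = Counter(l for l, p in zip(labels, preds) if l == p)
    rows = [(c, support[c], predicted[c], correct[c])
            for c in range(num_classes)]
    ghosts = [c for c in range(num_classes) if predicted[c] == 0]
    return rows, ghosts

# toy example: class 2 exists in the labels but is never predicted
labels = [0, 1, 2, 3, 3, 3]
preds  = [0, 1, 3, 3, 3, 3]
rows, ghosts = per_class_report(labels, preds, num_classes=4)
print(ghosts)  # -> [2]
```

Running this over both the full training set and an equalised subset (as above) makes it obvious whether a class has gone silent, independent of overall accuracy.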
First, thank you for sharing this awesome work.
I am trying to fine-tune SqueezeNet on my own dataset (which is basically a subset of ImageNet labels).
Changes made in order to fine-tune, inspired by this:

- Renamed `conv10` to `conv10-new`
- Increased `lr_mult` in the `param` block of `conv10-new` to raise the learning rate for this layer
- Changed `num_output` of `conv10-new` to my own number of classes
- Reduced `base_lr` by a factor of 10, to `0.004` (tried several numbers; so far the above performed best)
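For reference, a minimal sketch of what those edits look like in Caffe prototxt. The layer names, bottoms, and filler settings here are assumptions based on the stock SqueezeNet 1.0 files, not the exact configs used above:

```
# train_val.prototxt -- final conv layer, renamed so its weights are re-initialized
layer {
  name: "conv10-new"
  type: "Convolution"
  bottom: "drop9"          # assumed: the dropout layer preceding conv10 in SqueezeNet 1.0
  top: "conv10-new"
  param { lr_mult: 10  decay_mult: 1 }   # boosted learning rate for the new weights
  param { lr_mult: 20  decay_mult: 0 }   # bias
  convolution_param {
    num_output: 4          # set to your own number of classes
    kernel_size: 1
    weight_filler { type: "gaussian" mean: 0.0 std: 0.01 }
  }
}

# solver.prototxt
base_lr: 0.004             # stock SqueezeNet base_lr (0.04) reduced 10x for fine-tuning
```

Because the renamed layer no longer matches anything in the pretrained `.caffemodel`, Caffe re-initializes it from the filler while all other layers keep their pretrained weights.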
While I was able to do this with AlexNet, with SqueezeNet my accuracy is about 20% lower. Any tips for fine-tuning?