BVLC / caffe

Caffe: a fast open framework for deep learning.
http://caffe.berkeleyvision.org/

Oscillating loss on training set #744

Closed: wendlerc closed this issue 10 years ago

wendlerc commented 10 years ago

Hello,

Currently I am trying to apply the ImageNet model to a simple image segmentation task based on superpixel classification. However, when I fine-tune the ImageNet model on the superpixel dataset, it seems to learn almost nothing: even after 50,000 iterations the loss on the training set is still around 50%. In this case the database had 154,982 foreground and 217,205 background superpixel images (a superpixel image is essentially the superpixel rendered on a black background).

Does anybody have an idea of what could be going wrong?

[Attached plots: oscillating training loss and test accuracy]

Solver parameters

test_iter: 1000
test_interval: 500
base_lr: 0.001
lr_policy: "step"
gamma: 0.1
stepsize: 20000
display: 20
max_iter: 100000
momentum: 0.9
weight_decay: 0.0005
snapshot: 10000
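
For context, Caffe's "step" policy decays the learning rate as lr = base_lr * gamma^floor(iter / stepsize), so these settings drop the rate by 10x every 20,000 iterations. A minimal sketch of the resulting schedule (plain Python, values approximate up to float rounding):

base_lr, gamma, stepsize = 0.001, 0.1, 20000

def lr_at(iteration):
    # Caffe "step" policy: multiply base_lr by gamma every stepsize iterations
    return base_lr * gamma ** (iteration // stepsize)

for it in (0, 19999, 20000, 40000, 60000):
    print(it, lr_at(it))
# iterations 0 and 19999 -> 0.001; 20000 -> 1e-4; 40000 -> 1e-5; 60000 -> 1e-6 (approx.)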

The only thing that I changed in the network definition was the last layer:

layers {
  name: "fc8_aero"
  type: INNER_PRODUCT
  bottom: "fc7"
  top: "fc8_aero"
  blobs_lr: 1
  blobs_lr: 2
  weight_decay: 1
  weight_decay: 0
  inner_product_param {
    num_output: 2
    weight_filler {
      type: "gaussian"
      std: 0.01
    }
    bias_filler {
      type: "constant"
      value: 0
    }
  }
}

Best regards, Chris

wendlerc commented 10 years ago

One more question: how can the test accuracy be around 80% when the test loss is still around 40%?

e.g.

I0720 12:19:55.369655  9160 solver.cpp:142] Test score #0: 0.77204
I0720 12:19:55.369725  9160 solver.cpp:142] Test score #1: 0.4762
shelhamer commented 10 years ago

The loss isn't 1 - accuracy; it's the average softmax loss over the data instances (assuming you are training with a softmax loss). You can have good overall accuracy while still being confidently wrong on the misclassified instances, and those confident mistakes drive the average loss up.
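
To make that concrete, here is a toy example with made-up probabilities (assuming the standard softmax/cross-entropy loss): four confident correct predictions and one confidently wrong one give 80% accuracy but an average loss near 1.0.

import numpy as np

# Hypothetical softmax outputs for 5 instances, 2 classes; the true class is 0 for all.
probs = np.array([
    [0.90, 0.10],  # correct, confident
    [0.90, 0.10],  # correct, confident
    [0.90, 0.10],  # correct, confident
    [0.90, 0.10],  # correct, confident
    [0.01, 0.99],  # wrong, very confident
])
labels = np.zeros(5, dtype=int)

accuracy = np.mean(probs.argmax(axis=1) == labels)    # 0.8
loss = -np.mean(np.log(probs[np.arange(5), labels]))  # ~1.005
print(accuracy, loss)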

It's hard to know why your model isn't learning without more details. Please continue the discussion on the caffe-users mailing list. As of the latest release we prefer to keep issues reserved for Caffe development. Thanks!