andrewowens / multisensory

Code for the paper: Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
http://andrewowens.com/multisensory/
Apache License 2.0
220 stars 60 forks

Why acc doesn't change when shift_model training? #31

Open ruizewang opened 4 years ago

ruizewang commented 4 years ago

Hello, When I train shift_lowfps model, the loss decreases slowly, but acc doesn't change (0.500). Could you give me some advice?

[grad norm:][0.0125109516]
Iteration 5500, lr = 1e-03, total:loss: 1.246 reg: 0.041 loss:label: 0.705 acc:label: 0.500, time: 2.978
Iteration 5510, lr = 1e-03, total:loss: 1.244 reg: 0.040 loss:label: 0.704 acc:label: 0.500, time: 2.974
Iteration 5520, lr = 1e-03, total:loss: 1.241 reg: 0.039 loss:label: 0.703 acc:label: 0.500, time: 2.953
Iteration 5530, lr = 1e-03, total:loss: 1.239 reg: 0.037 loss:label: 0.702 acc:label: 0.500, time: 2.960
Iteration 5540, lr = 1e-03, total:loss: 1.238 reg: 0.036 loss:label: 0.701 acc:label: 0.500, time: 2.971
Iteration 5550, lr = 1e-03, total:loss: 1.236 reg: 0.035 loss:label: 0.700 acc:label: 0.500, time: 2.965
Iteration 5560, lr = 1e-03, total:loss: 1.234 reg: 0.034 loss:label: 0.700 acc:label: 0.500, time: 2.961
Iteration 5570, lr = 1e-03, total:loss: 1.232 reg: 0.033 loss:label: 0.699 acc:label: 0.500, time: 2.957
Iteration 5580, lr = 1e-03, total:loss: 1.231 reg: 0.032 loss:label: 0.699 acc:label: 0.500, time: 2.952
Iteration 5590, lr = 1e-03, total:loss: 1.229 reg: 0.031 loss:label: 0.698 acc:label: 0.500, time: 2.967
[grad norm:][0.00501754601]
Iteration 5600, lr = 1e-03, total:loss: 1.228 reg: 0.030 loss:label: 0.698 acc:label: 0.500, time: 2.968
Iteration 5610, lr = 1e-03, total:loss: 1.227 reg: 0.030 loss:label: 0.697 acc:label: 0.500, time: 2.960
Iteration 5620, lr = 1e-03, total:loss: 1.225 reg: 0.029 loss:label: 0.697 acc:label: 0.500, time: 2.951
Iteration 5630, lr = 1e-03, total:loss: 1.224 reg: 0.028 loss:label: 0.696 acc:label: 0.500, time: 2.977
Iteration 5640, lr = 1e-03, total:loss: 1.223 reg: 0.027 loss:label: 0.696 acc:label: 0.500, time: 2.973
Iteration 5650, lr = 1e-03, total:loss: 1.222 reg: 0.026 loss:label: 0.696 acc:label: 0.500, time: 2.981

andrewowens commented 4 years ago

Yes, this is a common failure mode! The model also takes a long time to get better-than-chance performance, which can make it look like it's stuck.

  • What batch size are you using? Are you training on AudioSet? Note that I trained that model with 3 GPUs, so the effective batch size was 45.
  • The loss values you should probably be looking at are "loss:label", the cross-entropy loss, and "acc", the overall accuracy. Here, chance performance would be acc = 0.5 and loss:label = -ln(0.5) ≈ 0.693. So it looks like the model has not yet reached chance performance.
  • In my experiments, the model took something like 2K iterations to reach chance performance (loss:label = 0.693), and 11K iterations to do better than chance (loss:label = 0.692). So for a long time it looked like the model was stuck at chance.
  • Did you decrease the learning rate? I trained with lr = 1e-2 at the beginning. This might explain why your model is still doing worse than chance at 5K iterations.

ruizewang commented 4 years ago

Thank you very much for your explanation. Now it suddenly makes sense to me.
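The chance-level numbers in Andrew's explanation are easy to verify: for a balanced two-class problem, a classifier that always outputs probability 0.5 has cross-entropy -ln(0.5) = ln(2) ≈ 0.693 and accuracy 0.5. A minimal arithmetic check, independent of the repo's code:

```python
import math

# Cross-entropy of a binary classifier that predicts p = 0.5 for every
# example: -ln(0.5) = ln(2), the "chance" value of loss:label.
chance_loss = -math.log(0.5)

print(round(chance_loss, 3))  # 0.693
```

So a loss:label above 0.693 (as in the log output above) means the model is still doing worse than chance.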

ruizewang commented 4 years ago

Thanks a lot, Andrew. It is really helpful. šŸ˜ƒ

ruizewang commented 4 years ago

Sorry to bother you, I am here again. Once a shift model (e.g., 'net.tf-30000') has been trained, how do I use it for testing? Is it enough to set "is_training" to False and run shift_net.train? I suspect there is something else I should do.

class Model:
    def __init__(self, pr, sess, gpus, is_training=False, pr_test=None):
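For context on why an "is_training" flag matters at test time: such flags typically switch off stochastic layers like dropout and freeze batch-norm statistics. A generic illustration of the flag's effect (a toy sketch, not the repo's actual Model class):

```python
import numpy as np

class Dropout:
    """Toy dropout layer: stochastic in training, identity at test time."""

    def __init__(self, rate=0.5, seed=0):
        self.rate = rate
        self.rng = np.random.default_rng(seed)

    def __call__(self, x, is_training):
        if not is_training:
            # Test mode: dropout does nothing, so outputs are deterministic.
            return x
        # Training mode: randomly zero units and rescale the survivors.
        mask = self.rng.random(x.shape) >= self.rate
        return x * mask / (1.0 - self.rate)

layer = Dropout()
x = np.ones(8)
print(layer(x, is_training=False))  # input passed through unchanged
```

Running the same layer with is_training=True would instead zero out roughly half of the units, which is why evaluating with the training-mode graph gives noisy, misleading accuracy numbers.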
ruizewang commented 4 years ago

Hello Andrew.

andrewowens commented 4 years ago
  • Please refer to shift_example.py for an example of testing a trained network.
  • I think the loss is going up when you fine-tune because you are using a higher learning rate and (especially) a smaller batch size. The model starts out better than chance, but the parameters become worse because it's taking large steps (high learning rate) in not-so-great gradient directions (low batch size).
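The batch-size point can be illustrated with a toy simulation (an idealized sketch, not the actual training dynamics): the spread of a batch-averaged gradient shrinks roughly like 1/sqrt(batch size), so a small batch gives much noisier update directions than the effective batch size of 45 used for pretraining:

```python
import numpy as np

rng = np.random.default_rng(0)

# Pretend each example yields a noisy estimate of the true gradient 1.0.
per_example_grads = 1.0 + rng.standard_normal(90_000)

for batch_size in (5, 45):
    # Average per-example gradients into batches and measure the spread
    # of the batch means: larger batches give steadier step directions.
    means = per_example_grads.reshape(-1, batch_size).mean(axis=1)
    print(batch_size, round(float(means.std()), 3))
```

With a high learning rate on top of that, each noisy step moves the parameters a long way, which is consistent with the loss initially going up during fine-tuning.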
ruizewang commented 4 years ago

There is an example of generating a CAM in shift_example.py. But you reported accuracy in the paper: "Task performance. We found that the model obtained 59.9% accuracy on held-out videos for its alignment task (chance = 50%)." Actually, I want to test the model and get its accuracy on the test dataset. Do I need to rewrite this part of the code?

ruizewang commented 4 years ago

This problem is solved. Following your suggestion, I added a "test_accuracy" function to "class NetClf". Thanks again, Andrew. šŸ˜€

vuthede commented 3 years ago

Hi @ruizewang, would you mind sharing the code you used to create the data files for training? I would really appreciate it.