Difference between weights and snapshots in your implementation (also w/o freeze model argument)

0xsimulacra commented 4 years ago

Hello,

Can someone please explain the different behaviour of weights and snapshots in your implementation?

Basically, in the implementation, the only place where weights are saved is when a snapshot is written to the snapshot directory.

That being said, the size of the h5 file is not the same when I run the model with and without freezing the backbone. I have a model pretrained on the modenet dataset that weighs 446 MB and was trained using your implementation with ResNet50 as the backbone. When I run your implementation using these weights without freezing the backbone, the weights saved in the snapshot folder are the same size as the original. When I do freeze the backbone, the weights that get saved are almost half the size (in MB) of the original, and close in size to the COCO-trained weights that you are sharing.

I was wondering whether that is the expected behaviour. Can anybody also explain how to continue training from my weights after I froze the model and got only the shrunken model weights as a snapshot? When that happens and I stop training, then run the train file again with the --snapshot argument, I can clearly see that the model may be missing some weights from the last run, because the performance degrades.

Any help will be greatly appreciated.

hgaiser commented 4 years ago

Perhaps the names are poorly chosen, but a weights file is the model itself and only the model. A snapshot also includes the optimizer state. Apparently this is quite big (never looked into this). I trust this answers your question?
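To make the size effect of the optimizer state concrete, here is a rough back-of-the-envelope sketch. It assumes float32 values and an Adam-style optimizer that keeps two extra values per trainable parameter; neither assumption is confirmed in this thread. The parameter counts are hypothetical and are not meant to reproduce the file sizes reported above.

```python
# Rough size estimate for a saved model with and without optimizer state.
# Assumptions (not from the thread): float32 values, and an Adam-style
# optimizer that keeps two extra values per *trainable* parameter.

BYTES_PER_VALUE = 4        # float32
SLOTS_PER_TRAINABLE = 2    # Adam-style: first and second moment estimates


def estimate_mb(total_params, trainable_params, include_optimizer):
    """Return an approximate file size in MB (1 MB = 10**6 bytes)."""
    values = total_params
    if include_optimizer:
        # Optimizer slots exist only for parameters that are being trained.
        values += SLOTS_PER_TRAINABLE * trainable_params
    return values * BYTES_PER_VALUE / 1e6


# Hypothetical parameter counts, chosen only to illustrate the effect.
backbone_params = 25_000_000
head_params = 20_000_000
total_params = backbone_params + head_params

# Weights file: the model only.
weights_only = estimate_mb(total_params, total_params, include_optimizer=False)
# Snapshot with everything trainable: optimizer slots for all parameters.
snapshot_unfrozen = estimate_mb(total_params, total_params, include_optimizer=True)
# Snapshot with a frozen backbone: optimizer slots for the heads only.
snapshot_frozen = estimate_mb(total_params, head_params, include_optimizer=True)

print(f"weights only:               {weights_only:.0f} MB")
print(f"snapshot, backbone trained: {snapshot_unfrozen:.0f} MB")
print(f"snapshot, backbone frozen:  {snapshot_frozen:.0f} MB")
```

Under these assumptions, a snapshot taken with a frozen backbone is smaller because the optimizer tracks fewer parameters, not because any model weights are left out of the file.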

0xsimulacra commented 4 years ago

Thank you for your answer. It answers my question partially.

Actually, when training without freezing the backbone, the snapshot file you get is very big (~475 MB for ResNet50). But when you freeze the backbone, the snapshots you get are significantly smaller (~275 MB for ResNet50). I suppose that the program doesn't save the weights of the backbone if freeze backbone is active, as it assumes they will be the regular keras.applications.resnet weights.

But what happens when you take a weights file whose backbone was trained (not frozen) and you want to continue training from it, this time with that same trained backbone frozen? It will take into account that this is a frozen ResNet50 backbone and won't save its weights. I tested that: I had a weights file with a trained backbone, and I continued training from that file with the backbone frozen. The training loss started adapting and the model did well on my training set after some iterations, and finally it saved the snapshot (which is significantly smaller than the weights it was loaded from, although it is the same model with the same number of classes). I then stopped training after a few epochs. When I restarted training from that saved snapshot, with the backbone still frozen, the training loss and the model performance were very bad. I assume it took the weights for the maskrcnn prediction from the snapshot and didn't load my saved "finetuned" backbone weights, as they were not saved; it just assumed a regular ResNet50 backbone. Right now I'm dealing with this by freezing the backbone, waiting until convergence, and then unfreezing the backbone for further training, but I can't go back to freezing the backbone after that. It is a one-way direction.

I hope my point is clear.
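One way to check the hypothesis above would be to compare the per-layer weights of the model before saving the snapshot and after reloading it. The helper below is a hypothetical sketch, not part of keras-maskrcnn: it works on plain dictionaries that map a layer name to a flat list of floats, and the layer names and values are toy data. With a Keras model, such a mapping could be built from `model.layers` and `layer.get_weights()`.

```python
def changed_layers(before, after, tol=1e-6):
    """Return the names of layers whose values differ between two checkpoints.

    `before` and `after` map a layer name to a flat list of floats.
    A layer that is missing from `after` is also reported as changed.
    """
    changed = []
    for name in sorted(before):
        if name not in after:
            changed.append(name)
            continue
        a, b = before[name], after[name]
        if len(a) != len(b) or any(abs(x - y) > tol for x, y in zip(a, b)):
            changed.append(name)
    return changed


# Toy data standing in for real checkpoints (hypothetical layer names).
finetuned = {"backbone_conv1": [0.5, -0.2], "mask_head": [0.1, 0.3]}
reloaded = {"backbone_conv1": [0.5, -0.2], "mask_head": [0.4, 0.1]}

print(changed_layers(finetuned, reloaded))
```

If the backbone layers show up in the result after a save and reload with the backbone frozen, the fine-tuned backbone weights were not restored. If only the head layers differ, the performance drop has another cause.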

fizyr / keras-maskrcnn

Difference between weights and snapshots in your implementation (also w/o freeze model argument) #115