fizyr / keras-retinanet

Keras implementation of RetinaNet object detection.
Apache License 2.0
4.38k stars · 1.96k forks

Reason for freeze_bn=True in ResNet backbone #974

Closed · huwenjie333 closed this issue 5 years ago

huwenjie333 commented 5 years ago

Hello, I'm wondering what the reason is for setting freeze_bn=True when we create the ResNet backbone, as shown in this line: https://github.com/fizyr/keras-retinanet/blob/master/keras_retinanet/models/resnet.py#L99
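For reference, the backbone construction at that line looks roughly like this (a sketch based on the keras-resnet API, not copied verbatim from the repository):

```python
import keras
import keras_resnet.models

# Sketch of how keras-retinanet builds the ResNet50 backbone:
# include_top=False drops the ImageNet classification head, and
# freeze_bn=True keeps the batch-normalization layers fixed during training.
inputs = keras.layers.Input(shape=(None, None, 3))
resnet = keras_resnet.models.ResNet50(inputs, include_top=False, freeze_bn=True)
```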

Also, I see this repo uses the keras-resnet library instead of the ResNet models from the official Keras Applications library here. Is it because of this additional freeze_bn=True argument, which is not included in Keras Applications?

ducheng678 commented 5 years ago

+1. I am confused too. Is this setting related to using a pre-trained model?

hgaiser commented 5 years ago

The implementation of resnet in Keras has two flaws: it doesn't follow the original implementation, and there are no resnet101 and resnet152 implementations. I don't remember whether it was mentioned in a paper or whether it is a difference between Caffe (used for the original implementation) and Keras, but freeze_bn is the default in that case.

It also makes sense: allowing the batchnorm layers to update is only valuable if the batch size of the data being passed through is large. Otherwise they adjust to the specific images in each iteration, which gives inaccurate updates. For object detection algorithms, the batch size is usually quite low. Note that "group normalization" is designed to work with smaller batches, which is something that would be nice to have at some point in the future.
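To make the effect concrete, here is a minimal sketch (not the actual keras-resnet code) of how a frozen batch-normalization layer can be implemented: it always normalizes with the stored moving statistics and keeps its scale/offset parameters fixed.

```python
import keras


class FrozenBatchNormalization(keras.layers.BatchNormalization):
    """Illustrative sketch of a frozen batch-normalization layer.

    With small detection batches the per-batch statistics are noisy, so the
    layer always uses the moving mean/variance learned during pre-training
    and never updates its parameters.
    """

    def __init__(self, freeze=True, **kwargs):
        super(FrozenBatchNormalization, self).__init__(**kwargs)
        self.freeze = freeze
        if self.freeze:
            # Stop gradient updates to the learnable scale/offset (gamma/beta).
            self.trainable = False

    def call(self, inputs, training=None):
        if self.freeze:
            # Force inference behaviour: normalize with the stored moving
            # statistics instead of the current (small) batch statistics.
            return super(FrozenBatchNormalization, self).call(inputs, training=False)
        return super(FrozenBatchNormalization, self).call(inputs, training=training)
```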

huwenjie333 commented 5 years ago

> The implementation of resnet in Keras has two flaws: it doesn't follow the original implementation, and there are no resnet101 and resnet152 implementations. I don't remember whether it was mentioned in a paper or whether it is a difference between Caffe (used for the original implementation) and Keras, but freeze_bn is the default in that case.
>
> It also makes sense: allowing the batchnorm layers to update is only valuable if the batch size of the data being passed through is large. Otherwise they adjust to the specific images in each iteration, which gives inaccurate updates. For object detection algorithms, the batch size is usually quite low. Note that "group normalization" is designed to work with smaller batches, which is something that would be nice to have at some point in the future.

Thanks so much for the reply. When you say "the implementation of resnet", do you mean the one from Keras Applications here? It does provide resnet101 and resnet152, but you are right that there is no option to set freeze_bn.
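(As an aside, if one wanted a similar effect with the Keras Applications model, one partial workaround is to mark its BatchNormalization layers as non-trainable after building it; this is just a suggestion, not what keras-retinanet actually does.)

```python
import keras
from keras.applications import ResNet50

# Build a backbone from Keras Applications (it exposes no freeze_bn argument).
backbone = ResNet50(include_top=False, weights='imagenet')

# Workaround: freeze every BatchNormalization layer so its gamma/beta
# parameters stop receiving gradient updates.
for layer in backbone.layers:
    if isinstance(layer, keras.layers.BatchNormalization):
        layer.trainable = False

# Caveat: in older Keras versions trainable=False does not force these layers
# to use their moving statistics during training, so this is only a partial
# substitute for freeze_bn.
```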

hgaiser commented 5 years ago

Ahh okay, I didn't know Keras had implemented the other versions by now, thanks for sharing! When this was implemented, they only had resnet50.


sudohainguyen commented 5 years ago

> The implementation of resnet in Keras has two flaws: it doesn't follow the original implementation, and there are no resnet101 and resnet152 implementations. I don't remember whether it was mentioned in a paper or whether it is a difference between Caffe (used for the original implementation) and Keras, but freeze_bn is the default in that case. It also makes sense: allowing the batchnorm layers to update is only valuable if the batch size of the data being passed through is large. Otherwise they adjust to the specific images in each iteration, which gives inaccurate updates. For object detection algorithms, the batch size is usually quite low. Note that "group normalization" is designed to work with smaller batches, which is something that would be nice to have at some point in the future.

> Thanks so much for the reply. When you say "the implementation of resnet", do you mean the one from Keras Applications here? It does provide resnet101 and resnet152, but you are right that there is no option to set freeze_bn.

But the currently released Keras version (2.2.4) doesn't include them; those models will be available in the next version, 2.2.5, as mentioned here.

guker commented 5 years ago

> The implementation of resnet in Keras has two flaws: it doesn't follow the original implementation, and there are no resnet101 and resnet152 implementations. I don't remember whether it was mentioned in a paper or whether it is a difference between Caffe (used for the original implementation) and Keras, but freeze_bn is the default in that case.
>
> It also makes sense: allowing the batchnorm layers to update is only valuable if the batch size of the data being passed through is large. Otherwise they adjust to the specific images in each iteration, which gives inaccurate updates. For object detection algorithms, the batch size is usually quite low. Note that "group normalization" is designed to work with smaller batches, which is something that would be nice to have at some point in the future.

I got it, that's great, thanks!

wty-yy commented 9 months ago

This paper (Section 5.2) shows the training results of freezing the batch-normalization variables in the backbone.