What do the constant numbers in dataset.py mean?

chenyuntc / simple-faster-rcnn-pytorch

A simplified implementation of Faster R-CNN that replicates the performance of the original paper

What do the constant numbers in dataset.py mean? #177

Closed 300LiterPropofol closed 4 years ago

300LiterPropofol commented 4 years ago

Hi, your work is great and very helpful!

I am confused about the constant numbers in dataset.py. For example, in `inverse_normalize`, what do the three constants in `np.array([122.7717, 115.9465, 102.9801])` mean? Are they computed from somewhere?

I have the same question about the constants in `pytorch_normalze` and `caffe_normalize`.

link: https://github.com/chenyuntc/simple-faster-rcnn-pytorch/blob/master/data/dataset.py#L14

Thank you very much for helping!!

CharlesPikachu commented 4 years ago

These constants are used to normalize the input image, because the pretrained models were trained with the same normalization. It is defined as follows: image = (image - mean) / std
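To make the formula concrete, here is a minimal NumPy sketch of per-channel normalization and its inverse. The mean and std values are the ones commonly quoted for torchvision's ImageNet-pretrained models (RGB order, pixel values scaled to [0, 1]). `normalize` and `denormalize` are illustrative names assumed for this example, not functions from dataset.py.

```python
import numpy as np

# Commonly quoted ImageNet statistics for torchvision-style pretrained models.
# Shaped (3, 1, 1) so they broadcast over a (3, H, W) image.
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32).reshape(3, 1, 1)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32).reshape(3, 1, 1)


def normalize(img):
    """Apply image = (image - mean) / std to a (3, H, W) RGB array in [0, 1]."""
    return (img - IMAGENET_MEAN) / IMAGENET_STD


def denormalize(img):
    """Undo normalize(), e.g. before visualizing an image."""
    return img * IMAGENET_STD + IMAGENET_MEAN


# Hypothetical stand-in for a real image: random values in [0, 1).
rng = np.random.default_rng(0)
image = rng.random((3, 4, 5), dtype=np.float32)

normalized = normalize(image)
restored = denormalize(normalized)
```

Because the constants describe the data the pretrained weights saw, the same two arrays are applied to every image, whichever dataset it comes from.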

300LiterPropofol commented 4 years ago

Thanks for your reply. I wonder why the normalization uses these specific numbers. If I want to use my custom dataset, should I reuse these numbers directly, or are they calculated from somewhere?

CharlesPikachu commented 4 years ago

As I mentioned before, if you want to load pretrained classification weights (e.g. an ImageNet-pretrained resnet101) to initialize the network and then fine-tune the detection model on your custom dataset, you had better keep these numbers for better training results. If you don't load the ImageNet-pretrained model, you don't have to keep them. In general, initializing a detection model from pretrained classification weights is the better choice.
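For the case where no pretrained weights are loaded, one option is to compute the statistics from your own images. The following is a minimal sketch under stated assumptions: images are float arrays of shape (3, H, W) with values in [0, 1], and `channel_stats` is a hypothetical helper, not part of this repository.

```python
import numpy as np


def channel_stats(images):
    """Return per-channel (mean, std) over all pixels of all images.

    images: iterable of float arrays shaped (3, H, W); H and W may differ.
    """
    total = np.zeros(3, dtype=np.float64)
    total_sq = np.zeros(3, dtype=np.float64)
    count = 0
    for img in images:
        pixels = np.asarray(img, dtype=np.float64).reshape(3, -1)
        total += pixels.sum(axis=1)
        total_sq += (pixels ** 2).sum(axis=1)
        count += pixels.shape[1]
    mean = total / count
    # Var[x] = E[x^2] - E[x]^2; clamp at 0 to guard against rounding error.
    var = np.maximum(total_sq / count - mean ** 2, 0.0)
    return mean, np.sqrt(var)


# Hypothetical stand-in data: two random "images" of different sizes.
rng = np.random.default_rng(0)
dataset = [rng.random((3, 4, 5)), rng.random((3, 2, 3))]
mean, std = channel_stats(dataset)
```

The running sums avoid holding the whole dataset in memory. When fine-tuning from ImageNet-pretrained weights, these dataset-specific values would not replace the original constants.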

300LiterPropofol commented 4 years ago

Thanks for the quick reply, I think I got it.

For anyone who comes across this later and is unsure about the same question: these normalization numbers are computed from ImageNet's roughly one million images. They have nothing to do with the dataset you are currently training on, so you shouldn't change them. They also don't depend on which extractor backbone you use: whether you keep the vgg16 in this repository or switch to, for example, resnet101, there is no need to change them. To put it even more plainly, these numbers are preprocessing that must match the ImageNet-pretrained model. That preprocessing is fixed, so just leave the numbers as they are. You can check this link for more reference: https://discuss.pytorch.org/t/how-to-preprocess-input-for-pre-trained-networks/683