As far as I know, when someone fine-tunes a pretrained model for semantic segmentation, they still use the mean and std from ImageNet, which was the dataset used to pretrain the model.
So if we fine-tune with the models from torchvision, the official advice is:
All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded into a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. You can use the following transform to normalize:
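To make the arithmetic concrete, here is a minimal NumPy sketch of what that normalization does per channel (equivalent to `torchvision.transforms.Normalize` with the ImageNet statistics quoted above; the function name `normalize` is just for illustration):

```python
import numpy as np

# ImageNet statistics recommended by torchvision for pretrained models,
# reshaped to broadcast over a (3, H, W) image
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406]).reshape(3, 1, 1)
IMAGENET_STD = np.array([0.229, 0.224, 0.225]).reshape(3, 1, 1)

def normalize(img):
    """img: float array of shape (3, H, W) with values already in [0, 1].

    Returns the per-channel standardized image:
    (pixel - channel_mean) / channel_std.
    """
    return (img - IMAGENET_MEAN) / IMAGENET_STD

# An all-black image (all zeros) maps each channel to -mean/std
out = normalize(np.zeros((3, 4, 4)))
```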
However, you use a different normalization:
Any idea why?