fyu / drn

Dilated Residual Networks
https://www.vis.xyz/pub/drn
BSD 3-Clause "New" or "Revised" License
1.1k stars · 219 forks

The mean and std used in preprocessing are not the same as those used to pretrain the ImageNet model. #24

Closed · acgtyrant closed this issue 6 years ago

acgtyrant commented 6 years ago

As far as I know, when people fine-tune a pretrained model for semantic segmentation, they still normalize with the mean and std from ImageNet, i.e. the statistics used to pretrain that model.

So if we fine-tune models from torchvision, the official advice is:

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. You can use the following transform to normalize:

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
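In plain terms, that transform subtracts the per-channel mean and divides by the per-channel std. A minimal numpy sketch of the same operation (the sample image is made up for illustration):

```python
import numpy as np

# ImageNet statistics quoted above (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def normalize(img):
    """Normalize an HxWx3 float image already scaled to [0, 1]."""
    return (img - IMAGENET_MEAN) / IMAGENET_STD

# A mid-gray pixel maps to small, roughly zero-centered values.
img = np.full((2, 2, 3), 0.5)
out = normalize(img)
print(out[0, 0])  # roughly [0.066, 0.196, 0.418]
```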

However, this repository uses:

{
    "mean": [
        0.290101,
        0.328081,
        0.286964
    ],
    "std": [
        0.182954,
        0.186566,
        0.184475
    ]
}

to normalize. Any idea why?

fyu commented 6 years ago

I think it makes more sense to use the mean and std of the new domain during fine-tuning, so the stats provided here are collected from the Cityscapes data.
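For anyone who wants to compute such statistics for their own target domain, a rough sketch follows (the toy images are synthetic stand-ins; assumes images are already scaled to [0, 1]):

```python
import numpy as np

def dataset_mean_std(images):
    """Per-channel mean and std over a list of HxWx3 arrays in [0, 1]."""
    # Flatten all images into one (num_pixels, 3) array, then reduce per channel.
    pixels = np.concatenate([img.reshape(-1, 3) for img in images], axis=0)
    return pixels.mean(axis=0), pixels.std(axis=0)

# Toy example: two uniform "images", half the pixels at 0.2 and half at 0.6.
imgs = [np.full((4, 4, 3), 0.2), np.full((4, 4, 3), 0.6)]
mean, std = dataset_mean_std(imgs)
print(mean)  # [0.4 0.4 0.4]
print(std)   # [0.2 0.2 0.2]
```

On a real dataset you would stream images in batches (or accumulate running sums of x and x²) rather than concatenating everything in memory.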