fyu / drn

Dilated Residual Networks
https://www.vis.xyz/pub/drn
BSD 3-Clause "New" or "Revised" License
1.1k stars · 219 forks

The mean and std used in preprocessing are not the same as those used to pretrain the ImageNet model. #24

Closed · acgtyrant closed this issue 6 years ago

acgtyrant commented 6 years ago

As far as I know, when people fine-tune a pretrained model for semantic segmentation, they still normalize with the mean and std from ImageNet, i.e. the statistics used to pretrain that model.

So if we fine-tune models from torchvision, the official advice is:

All pre-trained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (3 x H x W), where H and W are expected to be at least 224. The images have to be loaded in to a range of [0, 1] and then normalized using mean = [0.485, 0.456, 0.406] and std = [0.229, 0.224, 0.225]. You can use the following transform to normalize:

normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225])
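In plain terms, that transform subtracts the per-channel mean and divides by the per-channel std. A minimal numpy sketch of the same operation (the sample image is made up for illustration):

```python
import numpy as np

# ImageNet statistics quoted above (RGB order).
IMAGENET_MEAN = np.array([0.485, 0.456, 0.406])
IMAGENET_STD = np.array([0.229, 0.224, 0.225])

def normalize(img):
    """Normalize an HxWx3 float image already scaled to [0, 1]."""
    return (img - IMAGENET_MEAN) / IMAGENET_STD

# A mid-gray pixel maps to small, roughly zero-centered values.
img = np.full((2, 2, 3), 0.5)
out = normalize(img)
print(out[0, 0])  # roughly [0.066, 0.196, 0.418]
```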

However, this repository uses:

{
    "mean": [
        0.290101,
        0.328081,
        0.286964
    ],
    "std": [
        0.182954,
        0.186566,
        0.184475
    ]
}

to normalize. Any idea why?

fyu commented 6 years ago

I think it makes more sense to use the mean and std of the new domain during fine-tuning, so the stats provided here are collected from the Cityscapes data.
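For anyone who wants to compute such statistics for their own target domain, a rough sketch follows (the toy images are synthetic stand-ins; assumes images are already scaled to [0, 1]):

```python
import numpy as np

def dataset_mean_std(images):
    """Per-channel mean and std over a list of HxWx3 arrays in [0, 1]."""
    # Flatten all images into one (num_pixels, 3) array, then reduce per channel.
    pixels = np.concatenate([img.reshape(-1, 3) for img in images], axis=0)
    return pixels.mean(axis=0), pixels.std(axis=0)

# Toy example: two uniform "images", half the pixels at 0.2 and half at 0.6.
imgs = [np.full((4, 4, 3), 0.2), np.full((4, 4, 3), 0.6)]
mean, std = dataset_mean_std(imgs)
print(mean)  # [0.4 0.4 0.4]
print(std)   # [0.2 0.2 0.2]
```

On a real dataset you would stream images in batches (or accumulate running sums of x and x²) rather than concatenating everything in memory.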