google-research / big_transfer

Official repository for the "Big Transfer (BiT): General Visual Representation Learning" paper.
https://arxiv.org/abs/1912.11370
Apache License 2.0

Input size for r50x1? #23

Closed PenroseTiles closed 4 years ago

PenroseTiles commented 4 years ago

What is the input shape for the Resnet 50x1 model? Is it 224x224x3? Thanks a lot for answering!

anish9 commented 4 years ago

Based on the BiT-HyperRule heuristics, if the image is smaller than 96px, it is resized to 160px and random crops of 128px are applied. If it is larger than 96px, it is resized to 448px and random crops of 384px are applied.

PenroseTiles commented 4 years ago

Yes, that is true. But what is the shape of the input layer?

anish9 commented 4 years ago

@PenroseTiles the input size varies with the task and the dataset.

wyp19930313 commented 4 years ago

def get_resolution(original_resolution):
    """Takes (H,W) and returns (precrop, crop)."""
    area = original_resolution[0] * original_resolution[1]
    return (160, 128) if area < 96*96 else (512, 480)

known_dataset_sizes = {
    'cifar10': (32, 32),
    'cifar100': (32, 32),
    'oxford_iiit_pet': (224, 224),
    'oxford_flowers102': (224, 224),
    'imagenet2012': (224, 224),
}
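To make the rule concrete, here is a minimal, self-contained sketch of how a dataset name could map to a (precrop, crop) pair. It assumes that `get_resolution_from_dataset` simply looks the dataset up in `known_dataset_sizes` and passes the size on to `get_resolution`; the error raised for unknown datasets is an assumption for illustration.

```python
def get_resolution(original_resolution):
    """Takes (H, W) and returns (precrop, crop)."""
    area = original_resolution[0] * original_resolution[1]
    return (160, 128) if area < 96 * 96 else (512, 480)


known_dataset_sizes = {
    'cifar10': (32, 32),
    'cifar100': (32, 32),
    'oxford_iiit_pet': (224, 224),
    'oxford_flowers102': (224, 224),
    'imagenet2012': (224, 224),
}


def get_resolution_from_dataset(dataset):
    # Assumed behaviour: look up the nominal image size, then apply the rule.
    if dataset not in known_dataset_sizes:
        raise ValueError(f"Unsupported dataset {dataset!r}")
    return get_resolution(known_dataset_sizes[dataset])


for name in ('cifar10', 'imagenet2012'):
    precrop, crop = get_resolution_from_dataset(name)
    print(f"{name}: resize to {precrop}x{precrop}, random crop {crop}x{crop}")
# cifar10: resize to 160x160, random crop 128x128
# imagenet2012: resize to 512x512, random crop 480x480
```

So with this rule a CIFAR model sees 128x128 crops during training, while an ImageNet-sized dataset yields 480x480 crops, not 224x224.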

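# Assumes `import torchvision as tv` and `import bit_hyperrule`; `args` holds the parsed command-line arguments.
precrop, crop = bit_hyperrule.get_resolution_from_dataset(args.dataset)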
train_tx = tv.transforms.Compose([
    tv.transforms.Resize((precrop, precrop)),
    tv.transforms.RandomCrop((crop, crop)),
    tv.transforms.RandomHorizontalFlip(),
    tv.transforms.ToTensor(),
    tv.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])
val_tx = tv.transforms.Compose([
    tv.transforms.Resize((crop, crop)),
    tv.transforms.ToTensor(),
    tv.transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5)),
])

lucasb-eyer commented 4 years ago

Like @anish9 and @wyp19930313 say, we apply BiT-HyperRule and so the model's input size depends on the original image's resolution.

"The shape of the input layer" is not a valid concept here: the network is convolutional, so its input size is not fixed.
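One way to see why there is no single input shape is to follow the spatial size through the network. The sketch below is shape arithmetic only, not the repository's code. It assumes the usual ResNet layout, in which the stem convolution, the max-pool, and three strided stages each halve the spatial size (rounding up), followed by global average pooling. `feature_map_side`, `dummy_feature_map`, and `global_average_pool` are hypothetical helpers, and the channel count of 4 is a toy value.

```python
import math

NUM_STRIDE2_LAYERS = 5  # assumption: stem conv, max-pool, three strided stages


def feature_map_side(input_side):
    """Side length of the final feature map for a square input."""
    side = input_side
    for _ in range(NUM_STRIDE2_LAYERS):
        side = math.ceil(side / 2)
    return side


def dummy_feature_map(channels, side):
    """A channels x side x side feature map filled with ones."""
    return [[[1.0] * side for _ in range(side)] for _ in range(channels)]


def global_average_pool(feature_map):
    """Averages each channel over all spatial positions: one value per channel."""
    return [
        sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))
        for channel in feature_map
    ]


for crop in (128, 224, 384, 480):
    side = feature_map_side(crop)
    pooled = global_average_pool(dummy_feature_map(channels=4, side=side))
    print(f"input {crop}x{crop} -> feature map {side}x{side} -> {len(pooled)} pooled values")
# input 128x128 -> feature map 4x4 -> 4 pooled values
# input 224x224 -> feature map 7x7 -> 4 pooled values
# input 384x384 -> feature map 12x12 -> 4 pooled values
# input 480x480 -> feature map 15x15 -> 4 pooled values
```

The pooled vector has one entry per channel regardless of the crop size, which is why the same weights can be fed 128px, 224px, or 480px images.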