I didn't see an improvement in transfer learning tasks when using pre-training with larger resolution.
Anyway, in https://github.com/Alibaba-MIIL/ASL/blob/main/MODEL_ZOO.md you can find a TResNet pre-trained on OpenImages at a resolution of 448x448.
Thank you! 👍 :D To the best of my knowledge, fine-tuning generally refers to reusing some layers of a network that has already been trained. In my experience, we generally do not change the input layers (such as the size of the convolution kernels); the closer a layer is to the input, the more important its parameters are. So, should we change settings such as the convolution kernel size when training on a larger-image dataset?
Or should we just train the network on the larger-image dataset without changing any of its structure (except for the final output layer)?
Your last sentence is correct. See an example guide for fine-tuning (first result on Google): https://d2l.ai/chapter_computer-vision/fine-tuning.html
Like most modern networks, TResNet is a fully convolutional network, meaning it can accept inputs of any size. So for fine-tuning (to any resolution or dataset you want), we always start from the original network, without any architecture change, and only replace the final fully-connected layer. Notice that during fine-tuning we unfreeze and retrain the entire network; we just start from the original pre-trained weights.
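Concretely, that recipe looks roughly like the sketch below. It uses a torchvision ResNet as a stand-in, since the exact TResNet factory call may differ, and `num_classes` is a placeholder for your target dataset:

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from the original pre-trained network; no architecture change.
model = models.resnet50(pretrained=True)

# Replace only the final fully-connected layer to match the new dataset.
num_classes = 20  # placeholder: number of classes in your target dataset
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Nothing is frozen: the whole network is retrained, starting from the
# pre-trained weights (typically with a smaller learning rate).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Because the backbone is fully convolutional, inputs of other resolutions
# (e.g. 448x448 or 640x640) can be fed in without structural changes.
```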
Thank you for your kind reply!
Best, Frank
I have been fine-tuning the TResNet on 640x640 images without changing the network structure. The problem is that, due to my limited memory and the layers I add after the TResNet, the batch size can only be 1-2. Is that still okay, given that the network uses BatchNormalization?
batch_size < 16 is not ideal when working with BatchNormalization.
Do gradient accumulation to simulate a larger batch size: https://medium.com/huggingface/training-larger-batches-practical-tips-on-1-gpu-multi-gpu-distributed-setups-ec88c3e51255
```python
model.zero_grad()                                   # Reset gradients tensors
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute loss function
    loss = loss / accumulation_steps                # Normalize our loss (if averaged)
    loss.backward()                                 # Backward pass
    if (i + 1) % accumulation_steps == 0:           # Wait for several backward steps
        optimizer.step()                            # Now we can do an optimizer step
        model.zero_grad()                           # Reset gradients tensors
        if (i + 1) % evaluation_steps == 0:         # Evaluate the model when we...
            evaluate_model()                        # ...have no gradients accumulated
```
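For example, with the per-step batch size of 2 mentioned above, an illustrative choice of `accumulation_steps = 8` gives an effective batch size of 16 for each optimizer step:

```python
batch_size = 2            # what fits in memory per forward/backward pass
accumulation_steps = 8    # illustrative value
effective_batch_size = batch_size * accumulation_steps  # 2 * 8 = 16
```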
Typically, ResNets are trained on 224x224 images. Is there a model pre-trained on larger images, like 720x720?
Any suggestions?