I didn't see an improvement in transfer learning tasks when using pre-training with larger resolution.
Anyway, in https://github.com/Alibaba-MIIL/ASL/blob/main/MODEL_ZOO.md you can find a TResNet pre-trained on OpenImages at a resolution of 448x448.
Thank you! 👍 :D To the best of my knowledge, fine-tuning generally refers to reusing some layers of a network that has already been trained. In my experience, we generally do not change the input layers (such as the size of the convolution kernels); the closer a layer is to the input, the more important its parameters are. So, should we change settings such as the convolution kernel size when training on a larger-image dataset?
Or should we just train the network on the larger-image dataset without changing any of its structure (except for the final output layer)?
Your last sentence is correct. See an example guide for fine-tuning (first result on Google): https://d2l.ai/chapter_computer-vision/fine-tuning.html
Like most modern networks, TResNet is a fully convolutional network, meaning it can accept inputs of any size. So for fine-tuning (to any resolution or dataset you want), we always start from the original network, without any architecture change, and only replace the final fully-connected layer. Notice that during fine-tuning we unfreeze and retrain the entire network; we just start from the original pre-trained weights.
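Concretely, that recipe looks roughly like the sketch below. It uses a torchvision ResNet as a stand-in, since the exact TResNet factory call may differ, and `num_classes` is a placeholder for your target dataset:

```python
import torch
import torch.nn as nn
from torchvision import models

# Start from the original pre-trained network; no architecture change.
model = models.resnet50(pretrained=True)

# Replace only the final fully-connected layer to match the new dataset.
num_classes = 20  # placeholder: number of classes in your target dataset
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Nothing is frozen: the whole network is retrained, starting from the
# pre-trained weights (typically with a smaller learning rate).
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.CrossEntropyLoss()

# Because the backbone is fully convolutional, inputs of other resolutions
# (e.g. 448x448 or 640x640) can be fed in without structural changes.
```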
Thank you for your kind reply!
Best, Frank
I have been fine-tuning the TResNet on 640x640 images without changing the network structure. The problem is that, due to my limited memory and the layers I add after the TResNet, the batch size can only be 1-2. Is that still okay, given that the network uses BatchNormalization?
batch_size < 16 is not ideal when working with BatchNormalization.
Do gradient accumulation to simulate a larger batch size: https://medium.com/huggingface/training-larger-batches-practical-tips-on-1-gpu-multi-gpu-distributed-setups-ec88c3e51255
```python
model.zero_grad()                                   # Reset gradients tensors
for i, (inputs, labels) in enumerate(training_set):
    predictions = model(inputs)                     # Forward pass
    loss = loss_function(predictions, labels)       # Compute loss function
    loss = loss / accumulation_steps                # Normalize our loss (if averaged)
    loss.backward()                                 # Backward pass
    if (i + 1) % accumulation_steps == 0:           # Wait for several backward steps
        optimizer.step()                            # Now we can do an optimizer step
        model.zero_grad()                           # Reset gradients tensors
        if (i + 1) % evaluation_steps == 0:         # Evaluate the model when we...
            evaluate_model()                        # ...have no gradients accumulated
```
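For example, with the per-step batch size of 2 mentioned above, an illustrative choice of `accumulation_steps = 8` gives an effective batch size of 16 for each optimizer step:

```python
batch_size = 2            # what fits in memory per forward/backward pass
accumulation_steps = 8    # illustrative value
effective_batch_size = batch_size * accumulation_steps  # 2 * 8 = 16
```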
Typically, ResNets are trained on 224x224 images. Is there a model pre-trained on larger images, like 720x720?
Any suggestions?