SHI-Labs / OneFormer

OneFormer: One Transformer to Rule Universal Image Segmentation, arxiv 2022 / CVPR 2023
https://praeclarumjj3.github.io/oneformer
MIT License

Total Loss stops reducing while fine tuning for Instance Segmentation on Custom Dataset. Should I continue to train for more iterations ? #33

Closed vineetsharma14 closed 1 year ago

vineetsharma14 commented 1 year ago

Hello There,

Thanks for sharing the amazing work!

I have been experimenting with the OneFormer repo for the past few days, and I am able to run training (fine-tuning) for Instance Segmentation on a custom dataset on 1 GPU (Tesla T4) by reducing the image size to 512.

The following are the changes I have made to my configuration:

cfg.INPUT.IMAGE_SIZE = 512
cfg.SOLVER.IMS_PER_BATCH = 1 (even 16 works)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = <number of classes in my dataset>
cfg.MODEL.RETINANET.NUM_CLASSES = <number of classes in my dataset>
cfg.MODEL.SEM_SEG_HEAD.NUM_CLASSES = <number of classes in my dataset>
cfg.SOLVER.MAX_ITER = 40000

with the default base learning rate of 0.0001.
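For reference, the same overrides could also be expressed as a derived YAML config extending the base file. This is a sketch following the detectron2 `_BASE_` convention; the exact field names should be checked against the repo's own config files:

```yaml
_BASE_: oneformer_dinat_large_bs16_100ep.yaml
INPUT:
  IMAGE_SIZE: 512
MODEL:
  SEM_SEG_HEAD:
    NUM_CLASSES: 10   # placeholder; set to the number of classes in your dataset
SOLVER:
  IMS_PER_BATCH: 1
  MAX_ITER: 40000
  BASE_LR: 0.0001
```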

COCO DINAT Configuration file : oneformer_dinat_large_bs16_100ep.yaml

MODEL WEIGHTS : 150_16_dinat_l_oneformer_coco_100ep.pth

My dataset has approx 10,000 images in the train set.

I found the training settings you used in the Appendix of the paper: a batch size of 16 for around 90K or more iterations, depending on the dataset.
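To put those schedules in perspective, here is a quick back-of-the-envelope conversion from iterations to epochs (the COCO train2017 size below is my assumption for illustration):

```python
def iterations_to_epochs(iterations, batch_size, dataset_size):
    """Convert optimizer iterations to approximate epochs over the dataset."""
    return iterations * batch_size / dataset_size

# Paper-style schedule: 90K iterations at batch size 16 on a ~118K-image
# train set (COCO train2017) is roughly 12 epochs.
print(iterations_to_epochs(90_000, 16, 118_000))   # ~12.2

# My schedule: 40K iterations at batch size 1 on a 10K-image train set
# is only 4 epochs.
print(iterations_to_epochs(40_000, 1, 10_000))     # 4.0
```

So at batch size 1 the model sees far fewer passes over the data than the paper's schedules, which is worth keeping in mind when comparing loss curves.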

I have trained the model with varying batch sizes, but I observe that the total loss stops reducing after a few thousand iterations.

For example, at a batch size of 1, the starting total loss was 87, which reduced to around 13 in 8000 iterations. But after that, the training loss oscillates between 9 and 28.
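At batch size 1 the per-iteration loss is inherently noisy, so before concluding that training has plateaued it may help to smooth the logged values. A generic sketch, not tied to OneFormer's logging:

```python
from collections import deque

def smoothed_losses(losses, window=100):
    """Running mean over the last `window` logged loss values."""
    buf = deque(maxlen=window)
    out = []
    for loss in losses:
        buf.append(loss)
        out.append(sum(buf) / len(buf))
    return out

# Values oscillating between 9 and 28 collapse toward the window mean,
# making a slow downward trend (or a true plateau) much easier to see.
raw = [9, 28, 14, 25, 11, 22] * 50
smooth = smoothed_losses(raw, window=100)
```

If the smoothed curve is still flat, that is a stronger signal of a plateau than the raw oscillation.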

So, with this observation, what is recommended?

  1. Should I train the model longer ?
  2. Should I increase the batch size and train longer ?
  3. Should I change the Learning Rate ?
  4. Is there any other modification that is required for fine tuning ?
  5. What is a ballpark number of iterations or epochs one might need to fine-tune this architecture on a train set of 10K images?
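Regarding question 3, one common heuristic I have seen is the linear scaling rule (this is my assumption, not something the OneFormer paper prescribes): scale the base learning rate in proportion to the batch size.

```python
def scaled_lr(base_lr, base_batch, actual_batch):
    """Linear scaling rule: learning rate proportional to batch size."""
    return base_lr * actual_batch / base_batch

# The repo's default schedule pairs LR 0.0001 with batch size 16;
# dropping to batch size 1 would suggest a proportionally smaller LR.
print(scaled_lr(0.0001, 16, 1))  # 6.25e-06
```

Whether this rule transfers to this architecture is exactly what I am unsure about.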

Thanks for the help !

praeclarumjj3 commented 1 year ago

Hi @vineetsharma14, I suggest validating your model on a val set before tuning any hyper-parameters.

For example, at a batch size of 1, the starting total loss was 87, which reduced to around 13 in 8000 iterations. But after that, the training loss oscillates between 9 and 28.

This is not uncommon; the range is wider than expected, but that could be due to your dataset. I cannot comment without seeing the validation results. Any hyper-parameter tuning also depends on the number of classes in your dataset.

vineetsharma14 commented 1 year ago

Thanks @praeclarumjj3 for the guidance. Really appreciate it !

I will check the dataset.

BhavanaMP commented 8 months ago

Hi @vineetsharma14 ,

Were you able to successfully reduce the training loss after fine-tuning? I am facing the same pattern in my fine-tuning experiments. May I know how you are setting up the learning rate for the text mapper while fine-tuning?