facebookresearch / detr

End-to-End Object Detection with Transformers
Apache License 2.0

Failing to converge on small datasets (Getting zeros on small custom data) #125

Open eslambakr opened 4 years ago

eslambakr commented 4 years ago

❓ How to do something using DETR

Hello All,

I am using DETR on custom data, which contains 2k images for training. I have followed your suggestion proposed in #9 to fine-tune to avoid getting zeros, and I succeeded in achieving comparable accuracy.

But when I tried to train from scratch using the default configuration in main.py, I got zeros for the first 300 epochs so far. Should I wait for more epochs? It seems very strange. What do you think I should do to get good accuracy from scratch? Is this a limitation of DETR, given that transformers need more data to converge? I think we should have some tricks to overcome this :D One more question: if there is no hope of training on data this small, what is the minimum dataset size at which DETR has been shown to work properly?

Final note: after reading the whole thread in #9, I opened this new issue because the other questions there are not relevant to this one.

Thanks for sharing your amazing work with the community. I hope to be able to give back and contribute something useful to it.

alcinos commented 4 years ago

Hi @eslambakr Thanks for your interest in DETR.

2K does sound too small to me; we had success with 10-15k but never tried smaller than that. It's a bit difficult to know what's going on. You could check the predictions to see if the model is doing anything at all (on both test and train images). I'd also look at the train/test losses and look for signs of divergence (the most likely explanation here). I'd not rule out the possibility of a bug either, especially if your mAP is exactly 0. Finally, I'd like to point out that the important metric is not really the number of epochs but rather the number of updates. Since your dataset is about 50x smaller than COCO, one COCO epoch corresponds to about 50 epochs on your dataset. In other words, it's as if you had trained for 6 epochs so far.
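The updates-vs-epochs arithmetic can be sketched like this (dataset sizes are approximate; COCO train2017 has ~118k images):

```python
# What matters is the number of gradient updates, not epochs: at a fixed
# batch size, one pass over a 2k-image dataset is ~1/59 of a pass over COCO.
coco_size = 118_287      # images in COCO train2017
custom_size = 2_000      # the custom dataset discussed here
epochs_trained = 300

coco_equiv_epochs = epochs_trained * custom_size / coco_size
print(round(coco_equiv_epochs, 1))  # → 5.1 "COCO-equivalent" epochs
```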

Hope this helps.

eslambakr commented 4 years ago

Aha, I understand your point, thanks for the clarification. I will do more analysis and update you with my observations, to benefit others who may get stuck on the same issue.

m-klasen commented 4 years ago

@eslambakr Below are some of my experiments with a ~2k-image dataset featuring only 4 classes; my best result exceeded detectron2's Mask R-CNN ResNet-50 FPN by ~5% mAP. If you have further questions, please feel free to ask. [training-curve plots]

eslambakr commented 4 years ago

Thanks for sharing your results, but I am wondering:
1. Is the x-axis in epochs? :D Do you mean you trained the model for only 50 epochs?
2. If yes, how could you achieve that while training from scratch without loading any weights? Did you change the default arguments?
3. The class error is stuck at 100. Did you face that, or, from your experience, do you have an explanation for it?

I trained for almost 600 epochs and I am still getting zeros, which is weird, because I have trained other models on the same data in the same format, so I don't think there is an error in my dataset. Unfortunately, I didn't know I had to set output_dir to get output logs and weights (I thought it was on by default), so I couldn't plot training curves or test the model on images to debug this behavior. I will rerun the experiment with it set and update you. I will also change the number of classes to 2, as my data has only one class, and change num_queries to 30 to make it easier for the model. But I am asking you because your results are impressive.

Thanks in advance.

fmassa commented 4 years ago

@eslambakr I believe @mlk1337 is fine-tuning his model from a model pre-trained on COCO.

m-klasen commented 4 years ago

Hi @eslambakr, I wrote a small gist on how I trained my model starting from COCO weights. Here. Hope this helps.
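In outline, this kind of fine-tuning usually involves stripping the COCO classification head from the checkpoint so main.py re-initializes it for your own number of classes (a minimal sketch; the helper name is mine, and the paths in the comments are placeholders):

```python
def strip_class_head(checkpoint):
    """Delete the COCO classification-head weights from a DETR
    checkpoint dict so the head is re-initialized for a different
    number of classes when training resumes."""
    for key in ('class_embed.weight', 'class_embed.bias'):
        checkpoint['model'].pop(key, None)
    return checkpoint

# usage sketch (requires torch; paths are placeholders):
# ckpt = torch.load('detr-r50-e632da11.pth', map_location='cpu')
# torch.save(strip_class_head(ckpt), 'detr-r50_no-class-head.pth')
```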

unanan commented 4 years ago

> The class error is stuck at 100. Do you face it, or, from your experience, do you have an explanation for that?

If you mean the abnormally high class_error, you can check this reply in #41.
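For context, class_error is essentially 100 minus the top-1 classification accuracy of the matched predictions, so 100 means none of them get the right class. A simplified sketch (the real DETR code first matches queries to targets with the Hungarian matcher and works on tensors):

```python
def class_error(pred_logits, target_classes):
    """Simplified: 100 minus top-1 accuracy of the matched predictions.
    pred_logits is a list of per-class scores, one row per matched query;
    target_classes is the list of ground-truth class ids."""
    pred = [max(range(len(row)), key=row.__getitem__) for row in pred_logits]
    correct = sum(p == t for p, t in zip(pred, target_classes))
    return 100.0 - 100.0 * correct / len(target_classes)
```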

eslambakr commented 4 years ago

Hello @alcinos @fmassa, here are my results using fine-tuning on my custom dataset (2k images): [fine-tuned DETR training curves] For me these are very good results :D

And this is the result of training from scratch. I think it is too bad, so do you have any ideas for making DETR converge on small datasets? Or, from the graphs, do you think I have to tune any hyper-parameters? Note: I think I made a mistake in this experiment by keeping args.lr_drop=200; I will rerun after setting it to 700. [from-scratch DETR training curves]
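On lr_drop: in main.py it is the step size of a StepLR schedule, so the learning rate is multiplied by 0.1 every lr_drop epochs; with lr_drop=200, a 600-epoch run has already cut the lr three times. The equivalent arithmetic, as a sketch (helper name is mine; 0.1 is StepLR's default gamma, which DETR does not override):

```python
def lr_at_epoch(base_lr, lr_drop, epoch, gamma=0.1):
    """Learning rate under a StepLR schedule: multiplied by `gamma`
    every `lr_drop` epochs, matching StepLR(optimizer, args.lr_drop)."""
    return base_lr * gamma ** (epoch // lr_drop)

# with the default lr=1e-4 and lr_drop=200, by epoch 600 the lr
# has been cut by 10x three times, down to ~1e-7
```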

sompjang commented 4 years ago

I have got similar results on small datasets. I tried several different configurations but with no success.

I trained on a dataset of 560 images.

params:

lr_backbone = 1e-5
lr = 1e-2
weight_decay = 1e-4
epochs = 1200
lr_drop = 400
num_queries = 20
num_classes = 1
batch_size = 2

[loss plots]

fmassa commented 4 years ago

@sompjang please try finetuning instead of training from scratch, I'm afraid training on 560 images from scratch might suffer from severe overfitting.

sompjang commented 4 years ago

> @sompjang please try finetuning instead of training from scratch, I'm afraid training on 560 images from scratch might suffer from severe overfitting.

@fmassa Thanks for your answer. After fine-tuning, the results look much better. Are there any recommendations on dataset size?

guysoft commented 4 years ago

How are you plotting the loss functions?

fmassa commented 4 years ago

@guysoft we have a plotting utility in util/plot_utils.py
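If you'd rather inspect the raw logs: when --output_dir is set, each line of log.txt is a JSON dict of per-epoch stats, so you can parse it directly (a sketch; field names like train_loss come from the default logging):

```python
import json
from pathlib import Path

def load_log(log_path):
    """Parse DETR's log.txt: one JSON dict of epoch stats per line."""
    lines = Path(log_path).read_text().splitlines()
    return [json.loads(line) for line in lines if line.strip()]

# usage sketch (path is a placeholder):
# stats = load_log('output_dir/log.txt')
# epochs = [s['epoch'] for s in stats]
# train_loss = [s['train_loss'] for s in stats]
```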

cyy21 commented 3 years ago

@sompjang Hi, I have the same problem. Did you encounter the situation where the network's outputs are all identical? After you fine-tuned, did the outputs improve? How many classes are there in your dataset?

azamshoaib commented 3 years ago

> Hi @eslambakr, I wrote a small gist on how I trained my model starting from COCO weights. Here. Hope this helps.

@m-klasen The link to your gist is not working. Could you please provide the link? I am training my network from scratch and it is not converging. Any insights into my problem would be very helpful. Thank you.

@eslambakr Did you solve your issue with training the network from scratch?

Flyooofly commented 1 year ago

> @sompjang please try finetuning instead of training from scratch; I'm afraid training on 560 images from scratch might suffer from severe overfitting.

Hello, I used the pre-trained DETR model provided in DETReg (https://github.com/amirbar/DETReg) for fine-tuning. I fine-tuned on about 1000 images for 50 epochs (I set num_classes to the max class id + 1), but all metrics are still 0. Can you help me figure out why? Thanks. [training metrics]
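A quick way to sanity-check the num_classes convention against a COCO-format annotation file (DETR wants num_classes to be at least max category id + 1; the no-object class is added internally). A sketch; the helper name and path are mine:

```python
import json

def required_num_classes(ann_file):
    """For COCO-format annotations, DETR's num_classes should be
    max category id + 1 (the no-object class is handled internally)."""
    with open(ann_file) as f:
        categories = json.load(f)['categories']
    return max(c['id'] for c in categories) + 1

# usage sketch: required_num_classes('annotations/train.json')
```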