Network does not converge from scratch

ririya commented 4 years ago

I've been trying to train the network from scratch using a custom dataset of around 100K images and 8 classes. Usually the network trains until it reaches a training loss of ~23 and then gets stuck there no matter how many epochs I run.

The only thing that works is transfer learning from the trained coco models provided and replacing the transformer with a new number of classes and queries (I've been using 20 queries).
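
A minimal sketch of that fine-tuning scheme, assuming the checkpoint stores its weights under a `model` key and that the class and query layers are named `class_embed` and `query_embed` as in the DETR code. The helper name `drop_head_weights` is hypothetical, and the toy checkpoint uses strings in place of tensors. It filters out the size-dependent layers so the remaining weights can be loaded into a model built with a different number of classes and queries.

```python
from typing import Any, Dict, Iterable

# Assumed name prefixes of the layers whose shape depends on the number of
# classes / queries (names as they appear in the DETR code base).
HEAD_PREFIXES = ("class_embed.", "query_embed.")


def drop_head_weights(state_dict: Dict[str, Any],
                      prefixes: Iterable[str] = HEAD_PREFIXES) -> Dict[str, Any]:
    """Return a copy of state_dict without the entries under the given prefixes."""
    prefixes = tuple(prefixes)
    return {k: v for k, v in state_dict.items() if not k.startswith(prefixes)}


# Toy stand-in for a loaded checkpoint; real values would be tensors.
checkpoint = {"model": {
    "transformer.encoder.layers.0.linear1.weight": "w0",
    "class_embed.weight": "w1",
    "class_embed.bias": "b1",
    "query_embed.weight": "q",
    "bbox_embed.layers.0.weight": "w2",
}}
kept = drop_head_weights(checkpoint["model"])
print(sorted(kept))

# With PyTorch this would typically be followed by
#   model.load_state_dict(kept, strict=False)
# so the new class/query layers keep their fresh initialisation.
```

Loading with `strict=False` is what lets the missing head keys pass without an error; the new 8-class, 20-query layers are then trained from their random initialisation.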

I actually got decent results doing the above scheme, but the model is still outperformed by other models such as EfficientDet. So my next step is trying to replace the backbone with an EfficientNet architecture.

The problem is not EfficientNet itself, since I am having convergence issues training from scratch even with the original backbones. But I do believe some backbones make it harder to converge.

Here are a couple things I tried:

Importing the transformer part from the pretrained COCO models and replacing the backbone, both keeping the query and class layers at 100 and 91 and replacing only those layers with 20 / 8 versions.
Changing the optimizer (Adam, AdamW, RMSprop)
Changing the learning rate from 10e-3 to 10e-6
Changing batch size (This one worked for smaller backbones such as Resnet50)
Using normal batch norm layers instead of frozen batch norm
Changing the image size. The originals are 1280 x 720; I tried half- and quarter-size images. I noticed that larger images also make it harder to converge.

I also made a few modifications to the code:

Removed all augmentation
Made all layers learnable

I was able to make Resnet50 converge under certain situations, with large batch size, reduced size images and certain learning rates. However, switching to a larger Resnet or changing any of the parameters breaks the training again.

alcinos commented 4 years ago

Hi @ririya

Thank you for your interest in DETR. I have a couple of questions:

> I've been trying to train the network from scratch

Define "from scratch" here. Do you at least use an ImageNet pre-trained backbone? Note that without this, it is not trivial to make the network converge, even on COCO (see #157).

> it reaches a training loss of ~23

Training loss is not very informative. What is the mAP, and how does it compare to your EfficientDet baseline?

ririya commented 4 years ago

Hi @alcinos Thx for replying!

All my backbones are pretrained on ImageNet.

Whenever the training gets stuck, the mAP also gets stuck below 0.1.

As of now I was able to make it converge using Resnet101, Resnet101-DC5 and also Resnest101.

I was able to train it using the aforementioned mods. I’m always importing the trained transformer from the Resnet101 model and replacing what I need.

With all of these backbones my results are comparable to EfficientDet D1 (around 0.5 mAP on my dataset).

However, it still doesn't work with the EfficientNet backbone.

alcinos commented 4 years ago

I haven't experimented with EfficientNet, so I can't really offer you any guidance there. It might depend on the exact way it is pre-trained. You could try increasing the backbone learning rate a bit, e.g. to 5e-5, and see if it helps.
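
A sketch of what a separate backbone learning rate looks like as optimizer parameter groups, assuming backbone parameters can be recognised by `backbone` in their name. `build_param_groups` is a hypothetical helper, the toy parameter list uses strings in place of tensors, and the 1e-4 base rate in the final comment is only an illustrative value.

```python
from typing import Any, Dict, Iterable, List, Tuple


def build_param_groups(named_params: Iterable[Tuple[str, Any]],
                       lr_backbone: float) -> List[Dict[str, Any]]:
    """Split parameters into two groups; the backbone group gets its own lr."""
    backbone, rest = [], []
    for name, param in named_params:
        (backbone if "backbone" in name else rest).append(param)
    return [
        {"params": rest},                         # falls back to the optimizer's default lr
        {"params": backbone, "lr": lr_backbone},  # overrides lr for the backbone only
    ]


# Toy stand-in for model.named_parameters().
named = [
    ("backbone.0.body.conv1.weight", "p0"),
    ("transformer.encoder.layers.0.linear1.weight", "p1"),
    ("class_embed.weight", "p2"),
]
groups = build_param_groups(named, lr_backbone=5e-5)

# With PyTorch the groups would be passed straight to the optimizer, e.g.
#   torch.optim.AdamW(groups, lr=1e-4, weight_decay=1e-4)
```

Keeping the backbone in its own group means its rate can be raised or lowered without touching the rate used for the transformer and prediction heads.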

For the rest, I'm a bit surprised that it is so hard for you to converge with an ImageNet pre-trained ResNet backbone and a transformer trained from scratch. Maybe your data distribution is very different from COCO's and you need to think about other data augmentations that may make sense.

Otherwise, I think it is perfectly fine to rely on fine-tuning from a COCO-pretrained model as you are doing. I don't really think you need to fiddle with the number of queries though; 100 should be fine.

Best of luck

ririya commented 4 years ago

Thanks @alcinos, I’ll try to follow your suggestions. I’m already getting good results, but I'm just wondering why it’s so hard to get this working sometimes.

ririya commented 4 years ago

I was finally able to run it with EfficientNet. I think there was a problem with the imported ImageNet weights.

However, Resnet101-DC5 still gives me the best results. It is now beating EfficientDet, although inference takes 30 ms longer. I've modified EfficientNet to include dilations, as they seem to be critical. Anxious for the results.

fmassa commented 4 years ago

@ririya keep us updated! We are doing some preliminary experiments with EfficientNets and they do seem to work fairly well with DETR.

munirfarzeen commented 3 years ago

Hi @ririya, could you share the hyperparameters you used to train with EfficientNet as a backbone? That would be a great help.

ririya commented 3 years ago

> Hi, @ririya could you share your hyperparameters you use to train with efficientnet as a backbone. That would be great help.

@likui01

The only things I modified were the learning rate, which I set to 1e-5 for both the backbone and DETR, and the number of queries: I’m using 30 because my images don't have a lot of objects. One thing that helped with convergence was importing the trained transformer from one of the given models and replacing what I needed. I also tried a few different pretrained EfficientNet models. One of them did not work; maybe there was some problem with the ImageNet weights. Hope this helps you.

munirfarzeen commented 3 years ago

@ririya thank you for your reply. I tried changing the learning rate like you suggested, but my network is still not learning, as you can see in the figure. I am using the mobilenet_v2 backbone from PyTorch.

[attached image]

ririya commented 3 years ago

@likui01 I haven't tried mobilenet_v2. Does your training converge using the provided Resnet50 pretrained models?

munirfarzeen commented 3 years ago

@ririya, yes, it does converge with Resnet50 using the pretrained weights.

facebookresearch / detr

Network does not converge from scratch #169