Alpha-VL / ConvMAE

ConvMAE: Masked Convolution Meets Masked Autoencoders

How can I train 200 epochs for DET? #26

Open ross-Hr opened 1 year ago

ross-Hr commented 1 year ago

Hi, I want to fine-tune the pretrained model in the detectron2 framework for object detection, but the code only trains for 1 epoch and then ends. Is this a bug?

ross-Hr commented 1 year ago

When training, a UserWarning occurs:

lib/python3.9/site-packages/torch/optim/lr_scheduler.py:129: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`. Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "

I can't find where the warning is triggered in the code.
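For reference, this is the ordering the warning refers to. A minimal sketch in plain PyTorch (not the detectron2 trainer, which wraps this logic in its own hooks):

```python
import torch

# Minimal sketch of the expected ordering: optimizer.step() must run before
# lr_scheduler.step() within each iteration, otherwise PyTorch skips the
# first value of the learning rate schedule and emits the warning above.
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for step in range(100):
    optimizer.zero_grad()
    loss = model(torch.randn(4, 10)).sum()
    loss.backward()
    optimizer.step()      # update weights first
    scheduler.step()      # then advance the LR schedule
```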

gaopengpjlab commented 1 year ago

Please try this codebase.

https://github.com/OpenGVLab/Official-ConvMAE-Det

ross-Hr commented 1 year ago

So, is there a bug in the code of this repository?

gaopengpjlab commented 1 year ago

ConvMAE DET is based on MIMDet, which is a replication of ViTDet, while Official ConvMAE DET is based on the official ViTDet. Official ConvMAE DET achieves 53.9 mAP, a +0.7 improvement over ConvMAE DET. The improvement is due to learning rate decay.
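For readers unfamiliar with the technique: "learning rate decay" here refers to layer-wise learning rate decay in the ViTDet fine-tuning style, where earlier transformer blocks get a smaller LR multiplier. A minimal sketch below; the decay rate (0.7) and the name-matching rules are illustrative assumptions, not necessarily the exact values used in Official-ConvMAE-Det.

```python
# Illustrative layer-wise LR decay: parameters in earlier backbone blocks
# receive a smaller learning-rate multiplier than later blocks and the head.
# The decay rate and parameter-name parsing here are assumptions for the sketch.
def lr_decay_multiplier(param_name: str, num_layers: int, decay_rate: float = 0.7) -> float:
    layer_id = num_layers + 1  # default: head / non-backbone params keep the full LR
    if param_name.startswith("backbone"):
        if ".blocks." in param_name:
            layer_id = int(param_name.split(".blocks.")[1].split(".")[0]) + 1
        elif "patch_embed" in param_name or "pos_embed" in param_name:
            layer_id = 0
    return decay_rate ** (num_layers + 1 - layer_id)


# Usage: build per-parameter optimizer groups with scaled learning rates, e.g.
# param_groups = [
#     {"params": [p], "lr": base_lr * lr_decay_multiplier(name, num_layers=12)}
#     for name, p in model.named_parameters() if p.requires_grad
# ]
```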

gaopengpjlab commented 1 year ago

There is no bug in ConvMAE DET.

ross-Hr commented 1 year ago

Ok. Is it due to the warning "Detected call of `lr_scheduler.step()` before `optimizer.step()`"? Do you have any suggestions for this warning?

/root/anaconda3/envs/eva/lib/python3.9/site-packages/torch/optim/lr_scheduler.py:129: UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.  Failure to do this will result in PyTorch skipping the first value of the learning rate schedule. See more details at https://pytorch.org/docs/stable/optim.html#how-to-adjust-learning-rate
  warnings.warn("Detected call of `lr_scheduler.step()` before `optimizer.step()`. "
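For context, one common and usually harmless cause of this warning in mixed-precision training loops (a sketch under that assumption; detectron2's AMP trainer uses the same GradScaler mechanism): GradScaler skips optimizer.step() on iterations where it finds inf/NaN gradients, often the very first one while the loss scale is being calibrated, so lr_scheduler.step() ends up running before any real optimizer step that iteration.

```python
import torch

# Sketch of an AMP loop where the warning can fire once, harmlessly:
# scaler.step(optimizer) is skipped internally when gradients overflow,
# so the scheduler steps before the first effective optimizer step.
model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
scaler = torch.cuda.amp.GradScaler()

for _ in range(100):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = model(torch.randn(4, 10, device="cuda")).sum()
    scaler.scale(loss).backward()
    scaler.step(optimizer)   # skipped internally if gradients overflowed
    scaler.update()
    scheduler.step()         # if the step above was skipped, the warning appears
```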
ross-Hr commented 1 year ago

It seems to be a bug in detectron2 ...

gaopengpjlab commented 1 year ago

Maybe there is a problem with your detectron2 installation.

DianCh commented 1 year ago

Hi @gaopengpjlab, thank you for sharing the official repo. Do you mean "ConvMAE DET" here and "Official ConvMAE DET" are identical in model implementation, and the 0.7 mAP improvement is only due to switching to the multi-step scheduler?

gaopengpjlab commented 1 year ago

Official ConvMAE DET is based on the official ViTDet, while ConvMAE DET is based on an unofficial replication. I think the main gain is due to learning rate decay.

DianCh commented 1 year ago

@gaopengpjlab Thanks! But I noticed that in ConvMAE DET the window size for the Base model is window_size=16, while Official ConvMAE DET has window_size=14. Why the difference? Can I still use the ConvMAE weights (v1 and v2) in Official ConvMAE? Also, how does window_size=14 work with image_size=1024, since 1024 is not a multiple of 14? Thanks again!

gaopengpjlab commented 1 year ago

Thanks for noticing this. Unofficial ViTDet uses window size 16 with image size 1024, while the official ViTDet released by Kaiming et al. uses window size 14 with image size 1024.

Please refer to the following code snippet: https://github.com/facebookresearch/detectron2/blob/333efcb6d0b60d7cceb7afc91bd96315cf211b0a/configs/common/models/mask_rcnn_vitdet.py

Concerning the question "how does window_size=14 work with image_size=1024, since it's not a multiple of 14?": official ViTDet uses padding + overlapping windows to fit window size 14 with image size 1024.
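To illustrate the padding part of this answer, here is a minimal sketch (not the official code; see the ViTDet source linked above for the exact implementation): the token grid is padded up to the next multiple of the window size before window attention and would be cropped back afterwards.

```python
import torch

# Sketch: pad a B x H x W x C token grid so H and W become multiples of the
# window size, then partition it into non-overlapping windows.
def window_partition_with_pad(x: torch.Tensor, window_size: int):
    B, H, W, C = x.shape
    pad_h = (window_size - H % window_size) % window_size
    pad_w = (window_size - W % window_size) % window_size
    x = torch.nn.functional.pad(x, (0, 0, 0, pad_w, 0, pad_h))
    Hp, Wp = H + pad_h, W + pad_w
    x = x.view(B, Hp // window_size, window_size, Wp // window_size, window_size, C)
    windows = x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)
    return windows, (Hp, Wp)


# e.g. a 64x64 token grid (1024 px / patch size 16) padded to 70x70 for window_size=14
windows, (Hp, Wp) = window_partition_with_pad(torch.randn(1, 64, 64, 256), 14)
print(windows.shape, Hp, Wp)  # torch.Size([25, 14, 14, 256]) 70 70
```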

Best Wishes

gaopengpjlab commented 1 year ago

Please check the original code for the exact details.

DianCh commented 1 year ago

@gaopengpjlab Thank you. Another question: it looks like relative position embeddings are only introduced for downstream detection, in addition to the absolute pos embed, while the MAE pre-training phase uses only the absolute pos embed. Can I ask why this design choice?

gaopengpjlab commented 1 year ago

We follow the design of MAE, which only utilizes absolute pos embed during pre-training, while another BERT-like pre-training method, BEiT, chooses relative position embedding. MAE adopts absolute pos embed for pre-training efficiency, as the encoder only processes visible tokens.
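To illustrate the efficiency point, a minimal sketch (not the ConvMAE/MAE code): with absolute position embeddings, masking is just a gather over the kept token indices, so the encoder can run on visible tokens only; a relative position bias is tied to pairwise positions inside attention, which is less convenient once most tokens are dropped.

```python
import torch

# Illustrative MAE-style masking with absolute position embeddings:
# add the pos embed to all tokens, then gather only the visible ones.
B, N, C = 2, 196, 768          # batch, tokens (14x14 grid), embed dim
mask_ratio = 0.75
tokens = torch.randn(B, N, C)
pos_embed = torch.randn(1, N, C)

x = tokens + pos_embed                          # absolute pos embed for every token
len_keep = int(N * (1 - mask_ratio))
ids_shuffle = torch.argsort(torch.rand(B, N), dim=1)
ids_keep = ids_shuffle[:, :len_keep]            # random subset of visible tokens
x_visible = torch.gather(x, 1, ids_keep.unsqueeze(-1).expand(-1, -1, C))
print(x_visible.shape)                          # torch.Size([2, 49, 768])
```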