Pretrained Models and Training Code for Table 2 Experiments

mmaaz60 commented 2 years ago

Hi,

Is there any plan to release the pretrained models and training code for the Detic model in Table 2 (open-vocabulary LVIS, compared to ViLD)?

Thank you

mmaaz60 commented 2 years ago

Hi @xingyizhou,

Any update on this? Thank you

xingyizhou commented 2 years ago

Hi,

Sorry for my very delayed reply. We don't plan to officially support the ViLD baseline, given its super long training time, but I am happy to share the model and config links here for reproducibility purposes. Please let me know if this is super urgent (e.g., for an ECCV submission). It will take me some time to dig out the models and verify them.

Best, Xingyi

mmaaz60 commented 2 years ago

Hi @xingyizhou,

Thank you for your reply. Yes, it is super urgent. It would be a great help if you could provide the pretrained models and corresponding configurations for your reproduced baseline and the final Detic model.

xingyizhou commented 2 years ago

Hi,

I have uploaded our ViLD configs for our Box-supervised baseline and Detic.

Our Detic model is here. As described in our paper, it is fine-tuned from the box-supervised model. Unfortunately, I no longer have the exact BoxSup model we used. You can train the BoxSup model yourself using the config above, or fine-tune from this similar model (it uses a class-specific mask head, which gives ~0 novel-class mask mAP; this should not affect the performance of Detic fine-tuning).

Best, Xingyi

mmaaz60 commented 2 years ago

Thank you @xingyizhou

mmaaz60 commented 2 years ago

Hi @xingyizhou,

I hope you are doing well. I tried using the BoxSup_ViLD_200e.py config but wasn't able to start the training. I am getting the following error:

I then tried changing line 38 from conv_norm=lambda c: NaiveSyncBatchNorm(c, stats_mode="N") to conv_norm="SyncBN" and got the following error:

I would appreciate any help with this issue. I understand you might be very busy these days, so please reply at your convenience.

Thank you

xingyizhou commented 2 years ago

Hi,

What are your PyTorch/CUDA/detectron2 versions? What command are you using to run the training? Can you run the evaluation script successfully? conv_norm="SyncBN" probably won't work here, since the lazy config uses the actual object instead of a string.

Best, Xingyi

mmaaz60 commented 2 years ago

Hi,

Thank you for your reply. I was getting the above errors because I was using train_net.py with the lazy configs, as I was not very familiar with this type of config file. Using lazy_train_net.py solved the errors.

I can reproduce the Detic results using the provided pretrained weights, as shown below:

I am now trying to train the box-supervised model with the BoxSup_ViLD_200e.py config, but the training diverges. I am using a batch size of 128 and an LR of 0.0004 on 16 V100 GPUs (i.e., I just set num_nodes=2 in the config). What do you think could be the problem? Could it be the batch size (128 instead of 256), even though I decreased the LR accordingly? I could try a batch size of 256, but that might not be feasible with the resources I have. I would appreciate your comments. Thank you
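For reference, the arithmetic behind this setup can be sketched in Python. This assumes the total batch is split evenly across the GPUs and that "decreased the LR accordingly" means the linear scaling rule (LR proportional to batch size). The helpers `per_gpu_batch` and `linear_scaled_lr` are hypothetical and not part of Detic or detectron2.

```python
def per_gpu_batch(total_batch: int, num_gpus: int) -> int:
    """Images per GPU per iteration, assuming an even split across GPUs."""
    if total_batch % num_gpus != 0:
        raise ValueError("total batch size must divide evenly across GPUs")
    return total_batch // num_gpus


def linear_scaled_lr(lr: float, batch: int, new_batch: int) -> float:
    """Linear scaling rule: keep lr / batch_size constant."""
    return lr * new_batch / batch


# num_nodes=2 with 16 V100s in total -> 8 GPUs per node, sharing a batch of 128.
num_gpus = 2 * 8
print(per_gpu_batch(128, num_gpus))  # 8

# If 0.0004 at batch 128 came from linear scaling, the implied LR at the
# config's batch size of 256 is twice as large.
implied_lr_256 = linear_scaled_lr(0.0004, 128, 256)
print(implied_lr_256)
```

Linear scaling is a heuristic, not a guarantee of stable training, so matching the LR to a smaller batch does not rule out divergence.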

xingyizhou commented 2 years ago

Hi,

I also ran into diverging training; increasing the batch size generally helps. Here are some additional things I tried to stabilize training, although the provided config should be exactly the one I used in our paper:

Initialize the classification bias so that the default score after the sigmoid is 0.01. Some losses need this, but I didn't use it in the final version.
Longer warmup.
Make sure you use sync BN instead of BN.
Turn off FP16.
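The first two items can be sketched as follows, assuming a sigmoid-based classifier and a linear warmup. `prior_prob_bias`, `sigmoid`, and `warmup_lr` are hypothetical helpers written for illustration: the bias formula b = -log((1 - p) / p) is the standard prior-probability initialization (as used with focal loss), and the warmup follows the common linear-warmup form, but neither is taken from the Detic code. Sync BN and FP16 are config switches and are not shown.

```python
import math


def prior_prob_bias(prior: float = 0.01) -> float:
    """Bias b with sigmoid(b) == prior, i.e. b = -log((1 - prior) / prior)."""
    return -math.log((1.0 - prior) / prior)


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def warmup_lr(it: int, base_lr: float, warmup_iters: int,
              warmup_factor: float = 0.001) -> float:
    """Linearly ramp the LR from base_lr * warmup_factor up to base_lr.

    Only the warmup phase is modeled; after warmup_iters the base LR is
    returned unchanged (a real schedule would apply its decay on top).
    """
    if it >= warmup_iters:
        return base_lr
    alpha = it / warmup_iters
    return base_lr * (warmup_factor * (1.0 - alpha) + alpha)


# Classification bias: every class starts with a score of about 0.01.
b = prior_prob_bias(0.01)
print(round(b, 4))           # -4.5951
print(round(sigmoid(b), 6))  # 0.01

# Longer warmup: at the same iteration the LR is still much lower, so early
# updates are gentler (base_lr = 1.0 shows the fraction of the full LR).
print(round(warmup_lr(500, 1.0, warmup_iters=1000), 4))  # 0.5005
print(round(warmup_lr(500, 1.0, warmup_iters=5000), 4))  # 0.1009
```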

Please also refer to the original ViLD repo for the ViLD baseline. For this repo, I highly recommend our baseline, which is MUCH more resource-friendly (~18 hours on one machine) and performs better.

Best, Xingyi

facebookresearch / Detic, issue #32