davidnvq / grit

GRIT: Faster and Better Image-captioning Transformer (ECCV 2022)

Implementation details of pre-training the object detector on 4 datasets / VG? #19

Closed miasakachenmo closed 2 years ago

miasakachenmo commented 2 years ago

Hello! I have been following your exciting work. I would like to know some implementation details about the pre-training process of the object detector, e.g., the class definitions of the 1849 classes you mentioned in Additional Details. Is there any plan to release this part of the code?

davidnvq commented 2 years ago

Thanks again for the interest. It is indeed already released.

  1. The configs can be found at https://github.com/davidnvq/grit/tree/main/configs/detection.
  2. The training code lives in https://github.com/davidnvq/grit/tree/main/engine and https://github.com/davidnvq/grit/blob/main/train_detector.py.
  3. And definitely, the model implementation is included in https://github.com/davidnvq/grit/tree/main/models/detection.
  4. The datasets, as I remember, are mostly downloaded from or follow the VinVL paper.

Step 1

To train on the Visual Genome dataset for object detection only, I ran the following script on 4 server nodes (each with 8 GPUs, hence exp.world_size=32). On node i, set exp.rank=i. For example, on node 0:

python train_detector.py \
exp.save_dir=expdet1 \
exp.ngpus_per_node=8 \
exp.world_size=32 \
exp.rank=0 \
dataset.phase=finetune \
optimizer.batch_size=4 \
optimizer.num_epochs=50 \
'od_dataset@dataset=[vg_train]' \
'od_dataset@dataset_val=[coco_val, vg_val, vg_test]' \
'model.backbone.pre_trained=imagenet' \
'optimizer.lr_drop_epochs=[40]' 
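The per-node launch described above can be sketched with a small helper (hypothetical, not part of the repo) that prints the command each node should run; only exp.rank differs across nodes:

```shell
# Hypothetical launch helper (not part of the repo): print the command for
# each of the 4 nodes. 4 nodes * 8 GPUs per node -> exp.world_size=32;
# only exp.rank differs from node to node.
NNODES=4
NGPUS_PER_NODE=8
WORLD_SIZE=$((NNODES * NGPUS_PER_NODE))
for RANK in $(seq 0 $((NNODES - 1))); do
  echo "node ${RANK}: python train_detector.py exp.rank=${RANK}" \
       "exp.ngpus_per_node=${NGPUS_PER_NODE} exp.world_size=${WORLD_SIZE} ..."
done
```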

Step 2

After that, we have obtained the pretrained object detector. To finetune this model on the attribute classification task along with object detection on the Visual Genome dataset for 5-10 more epochs, run the following script (note that model.has_attr_head=True must be enabled):

python train_detector.py \
exp.save_dir=expdet100attr \
exp.ngpus_per_node=8 \
exp.world_size=32 \
exp.rank=0 \
model.det_module.num_queries=100 \
model.backbone.backbone_name='swin_base_win7_384_22k' \
dataset.phase=finetune \
optimizer.batch_size=2 \
optimizer.num_workers=2 \
'od_dataset@dataset=[vg_train]' \
'od_dataset@dataset_val=[coco_val, vg_val, vg_test]' \
exp.checkpoint='/path_to_dir_of_checkpoint_in_step1/checkpoint_last.pth' \
model.has_attr_head=True \
model.det_module.loss.attr_loss_coef=5.0 \
'optimizer.lr_drop_epochs=[10]' \
'optimizer.lr=5e-6' \
'optimizer.lr_backbone=5e-6' \
'optimizer.sp_names=[attr_head]' \
'optimizer.sp_lr=1e-4' \
'optimizer.sp_lr_drop_epochs=[5]' \
'optimizer.num_epochs=10' \
'model.backbone.pre_trained=imagenet'
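The overrides optimizer.sp_names, sp_lr, and sp_lr_drop_epochs imply a two-group optimizer: the freshly added attribute head trains at a higher, separately scheduled learning rate while the pretrained detector weights keep the small base rate. An illustrative sketch of that split (an assumption based on the option names, not the repo's actual optimizer code):

```python
# Illustrative sketch (not the repo's actual code) of the behavior implied by
# optimizer.sp_names=[attr_head] and optimizer.sp_lr=1e-4: parameters whose
# names match an entry in sp_names get their own (higher) learning rate,
# while the pretrained detector weights keep the base lr=5e-6.
def build_param_groups(named_params, base_lr=5e-6, sp_lr=1e-4,
                       sp_names=("attr_head",)):
    """Split (name, param) pairs into base and special optimizer groups."""
    base, special = [], []
    for name, param in named_params:
        (special if any(key in name for key in sp_names) else base).append(param)
    return [
        {"params": base, "lr": base_lr},      # pretrained detector weights
        {"params": special, "lr": sp_lr},     # freshly added attribute head
    ]

# Toy example mirroring the overrides above:
params = [("backbone.stage1.weight", "w0"),
          ("det_module.query_embed.weight", "w1"),
          ("attr_head.fc.weight", "w2")]
groups = build_param_groups(params)
```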
davidnvq commented 2 years ago

To train with 4 object detection datasets (i.e., COCO, VG, OpenImages, and Objects365), simply replace or add the following settings in the step-1 script above:

'od_dataset@dataset=[vg_train, coco_train, objects365, openimages]' \
dataset.vg_train.num_copies=8 \
dataset.coco_train.num_copies=8 \
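A hedged guess at what dataset.&lt;name&gt;.num_copies controls, based only on the option name (not the repo's actual sampler code): when mixing datasets of very different sizes, the smaller ones (VG, COCO) are repeated several times per epoch so the mix is not dominated by the much larger OpenImages / Objects365. A minimal sketch of that oversampling:

```python
# Hedged sketch (an assumption, not the repo's actual code): repeat each
# dataset's samples num_copies times before concatenating, so smaller
# datasets are not drowned out by larger ones within an epoch.
def mix_datasets(datasets, num_copies):
    """Concatenate sample lists, repeating each dataset num_copies[name] times."""
    mixed = []
    for name, samples in datasets.items():
        mixed.extend(samples * num_copies.get(name, 1))
    return mixed

# Toy example with list-based "datasets":
datasets = {"vg_train": ["vg0", "vg1"],
            "coco_train": ["c0"],
            "openimages": ["o0", "o1", "o2"]}
mixed = mix_datasets(datasets, {"vg_train": 8, "coco_train": 8})
```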
davidnvq commented 2 years ago

It is very important to note that object detection training requires a lot of compute. As a result, I have been unable to reproduce the results again after cleaning/refactoring the code (no funding for such a thing, T_T).

Some keys in the configs or training code may have changed during refactoring, which may lead to small bugs.

Please kindly be aware of this, debug with a few training samples (e.g., 64 images) first, and use the code cautiously.
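As a concrete starting point for such a debug run, a single-node, single-GPU variant of the step-1 command with a tiny schedule could look like the sketch below (an assumption reusing only the overrides shown in this thread; whether a dataset-subsetting override exists is unknown, so this shrinks the schedule rather than the dataset):

```shell
# Hypothetical single-GPU smoke test reusing overrides from the scripts above;
# only the schedule and batch size are dialed down, not the dataset itself.
python train_detector.py \
  exp.save_dir=debug_run \
  exp.ngpus_per_node=1 \
  exp.world_size=1 \
  exp.rank=0 \
  dataset.phase=finetune \
  optimizer.batch_size=2 \
  optimizer.num_epochs=1 \
  'od_dataset@dataset=[vg_train]' \
  'model.backbone.pre_trained=imagenet'
```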

miasakachenmo commented 2 years ago

It actually helps! Thank you!!!

JingyuLi-code commented 2 years ago

Dear author, the object detector is missing some important files referenced for training in train_config.yaml. Can you provide the files train_ann_lmdb and coco_vgoiv6_class2ind.json used to pre-train the object detector?

lmdb_file: ${oc.env:DATA_ROOT}/coco/annotations/train_ann_lmdb
label2ind_file: ${oc.env:DATA_ROOT}/coco/annotations/coco_vgoiv6_class2ind.json
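For reference, the config key name label2ind_file suggests a flat JSON mapping from class name to integer index. The actual 1849-class vocabulary is not released in this thread, so the sketch below only shows the shape such a file would likely take, using a placeholder class list (an assumption, not the real file):

```python
import json

# Hypothetical sketch of the label2ind file format implied by the config key:
# a flat JSON object mapping class name -> contiguous integer index.
classes = ["person", "dog", "surfboard"]  # placeholder, NOT the real vocabulary
class2ind = {name: idx for idx, name in enumerate(classes)}

with open("coco_vgoiv6_class2ind.json", "w") as f:
    json.dump(class2ind, f, indent=2)
```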