Epiphqny / VisTR

[CVPR2021 Oral] End-to-End Video Instance Segmentation with Transformers
https://arxiv.org/abs/2011.14503
Apache License 2.0

The default training params are different between paper and code. #27

Closed zzzzzz0407 closed 3 years ago

zzzzzz0407 commented 3 years ago

For example, lr_backbone is 1e-4 in the paper but 1e-5 in the code, and the number of epochs is 10 in the paper but 18 in the code. Should we adjust the params to match the paper, or just use the defaults to reproduce the results?

Epiphqny commented 3 years ago

Hi @zzzzzz0407, please follow the default params. We will update the paper.
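For later readers, launching training with the repository defaults would look roughly like the DETR-style command below. This is a sketch: the flag names (`--lr_backbone`, `--epochs`, `--ytvos_path`) and the `main.py` entry point are assumptions based on the DETR codebase that VisTR builds on, and the values are the code defaults discussed in this issue.

```shell
# Assumed DETR-style flags; values are the code defaults (not the paper's).
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
    --lr_backbone 1e-5 \
    --epochs 18 \
    --ytvos_path /path/to/ytvos
```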

zzzzzz0407 commented 3 years ago

@Epiphqny Thanks for your quick reply. I also noticed that you recently changed num_class from 40 to 41, which is confusing, since the background id is already +1 in the code: https://github.com/Epiphqny/VisTR/blob/master/models/vistr.py#L39

Epiphqny commented 3 years ago

@zzzzzz0407 Following the same practice as in https://github.com/facebookresearch/detr/blob/a54b77800eb8e64e3ad0d8237789fcbf2f8350c5/models/detr.py#L305, num_class corresponds to max_obj_id + 1, and the max_obj_id for VIS is 40. There are actually two ids for background: id 0 represents an empty box (for an object that disappears in some frame), as in https://github.com/Epiphqny/VisTR/blob/3f736292330424f53905bdcfb1cdf07cc2902eb5/datasets/ytvos.py#L108, and id 41 represents the original background.
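The id layout above can be sketched as follows. This is a minimal illustration of the DETR-style convention, not the actual VisTR code: the hidden dimension of 256 is illustrative, and the variable names are mine.

```python
import torch

num_classes = 41             # max_obj_id (40) + 1, following DETR's convention
empty_box_id = 0             # marks a box that disappears in a given frame
no_object_id = num_classes   # 41: the extra "no object" / background class

# A DETR-style classification head outputs num_classes + 1 logits per query,
# so valid class indices run from 0 (empty box) through 41 (background).
class_embed = torch.nn.Linear(256, num_classes + 1)
logits = class_embed(torch.zeros(2, 256))
print(logits.shape[-1])  # 42
```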

zzzzzz0407 commented 3 years ago

@Epiphqny Okay, I get it.