I have two small questions about the implementation.
Was the Detic method you applied to Deformable DETR evaluated as standard object detection on LVIS, rather than in a zero-shot setting (with rare classes treated as unseen)?
In the config file for Deformable DETR, MAX_ITER is set to 180000, which I calculated as 101700x32/180000 = (approx.) 18 epochs. But the main paper says Deformable DETR was trained for 48 epochs. Do I need to increase MAX_ITER to reproduce the results?
Correct. It's tested in the standard LVIS setting and the CLIP classifier is not used.
I believe it should be 180000 * 32 / 101700, which is approximately 56 epochs. Sorry for saying 48 epochs in the paper; that's not precise and we will rephrase. For the training schedule, we always follow the original detectron2 convention, i.e., 16x90K = 1x = 12 epochs (which is not precise for LVIS).
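For anyone checking the arithmetic, here is a minimal sketch using only the numbers quoted in this thread (180000 iterations, batch size 32, roughly 101700 training images). The function and variable names are made up for illustration and are not taken from the Detic or detectron2 code.

```python
# Numbers quoted in the thread; names are illustrative, not config keys.
MAX_ITER = 180_000
BATCH_SIZE = 32
NUM_TRAIN_IMAGES = 101_700  # dataset size as quoted by the asker


def actual_epochs(max_iter: int, batch_size: int, num_images: int) -> float:
    """Passes over the dataset: images seen divided by dataset size."""
    return max_iter * batch_size / num_images


def schedule_multiplier(max_iter: int, batch_size: int) -> float:
    """Schedule in units of '1x', where 1x = 90K iterations at batch size 16."""
    return max_iter * batch_size / (16 * 90_000)


epochs = actual_epochs(MAX_ITER, BATCH_SIZE, NUM_TRAIN_IMAGES)
multiplier = schedule_multiplier(MAX_ITER, BATCH_SIZE)
nominal_epochs = 12 * multiplier  # convention: 1x = 12 epochs

print(f"actual epochs on this dataset: {epochs:.1f}")
print(f"schedule: {multiplier:g}x, nominal epochs: {nominal_epochs:g}")
```

Under the 1x = 12 epochs convention this config is a 4x schedule, which is where the nominal 48 comes from, while the actual number of passes over a dataset of this size is between 56 and 57.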
Hi, thanks for providing great work.