facebookresearch / Detic

Code release for "Detecting Twenty-thousand Classes using Image-level Supervision".
Apache License 2.0

Question of application in Deformable detr #19

Closed (jihwanp closed this issue 2 years ago)

jihwanp commented 2 years ago

Hi, thanks for providing this great work.

I have two small questions about the implementation.

  1. Was the Detic method applied to Deformable DETR evaluated as standard object detection on LVIS, rather than in a zero-shot setting (with rare classes treated as unseen)?
  2. In the config file for Deformable DETR, MAX_ITER is set to 180000, from which I calculate the number of epochs as 101700x32/180000 = approx. 18 epochs. But the main paper says Deformable DETR was trained for 48 epochs. Do I need to increase MAX_ITER to reproduce the results?
xingyizhou commented 2 years ago

Thank you for your interest.

  1. Correct. It's tested in the standard LVIS setting and the CLIP classifier is not used.
  2. I believe it should be 180000 * 32 / 101700, which is approximately 56 epochs. Sorry for saying 48 epochs in the paper; that's not precise and we will rephrase it. When describing a training schedule, we always follow the original detectron2 convention, i.e., 16x90K = 1x = 12 epochs (which is not precise for LVIS).
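
The arithmetic in this answer can be checked with a minimal Python sketch. The function names are hypothetical and not part of Detic or detectron2. The dataset size (101700) and batch size (32) are the figures quoted in this thread. Reading the paper's 48 epochs as a "4x" schedule under the 16x90K convention is an inference from the convention stated above, not something the authors confirm here.

```python
def epochs_from_iters(max_iter: int, batch_size: int, dataset_size: int) -> float:
    """Actual passes over the dataset: images seen divided by dataset size."""
    return max_iter * batch_size / dataset_size


def nominal_schedule_epochs(
    max_iter: int,
    batch_size: int,
    base_batch: int = 16,       # convention quoted in the thread: 16 x 90K = 1x
    base_iter: int = 90_000,
    epochs_per_1x: int = 12,    # 1x = 12 epochs under that convention
) -> float:
    """Nominal epochs when the schedule is expressed in multiples of 1x."""
    schedule_multiple = max_iter * batch_size / (base_batch * base_iter)
    return schedule_multiple * epochs_per_1x


# Figures quoted in the thread (assumed here, not verified against the config).
MAX_ITER = 180_000
BATCH_SIZE = 32
DATASET_SIZE = 101_700

actual = epochs_from_iters(MAX_ITER, BATCH_SIZE, DATASET_SIZE)
nominal = nominal_schedule_epochs(MAX_ITER, BATCH_SIZE)

print(f"actual epochs on the quoted dataset size: {actual:.2f}")
print(f"nominal epochs under the 16x90K convention: {nominal:.0f}")
```

Under these assumptions, the same MAX_ITER gives roughly 56 actual epochs but 48 nominal epochs, which would account for the mismatch without any change to MAX_ITER.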
jihwanp commented 2 years ago

Oh yes, sorry, it should be 56 epochs. So the number of epochs reported in the paper should be 56?

xingyizhou commented 2 years ago

Yes.