Setting of VideoCutLER's baseline

Thank you for your excellent research.

In the VideoCutLER paper, the baseline marked ‡ is described as follows: "We train a CutLER [35] model with Mask2Former as a detector on ImageNet-1K, following CutLER’s official training recipe, and use it as a strong baseline."

Could you please clarify whether the "strong baseline" mentioned here involves training Mask2Former at the image level only once, or multiple rounds of self-training? Also, could you specify whether DropLoss was used?

Thanks.

facebookresearch / CutLER

Setting of VideoCutLER's baseline #61