CUHK-AIM-Group / CLIFF

[ECCV'24] CLIFF: Continual Latent Diffusion for Open-Vocabulary Object Detection
MIT License

Differences between the three training stages in the `train.sh` file #4

Closed: LTaiQin closed this issue 1 week ago

LTaiQin commented 3 weeks ago

Hello, I'm very interested in your project! However, I'm unsure about the differences between the three training stages in the `train.sh` file. Could you please explain the differences between `obj2txt_stage1`, `obj2img2txt_stage2`, and `obj2img2txt_final`? Thank you for your help!

wymanCV commented 1 week ago

Hi, thank you so much for your interest in our work! Sorry for the late reply; I have been busy with some urgent matters recently.

The training stages follow a scheme similar to that of our baseline framework, OVD. `obj2txt_stage1` does not use extra class-agnostic proposals for the object-to-image diffusion (which plays a similar role to the Region-based Knowledge Distillation in OVD), while `obj2img2txt_stage2` does. `obj2img2txt_final` builds on `obj2img2txt_stage2`, using a larger loss weight and a smaller learning rate as a fine-tuning stage for our final model; in our experiments this made the novel-class performance more stable. I hope this helps, thank you!

Yours, Wuyang
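For readers skimming the thread, the stage differences described above can be summarized in a small Python sketch. This is only an illustration, not the repository's actual configuration: the `StageConfig` class, its field names, the `changed_fields` helper, and every numeric value are hypothetical placeholders. Only the three stage names and the direction of each change (proposals off or on, larger loss weight, smaller learning rate) come from the reply.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class StageConfig:
    """Hypothetical summary of one training stage (not the repo's real config)."""
    name: str
    use_extra_proposals: bool  # extra class-agnostic proposals for object-to-image diffusion
    loss_weight: float         # placeholder value, chosen only for illustration
    learning_rate: float       # placeholder value, chosen only for illustration


# Placeholder baselines; the real values live in the repository's configs.
BASE_LOSS_WEIGHT = 1.0
BASE_LR = 1e-2

STAGES = {
    # Stage 1: no extra class-agnostic proposals.
    "obj2txt_stage1": StageConfig(
        "obj2txt_stage1", False, BASE_LOSS_WEIGHT, BASE_LR),
    # Stage 2: same as stage 1, but with the extra proposals enabled.
    "obj2img2txt_stage2": StageConfig(
        "obj2img2txt_stage2", True, BASE_LOSS_WEIGHT, BASE_LR),
    # Final: stage 2 with a larger loss weight and a smaller learning rate.
    # The factors 2 and 0.1 are assumptions made for this sketch.
    "obj2img2txt_final": StageConfig(
        "obj2img2txt_final", True, BASE_LOSS_WEIGHT * 2, BASE_LR * 0.1),
}


def changed_fields(a: StageConfig, b: StageConfig) -> dict:
    """Return {field: (value_in_a, value_in_b)} for fields that differ, ignoring the name."""
    da, db = asdict(a), asdict(b)
    return {k: (da[k], db[k]) for k in da if k != "name" and da[k] != db[k]}


if __name__ == "__main__":
    print(changed_fields(STAGES["obj2txt_stage1"], STAGES["obj2img2txt_stage2"]))
    print(changed_fields(STAGES["obj2img2txt_stage2"], STAGES["obj2img2txt_final"]))
```

Comparing consecutive stages this way makes it easy to see that stage 2 changes only the proposal setting, while the final stage changes only the two optimization hyperparameters.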

LTaiQin commented 1 week ago

Thank you so much for taking the time to respond to my question! Your guidance is extremely helpful, and I appreciate your effort in providing such a clear explanation. Thanks for your support and for sharing your work with the community!