AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection
https://www.yoloworld.cc
GNU General Public License v3.0

During the finetuning process, is the language model frozen? #225

Open · gyftsy opened this issue 5 months ago

gyftsy commented 5 months ago

While reading the paper, I noticed that it describes fine-tuning the language model (text encoder) during model fine-tuning, but in the config it appears to be frozen. Is there a discrepancy here? Would unfreezing it improve results?

wondervictor commented 5 months ago

For pre-training and fine-tuning on COCO, the text encoder is frozen. For fine-tuning on LVIS-base, the text encoder is not frozen. Whether to fine-tune the text encoder depends on the vocabulary size during training.

Unicorn123455678 commented 5 months ago


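YOLO-World is a very innovative work, and I'd like to ask the following. If I apply YOLO-World to an industrial dataset, should I fine-tune the model in a closed-set or an open-set manner? During fine-tuning, can the text to be learned be a descriptive phrase, such as "a watch with a damaged dial", rather than a single label name as in COCO?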

Unicorn123455678 commented 5 months ago

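Quoting your earlier reply: "For pre-training and fine-tuning on COCO, the text encoder is frozen. For fine-tuning on LVIS-base, the text encoder is not frozen. Whether to fine-tune the text encoder depends on the vocabulary size during training."

How should this passage about fine-tuning be understood?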

wondervictor commented 5 months ago

Hi @Unicorn123455678


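Hello, this depends on the specific task. If your data is closed-set, you can use a fixed set of texts and train it as a closed set; if your data is open-set, you can train it following the default pre-training scheme. Both are supported, so you can choose according to your needs.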
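The two options in the reply above can be illustrated with a minimal pure-Python sketch of how the text prompts for a single training image might be assembled. This is not YOLO-World's actual data pipeline: the helper names `closed_set_texts` and `open_set_texts`, and the `max_texts` cap, are hypothetical. The open-set branch only loosely follows the online-vocabulary idea described in the YOLO-World paper as we understand it (positive texts from the image plus randomly sampled negatives, up to a fixed number).

```python
import random
from typing import List, Sequence


def closed_set_texts(class_names: Sequence[str]) -> List[str]:
    """Closed-set fine-tuning: every image sees the same fixed text list."""
    return list(class_names)


def open_set_texts(
    image_labels: Sequence[str],
    vocabulary: Sequence[str],
    max_texts: int,
    rng: random.Random,
) -> List[str]:
    """Open-set fine-tuning (hypothetical helper, not YOLO-World's code).

    Positives are the deduplicated labels annotated in this image; negatives
    are sampled from the rest of the dataset vocabulary until `max_texts`
    prompts are collected (or the vocabulary runs out).
    """
    positives = list(dict.fromkeys(image_labels))[:max_texts]
    candidates = [t for t in vocabulary if t not in positives]
    num_neg = min(max_texts - len(positives), len(candidates))
    negatives = rng.sample(candidates, num_neg)
    texts = positives + negatives
    rng.shuffle(texts)  # avoid positives always occupying the first slots
    return texts


if __name__ == "__main__":
    # Illustrative industrial labels only; any string can serve as a text prompt.
    vocab = [
        "watch with a damaged dial",
        "scratched watch case",
        "missing watch hand",
        "cracked watch glass",
        "intact watch",
    ]
    print(closed_set_texts(vocab))
    rng = random.Random(0)
    print(open_set_texts(["watch with a damaged dial"], vocab, max_texts=3, rng=rng))
```

The closed-set branch gives every image the same fixed list, so the model only ever contrasts against those texts; the open-set branch varies the vocabulary per image, which is closer to pre-training. The reply does not say whether descriptive phrases like these work as well as short category names; the sketch only shows that they are passed as text prompts in the same way.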

Regarding fine-tuning:

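For fine-tuning, we currently recommend training with a frozen CLIP text encoder. If the vocabulary size is comparable to or larger than that of LVIS (1203 categories), you can consider unfreezing CLIP and training it as well.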
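The rule of thumb above can be written as a small decision helper. This is a hypothetical sketch, not part of YOLO-World: the reply does not define "comparable", so the `comparable_ratio` default of 0.5 is an arbitrary assumption. It was chosen only so that LVIS-base (the 866 frequent and common categories of LVIS), for which the text encoder is reported as unfrozen earlier in the thread, counts as comparable while COCO (80 categories) does not.

```python
LVIS_NUM_CLASSES = 1203  # LVIS category count quoted in the reply above


def should_unfreeze_text_encoder(vocab_size: int, comparable_ratio: float = 0.5) -> bool:
    """Return True if the CLIP text encoder should be trained (unfrozen).

    Encodes the recommendation: keep CLIP frozen by default, and consider
    unfreezing it only when the training vocabulary is comparable to or larger
    than LVIS. `comparable_ratio` is a hypothetical knob (assumption): the
    vocabulary counts as "comparable" once it reaches this fraction of LVIS.
    """
    if vocab_size <= 0:
        raise ValueError("vocab_size must be positive")
    return vocab_size >= comparable_ratio * LVIS_NUM_CLASSES


if __name__ == "__main__":
    for name, size in [("COCO", 80), ("LVIS-base", 866), ("LVIS", 1203)]:
        print(name, size, should_unfreeze_text_encoder(size))
```

A stricter reading of "comparable to or larger than" is `comparable_ratio=1.0`, under which only vocabularies of at least 1203 categories would unfreeze the encoder.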