During the finetuning process, is the language model frozen?

AILab-CVC / YOLO-World

[CVPR 2024] Real-Time Open-Vocabulary Object Detection

https://www.yoloworld.cc

GNU General Public License v3.0

4.43k stars 430 forks

During the finetuning process, is the language model frozen? #225

Open gyftsy opened 5 months ago

gyftsy commented 5 months ago

While reading the paper, I noticed that it mentions fine-tuning the language model during fine-tuning, but the config appears to freeze it. Is this a discrepancy? Would unfreezing the language model improve results?

wondervictor commented 5 months ago

For pre-training and fine-tuning on COCO, the text encoder is frozen. For fine-tuning on LVIS-base, the text encoder is not frozen. Whether to fine-tune the text encoder depends on the vocabulary size during training.

Unicorn123455678 commented 5 months ago

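For pre-training and fine-tuning on COCO, the text encoder is frozen. For fine-tuning on LVIS-base, the text encoder is not frozen. Whether to fine-tune the text encoder depends on the vocabulary size during training.

YOLO-World is a very innovative piece of work, and I have a question about it. If I apply YOLO-World to an industrial dataset, should I fine-tune in closed-set or open-set mode? Also, during fine-tuning, can the training text be a descriptive phrase rather than a single label name as in COCO, for example "a watch with a damaged dial"?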

Unicorn123455678 commented 5 months ago

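For pre-training and fine-tuning on COCO, the text encoder is frozen. For fine-tuning on LVIS-base, the text encoder is not frozen. Whether to fine-tune the text encoder depends on the vocabulary size during training.

In other words, how should I interpret this statement about fine-tuning?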

wondervictor commented 5 months ago

Hi @Unicorn123455678

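YOLO-World is a very innovative piece of work, and I have a question about it. If I apply YOLO-World to an industrial dataset, should I fine-tune in closed-set or open-set mode? Also, during fine-tuning, can the training text be a descriptive phrase rather than a single label name as in COCO, for example "a watch with a damaged dial"?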

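Hello, this depends on the specific task. If your data is closed-set, you can use a fixed set of texts and train it as a closed set. If your data is open-set, you can train it following the default pre-training setup. Both are supported, so you can choose according to your needs.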
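A rough sketch of the difference between the two modes, in plain Python rather than against the YOLO-World data pipeline. The helpers `closed_set_texts` and `open_set_texts`, the example vocabulary, and the idea of filling each image's text list with randomly sampled negatives up to a cap are all assumptions for illustration. The descriptive phrases only show what free-form text could look like; this thread does not confirm how well they train.

```python
import random

# Illustrative industrial vocabulary (hypothetical, not from the repository).
VOCAB = [
    "watch",
    "watch with a damaged dial",
    "watch with a scratched case",
    "watch strap",
]


def closed_set_texts(vocab):
    """Closed set: every image is trained against the same fixed texts."""
    return list(vocab)


def open_set_texts(positives, vocab, max_texts, rng):
    """Open set: texts vary per image (its positives plus sampled negatives)."""
    negatives = [t for t in vocab if t not in positives]
    # Fill the remaining slots with negatives, never exceeding what exists.
    k = max(0, min(len(negatives), max_texts - len(positives)))
    return list(positives) + rng.sample(negatives, k)


rng = random.Random(0)
print(closed_set_texts(VOCAB))
print(open_set_texts(["watch strap"], VOCAB, max_texts=3, rng=rng))
```

In the closed-set case the text list never changes, so it can be fixed once for the whole dataset. In the open-set case each image gets its own list, which is why the vocabulary seen during training can be much larger than any single batch.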

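About fine-tuning

For fine-tuning, we currently recommend training with a frozen CLIP text encoder. If the vocabulary size is comparable to or larger than that of LVIS (1203 categories), you can consider unfreezing CLIP during training.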
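The rule of thumb above could be written as a small helper, sketched here under stated assumptions. `text_encoder_config` is hypothetical and not part of YOLO-World. The hard cutoff at 1203 simplifies "comparable to or larger". The `frozen_modules` key is an assumption modeled on the repository's config style, so check the name and accepted values against the config you are actually using.

```python
# Hypothetical helper, not part of YOLO-World.
LVIS_NUM_CLASSES = 1203  # category count cited in this thread


def text_encoder_config(vocab_size, threshold=LVIS_NUM_CLASSES):
    """Return a config-style dict fragment for the text encoder.

    The text encoder stays frozen for small vocabularies and is
    unfrozen once the vocabulary reaches `threshold` categories.
    """
    freeze = vocab_size < threshold
    return {
        # Assumed key: a list of module names to freeze, where 'all'
        # freezes the whole encoder and an empty list freezes nothing.
        "frozen_modules": ["all"] if freeze else [],
    }


print(text_encoder_config(80))    # COCO-sized vocabulary
print(text_encoder_config(1203))  # LVIS-sized vocabulary
```

A fixed threshold is only a starting point. For vocabularies somewhat below LVIS size, it may be worth comparing both settings on a validation split.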