OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal chat model approaching the performance of GPT-4o.
https://internvl.readthedocs.io/en/latest/
MIT License

Why eval() for llm? #277

Closed: geweihgg closed this issue 4 months ago

geweihgg commented 5 months ago

Code: https://github.com/OpenGVLab/InternVL/blob/main/internvl_chat/internvl/train/internvl_chat_pretrain.py#L597


Question: why is eval() called on the LLM but not on the ViT? If I don't call eval() on either model, will I get a worse result?
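
For context, the pattern at the linked line looks roughly like this (a minimal paraphrase, assuming the InternVL chat model's `vision_model` / `language_model` attributes; not the exact script code):

```python
import torch.nn as nn

def freeze_for_pretraining(model: nn.Module) -> None:
    # Freeze both towers: neither one's weights are updated.
    for p in model.vision_model.parameters():
        p.requires_grad = False
    for p in model.language_model.parameters():
        p.requires_grad = False

    # eval() on the LLM only: its stochastic layers (e.g. dropout) are disabled.
    model.language_model.eval()
    # No eval() on the ViT: it stays in train() mode, so its drop path
    # (stochastic depth) layers keep firing even though it is frozen.
```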

czczup commented 5 months ago

This is a small trick used during training: even when the ViT is frozen, keeping the drop path inside it active helps mitigate overfitting.
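
Mechanically, this works because drop path checks the module's `training` flag, not `requires_grad`, so a frozen ViT left in train() mode still applies stochastic depth to its residual branches. A minimal sketch of a timm-style `DropPath` layer (an illustration, not InternVL's exact implementation) shows why:

```python
import torch
import torch.nn as nn

class DropPath(nn.Module):
    """Stochastic depth: randomly drop a block's residual branch per sample."""

    def __init__(self, drop_prob: float = 0.0):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Active only in train() mode; under eval() this is an identity,
        # regardless of whether the weights are frozen.
        if self.drop_prob == 0.0 or not self.training:
            return x
        keep_prob = 1.0 - self.drop_prob
        # One Bernoulli draw per sample: the branch is zeroed for some
        # samples and rescaled by 1/keep_prob for the rest.
        mask = x.new_empty(x.shape[0], *([1] * (x.dim() - 1))).bernoulli_(keep_prob)
        return x * mask / keep_prob
```

Because the frozen ViT's residual branches are randomly skipped, the trainable parts downstream (projector and LLM) see slightly perturbed visual features at every step, which acts as a regularizer.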

czczup commented 5 months ago

Even if the ViT is not unfrozen for training during the fine-tuning stage, it is still beneficial to set a non-zero drop path rate. In our experiments, this performed better than setting the drop path rate to zero. Naturally, unfreezing the ViT and using drop path yields the best results.
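
The three regimes described here can be sketched with any drop-path-enabled ViT; the snippet below uses a timm ViT as a stand-in for InternViT (model name and rates are illustrative, not the repo's actual config):

```python
import timm
import torch.nn as nn

def build_vit(drop_path_rate: float, trainable: bool) -> nn.Module:
    # pretrained=False keeps the sketch self-contained; in practice you
    # would load pretrained ViT weights here.
    vit = timm.create_model(
        "vit_base_patch16_224", pretrained=False, drop_path_rate=drop_path_rate
    )
    for p in vit.parameters():
        p.requires_grad = trainable
    vit.train()  # keep train() mode so drop path stays active
    return vit

vit_baseline = build_vit(drop_path_rate=0.0, trainable=False)  # frozen, no drop path
vit_better   = build_vit(drop_path_rate=0.1, trainable=False)  # frozen + drop path
vit_best     = build_vit(drop_path_rate=0.1, trainable=True)   # unfrozen + drop path
```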