Closed: geweihgg closed this issue 4 months ago
This is a small trick used during training. Overfitting can be mitigated by enabling drop path inside the ViT even when the ViT is frozen.
If the ViT is not unfrozen during the fine-tuning stage, it is still beneficial to set a non-zero drop path rate: in our experiments this performed better than setting the drop path to zero. Naturally, unfreezing the ViT and using drop path yields the best results.
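A minimal sketch of the idea in plain PyTorch (my own illustration, not the InternVL code; the `DropPath` and `Block` classes here are simplified stand-ins): the ViT's weights are frozen via `requires_grad`, but the module is left in `train()` mode, so its drop path layers keep stochastically dropping residual branches and continue to act as a regularizer.

```python
import torch
import torch.nn as nn

class DropPath(nn.Module):
    """Stochastic depth: randomly drops whole residual branches per sample."""
    def __init__(self, drop_prob: float = 0.0):
        super().__init__()
        self.drop_prob = drop_prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Identity when the rate is zero or the module is in eval() mode
        if self.drop_prob == 0.0 or not self.training:
            return x
        keep_prob = 1.0 - self.drop_prob
        # One Bernoulli draw per sample, broadcast over the remaining dims
        shape = (x.shape[0],) + (1,) * (x.dim() - 1)
        mask = torch.floor(keep_prob + torch.rand(shape, device=x.device))
        return x / keep_prob * mask

class Block(nn.Module):
    """Toy transformer-style block: residual branch gated by drop path."""
    def __init__(self, dim: int, drop_path: float):
        super().__init__()
        self.mlp = nn.Linear(dim, dim)
        self.drop_path = DropPath(drop_path)

    def forward(self, x):
        return x + self.drop_path(self.mlp(x))

vit_block = Block(8, drop_path=0.2)
# "Freeze" the ViT: no gradient updates for its weights...
for p in vit_block.parameters():
    p.requires_grad = False
# ...but keep it in train() mode so drop path stays stochastic
vit_block.train()
```

The key point is that freezing and train/eval mode are orthogonal switches: `requires_grad=False` stops weight updates, while the training flag controls whether `DropPath` actually drops anything.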
code: https://github.com/OpenGVLab/InternVL/blob/main/internvl_chat/internvl/train/internvl_chat_pretrain.py#L597
snapshot: (screenshot of the linked code in `internvl_chat_pretrain.py`)
question: why is eval() called on the LLM but not on the ViT? If I don't call eval() on either model, will I get a worse result?
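For background on the question above (a general PyTorch note, not specific to InternVL): `eval()` and freezing are independent mechanisms. `eval()` switches off stochastic layers such as `Dropout` and drop path, while `requires_grad=False` only stops weight updates; neither implies the other, which is why a frozen ViT left in `train()` mode still applies drop path.

```python
import torch
import torch.nn as nn

x = torch.ones(2, 4)

drop = nn.Dropout(p=0.5)
drop.eval()                      # eval() turns Dropout into an identity op
print(torch.equal(drop(x), x))   # True

lin = nn.Linear(4, 4)
for p in lin.parameters():       # freezing: gradients off, mode unchanged
    p.requires_grad = False
print(lin.training)              # True: freezing does not leave train() mode
```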