Closed: lcxrocks closed this issue 6 months ago.
The ViT-22B paper reports knowledge distillation experiments (see Table 8), demonstrating that it is not only a large-scale model but also an excellent teacher. Given that InternVL is the largest open-source vision/vision-language foundation model to date (and a good alternative to ViT-22B), has there been any consideration of, or experimentation on, distilling it into smaller models? Thank you in advance for your attention.

Hello, thank you for your question. We have considered distilling InternVL into smaller models, and we plan to conduct distillation experiments in the future to explore its potential as a teacher for smaller models.

Sounds promising. I hope to see this vision-language model become applicable in many more situations. Thank you for your quick response.
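For anyone landing here, a minimal sketch of the logit-based distillation recipe (Hinton-style soft targets) that experiments like ViT-22B's Table 8 broadly follow. This is not code from the InternVL repo; `teacher` and `student` are hypothetical stand-ins for a frozen large model and a smaller one, and all hyperparameters are illustrative:

```python
# Sketch of soft-target knowledge distillation: the student is trained to
# match the teacher's temperature-softened output distribution, blended
# with the ordinary hard-label cross-entropy loss.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL term against the teacher with hard-label CE."""
    # Soften both distributions; the KL term is scaled by T^2 so gradient
    # magnitudes stay comparable across temperatures.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Usage: the teacher stays frozen; only the student receives gradients.
# teacher.eval()
# with torch.no_grad():
#     t_logits = teacher(images)
# s_logits = student(images)
# loss = distillation_loss(s_logits, t_logits, labels)
```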