OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o's performance.
https://internvl.readthedocs.io/en/latest/
MIT License

How can I retrain the InternViT-300M-448px model? Is there open-source code? #563

Open wuxiaolianggit opened 3 weeks ago

wuxiaolianggit commented 3 weeks ago

Describe the bug

How can I train the InternViT-300M-448px model? Is there open-source code? (The same question was repeated under the Reproduction, Environment, and Error traceback fields of the issue template.)
wuxiaolianggit commented 3 weeks ago

How can I retrain the InternViT-300M-448px model? Is there open-source code? @lvhan028 @shepnerd @whai362 @liu-zhy

czczup commented 3 weeks ago

Hello, our InternViT-300M-448px was distilled from InternViT-6B-448px-V1-5; the code for that step has not been released yet.
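The released recipe for this distillation is not public, but the general idea of distilling a small ViT from a larger frozen teacher can be sketched as a feature-alignment loss. This is only an illustrative sketch; `distillation_loss` is a hypothetical helper, not InternVL's actual objective.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_feats: torch.Tensor,
                      teacher_feats: torch.Tensor) -> torch.Tensor:
    """Cosine-similarity feature distillation (illustrative only).

    Aligns the student's features with the frozen teacher's features
    by minimizing 1 - cosine similarity per token/sample.
    """
    s = F.normalize(student_feats, dim=-1)
    t = F.normalize(teacher_feats, dim=-1)
    # 1 - cos(s, t) is 0 when directions match, up to 2 when opposite.
    return (1.0 - (s * t).sum(dim=-1)).mean()
```

In a training loop one would run the teacher under `torch.no_grad()` and backpropagate this loss only through the student.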

wuxiaolianggit commented 2 weeks ago

Hello, if I want to retrain the InternViT-6B-448px-V1-5 model, is there open-source code? @czczup

Weiyun1025 commented 2 weeks ago

The first stage of InternViT training uses a CLIP-style contrastive learning loss; you can refer to OpenCLIP for the relevant code. Since the codebase we used at the time has not been cleaned up, we have no plans to release it for now.

If you want to retrain it on multimodal dialogue tasks instead, you can refer to our pre-training code. The pre-training code is the same as the fine-tuning code, except that the data is different and data chunking was added to avoid running out of memory.
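The first-stage contrastive objective mentioned above is the standard symmetric image-text InfoNCE loss used by CLIP/OpenCLIP. A minimal sketch, assuming matched image/text batches (the function name and fixed temperature are illustrative, not InternVL's configuration):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_feats: torch.Tensor,
                          text_feats: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    """Symmetric CLIP-style contrastive loss (illustrative sketch).

    Row i of image_feats and row i of text_feats are a matching pair.
    """
    img = F.normalize(image_feats, dim=-1)
    txt = F.normalize(text_feats, dim=-1)
    # Pairwise cosine similarities, scaled by the temperature.
    logits = img @ txt.t() / temperature
    # Matching pairs sit on the diagonal of the similarity matrix.
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)        # image -> text
    loss_t2i = F.cross_entropy(logits.t(), targets)    # text -> image
    return (loss_i2t + loss_t2i) / 2
```

In practice the temperature is usually a learned parameter and the similarity matrix is computed over features gathered across all GPUs, which is where OpenCLIP's implementation is the better reference.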

wuxiaolianggit commented 2 weeks ago

Hello, I'm using internvl-g to train the InternVL2-1B model and keep hitting the following error:

Traceback (most recent call last):
  File "/home/duolun/work1/llm/InternVL-main/internvl_g/internvl/train/intervl_stage1_fintune.py", line 286, in <module>
    main()
  File "/home/duolun/work1/llm/InternVL-main/internvl_g/internvl/train/intervl_stage1_fintune.py", line 199, in main
    model = InternVL_C.from_pretrained(
  File "/home/duolun/.conda/envs/transgpt/lib/python3.10/site-packages/transformers/modeling_utils.py", line 3594, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "/home/duolun/.conda/envs/transgpt/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 506, in wrapper
    f(module, *args, **kwargs)
  File "/home/duolun/.conda/envs/transgpt/lib/python3.10/site-packages/deepspeed/runtime/zero/partition_parameters.py", line 506, in wrapper
    f(module, *args, **kwargs)
  File "/home/duolun/work1/llm/InternVL-main/internvl_g/internvl/model/internvl_stage2_retrieval/modeling_internvl.py", line 256, in __init__
    self.vision_model.resize_pos_embeddings(
  File "/home/duolun/work1/llm/InternVL-main/internvl_g/internvl/model/internvl_stage2_retrieval/modeling_intern_vit.py", line 294, in resize_pos_embeddings
    _, num_positions, embed_dim = pos_emb.shape
ValueError: not enough values to unpack (expected 3, got 1)

But when I checked the code, self.embeddings = InternVisionEmbeddings(config) is indeed initialized in InternVisionModel's constructor; yet when it is accessed, the position embedding's shape is torch.Size([0]), while the correct shape should be torch.Size([1, 1025, 1024]).

@czczup Do you know what is causing this problem?
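One possible explanation, going by the `deepspeed/runtime/zero/partition_parameters.py` frames in the traceback: under DeepSpeed ZeRO-3's `zero.Init`, parameters inside a module constructor are partitioned placeholders whose tensor data is empty (shape `torch.Size([0])`), so unpacking three dimensions from `.shape` fails; the parameter must be gathered (e.g. with `deepspeed.zero.GatheredParameters`) before its real shape is visible. This is a hedged diagnosis, not a confirmed fix. A minimal torch-only sketch of the failure mode and a guard (`resize_check` is a hypothetical helper, not InternVL code):

```python
import torch

def resize_check(pos_emb: torch.Tensor) -> tuple[int, int]:
    """Guarded version of `_, num_positions, embed_dim = pos_emb.shape`.

    Under ZeRO-3 zero.Init, a partitioned parameter's data is an empty
    tensor, so its shape is torch.Size([0]) until it is gathered.
    """
    if pos_emb.dim() != 3:
        raise RuntimeError(
            f"position embedding not materialized (shape {tuple(pos_emb.shape)}); "
            "gather the parameter first, e.g. with "
            "deepspeed.zero.GatheredParameters"
        )
    _, num_positions, embed_dim = pos_emb.shape
    return num_positions, embed_dim
```

If this diagnosis is right, wrapping the `resize_pos_embeddings` call in a `deepspeed.zero.GatheredParameters` context, or constructing the model outside `zero.Init`, should avoid the `ValueError`.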