OpenGVLab / InternVL

[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4V. A commercially usable open-source multimodal dialogue model approaching GPT-4V performance
https://internvl.github.io/
MIT License

OpenGVLab/InternViT-6B-448px-V1-5 as Zero Shot Image Classification. #147

Open iavinas opened 2 months ago

iavinas commented 2 months ago

Hi,

Thanks for sharing the model and code with us.

I am trying to use a Vision Foundation Model for a zero-shot classification problem.

This is possible with OpenGVLab/InternVL-14B-224px, but I am not able to do it with OpenGVLab/InternViT-6B-448px-V1-5.

```python
import torch
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained(
    'OpenGVLab/InternViT-6B-448px-V1-5',
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True).cuda().eval()

tokenizer = AutoTokenizer.from_pretrained(
    'OpenGVLab/InternViT-6B-448px-V1-5',
    use_fast=False,
    add_eos_token=True,
    trust_remote_code=True)
```

Is there any way to get the tokenizer for OpenGVLab/InternViT-6B-448px-V1-5?

czczup commented 1 month ago

Hi, OpenGVLab/InternViT-6B-448px-V1-5 is the vision encoder extracted from the pretraining stage of the multimodal large language model (MLLM) OpenGVLab/InternVL-Chat-V1-5. It was trained to serve as a specialized vision encoder for that MLLM, so it ships without a paired text encoder or tokenizer and cannot be used directly for zero-shot classification tasks.
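For context, CLIP-style zero-shot classification (what OpenGVLab/InternVL-14B-224px supports) needs embeddings from *both* an image encoder and a text encoder, which is exactly what a standalone vision tower lacks. A minimal sketch of the scoring step, using dummy NumPy embeddings in place of real encoder outputs (illustrative only, not the InternVL API):

```python
import numpy as np

def zero_shot_scores(image_emb, text_embs, temperature=100.0):
    """Score an image against one text embedding per class prompt.

    image_emb: (d,) embedding from an image encoder.
    text_embs: (n_classes, d) embeddings of prompts like "a photo of a cat".
    Returns a probability per class via temperature-scaled cosine similarity.
    """
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = temperature * (txt @ img)        # cosine similarities, scaled
    e = np.exp(logits - logits.max())         # numerically stable softmax
    return e / e.sum()

# Dummy data standing in for real encoder outputs.
rng = np.random.default_rng(0)
probs = zero_shot_scores(rng.normal(size=512), rng.normal(size=(3, 512)))
print(probs.shape, probs.sum())
```

Without a text tower, there is nothing to embed the class prompts with, which is why the tokenizer request fails for the ViT-only checkpoint.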