THUDM / CogVLM

a state-of-the-art-level open visual language model | multimodal pretrained model
Apache License 2.0

[CogVLM-chat-v1.1] LM weights differ from vicuna-7b-v1.5 #466

Open minostauros opened 6 months ago

minostauros commented 6 months ago

While CogVLM is trained, the LM weights are frozen.

From my observation, however, the LM weights of CogVLM differ from Vicuna's:

Vicuna: https://huggingface.co/lmsys/vicuna-7b-v1.5/tree/main
CogVLM: cogvlm-chat-v1.1 (both the HF and SAT weights)

Can I ask why, or what the proper source of the language model is?
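The issue doesn't include the exact comparison script, but a difference like this can be verified with a key-wise diff of the two state dicts. A minimal sketch follows; the model ids come from the links above, while the assumptions about key naming are noted in the comments:

```python
# Sketch: diff the language-model weights of the two checkpoints key by key.
# Loading both models needs substantial RAM, and `trust_remote_code=True`
# pulls in CogVLM's custom modeling code from the Hub.
import torch
from transformers import AutoModelForCausalLM

vicuna = AutoModelForCausalLM.from_pretrained(
    "lmsys/vicuna-7b-v1.5", torch_dtype=torch.float16
).state_dict()
cogvlm = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogvlm-chat-hf", torch_dtype=torch.float16, trust_remote_code=True
).state_dict()

# Assumption: CogVLM renames the attention/MLP projections into vision- and
# language-expert variants, so only parameters with identical key names
# (embeddings, layernorms, lm_head, ...) are compared directly here; the
# expert projections would need a manual key mapping.
shared = sorted(set(vicuna) & set(cogvlm))
for key in shared:
    a, b = vicuna[key], cogvlm[key]
    if a.shape != b.shape:
        print(f"{key}: shape mismatch {tuple(a.shape)} vs {tuple(b.shape)}")
    elif not torch.equal(a, b):
        print(f"{key}: max abs diff {(a - b).abs().max().item():.6f}")
# No output would mean the shared weights are identical, i.e. truly frozen.
```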

antigone660 commented 6 months ago

I have migrated CogVLM-chat-hf to MindSpore and found the model works well when the input includes both an image and text. But when there is only a text query without an image, the performance is not as good; I guess it may relate to this issue.

minostauros commented 6 months ago

> I have migrated CogVLM-chat-hf to MindSpore and found the model works well when the input includes both an image and text. But when there is only a text query without an image, the performance is not as good; I guess it may relate to this issue.

@antigone660 In text-only mode, the prompt template is different. Did you use the following prompt for text-only queries?

https://github.com/THUDM/CogVLM/blob/b37f36bf5df8c0356920b70f21338c1968c20e47/basic_demo/cli_demo_hf.py#L52

In my case, text-only mode works well regardless of this issue.
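For reference, here is a minimal sketch of a text-only call in the style of the linked demo. The template string is the Vicuna-style system prompt used there (quoted from memory, so check the linked line), and the generation settings are illustrative rather than the demo's exact values:

```python
# Minimal text-only query against CogVLM-chat-hf, following the pattern of
# basic_demo/cli_demo_hf.py.
import torch
from transformers import AutoModelForCausalLM, LlamaTokenizer

DEVICE = "cuda" if torch.cuda.is_available() else "cpu"

# CogVLM's demo reuses the Vicuna tokenizer for the language side.
tokenizer = LlamaTokenizer.from_pretrained("lmsys/vicuna-7b-v1.5")
model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogvlm-chat-hf",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
).to(DEVICE).eval()

# Vicuna-style template for text-only mode (quoted from memory); without it
# the model sees a prompt format it was not trained on and quality drops.
text_only_template = (
    "A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers to the user's "
    "questions. USER: {} ASSISTANT:"
)

query = text_only_template.format("Who proved Fermat's Last Theorem?")
# `build_conversation_input_ids` is CogVLM's remote-code helper; passing no
# `images` argument keeps the model on the language-expert path only.
enc = model.build_conversation_input_ids(tokenizer, query=query, history=[])
inputs = {
    "input_ids": enc["input_ids"].unsqueeze(0).to(DEVICE),
    "token_type_ids": enc["token_type_ids"].unsqueeze(0).to(DEVICE),
    "attention_mask": enc["attention_mask"].unsqueeze(0).to(DEVICE),
    "images": None,
}
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:]))
```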

antigone660 commented 6 months ago

@minostauros Thanks for your reply! I did not use the template before, and it works now. :)

JiaQiSJTU commented 3 months ago

So the LM had also been tuned? I also found that the LM weights of CogVLM2 differ from llama-3-8B-Instruct's.