QwenLM / Qwen

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
Apache License 2.0

[BUG] Questionable embedding feature shape extracted from Qwen-7B-Chat #1260

Closed (xorange closed this issue 1 month ago)

xorange commented 1 month ago

Is there an existing issue / discussion for this?

Is there an existing answer for this in the FAQ?

Current Behavior

Using this snippet:

from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import pipeline
import torch

MODEL_PATH = "../Qwen-7B-Chat-Int8"

model = AutoModelForCausalLM.from_pretrained(MODEL_PATH, torch_dtype="auto", device_map="auto", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)

print(model.config.hidden_size)

# feature-extraction should return one hidden-state vector per token
pipe = pipeline('feature-extraction', model=model, tokenizer=tokenizer)

# wrap the pipeline's nested-list output in a tensor to inspect its shape
ft = lambda s: torch.Tensor(pipe(s))

print(ft('man').shape)
print(ft('woman').shape)
print(ft('man and woman').shape)

Yields:

4096
torch.Size([1, 1, 151936])
torch.Size([1, 1, 151936])
torch.Size([1, 3, 151936])

Expected Behavior

4096
torch.Size([1, 1, 4096])
torch.Size([1, 1, 4096])
torch.Size([1, 3, 4096])

I expect the above result because the feature for each token should have length hidden_size (4096), while 151936 is Qwen's vocabulary size, i.e. the number of possible tokens (see the config check below).

Am I wrong here? Please correct me if so. Thanks!
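
As a sanity check, reading the config directly (a minimal sketch, reusing the MODEL_PATH from the snippet above) shows where the two sizes come from:

from transformers import AutoConfig

config = AutoConfig.from_pretrained(MODEL_PATH, trust_remote_code=True)
print(config.hidden_size)  # 4096: the per-token feature size I expected
print(config.vocab_size)   # 151936: the size that actually shows up in the pipeline output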

Steps To Reproduce

No response

Environment

- OS: x86_64 GNU/Linux Ubuntu 18.04.6 LTS (Bionic Beaver)
- Python: 3.10
- Transformers: 4.39.3
- PyTorch: 2.1.0+cu121
- CUDA: 12.1

Anything else?

No response

jklj077 commented 1 month ago

Hi!

I think the reason is that Qwen(1.0) uses custom modeling code that is not fully compatible with transformers, including its pipelines.

Since Qwen(1.0) is no longer actively maintained, I would advise you to migrate to Qwen1.5, which should work out-of-the-box with transformers.

In addition, Qwen models are decoder-only, i.e. causal, language models. They are not trained specifically to extract features. If your task is to obtain sentence, phrase, or word features, I would recommend using embedding models instead.
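
If you still want per-token features out of Qwen(1.0) itself, a minimal sketch (assuming the custom modeling code honors output_hidden_states the way standard transformers models do) is to call the model directly instead of going through the pipeline:

import torch

inputs = tokenizer('man and woman', return_tensors='pt').to(model.device)
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# last-layer hidden states: one 4096-dim vector per token
print(outputs.hidden_states[-1].shape)  # expected: torch.Size([1, 3, 4096])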

xorange commented 1 month ago

Thanks for the reply!

I'm mainly learning and poking around here, trying to analyze and experiment with the 'transformer.wte' and 'lm_head' weights, which is why I'm playing around with embeddings.
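
For context, this is roughly the kind of inspection I mean (just a sketch; the attribute names assume Qwen(1.0)'s custom module layout, and that lm_head is left unquantized in the Int8 checkpoint):

wte = model.transformer.wte.weight   # input token embedding matrix
head = model.lm_head.weight          # output projection back to the vocabulary
print(wte.shape)    # expected: torch.Size([151936, 4096])
print(head.shape)   # expected: torch.Size([151936, 4096])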

Thanks again for your suggestions on Qwen1.5 and embedding models.