THUDM / GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs | 开源多语言多模态对话模型
Apache License 2.0
3.55k stars · 255 forks

modelscope: the glm-4v-9b test script fails to run #301

Closed · chenyangMl closed this 1 day ago

chenyangMl commented 4 days ago

System Info / 系統信息

transformers 4.42.3 modelscope 1.16.0 tiktoken 0.4.0 Python 3.10.11 Build cuda_11.3.r11.3/compiler.29745058_0

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

Reproduction / 复现过程

```python
import torch
from PIL import Image
from modelscope import AutoModelForCausalLM, AutoTokenizer

device = "cuda"

tokenizer = AutoTokenizer.from_pretrained("ZhipuAI/glm-4v-9b", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    "ZhipuAI/glm-4v-9b",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=True,
    trust_remote_code=True,
    device_map="auto"
)
model = model.eval()

query = '描述这张图片'  # "Describe this image"
image = Image.open("kobe.jpeg").convert('RGB')
inputs = tokenizer.apply_chat_template(
    [{"role": "user", "image": image, "content": query}],
    add_generation_prompt=True,
    tokenize=True,
    return_tensors="pt",
    return_dict=True
)  # chat mode

inputs = inputs.to(device)

gen_kwargs = {"max_length": 2500, "do_sample": True, "top_k": 1}
with torch.no_grad():
    outputs = model.generate(**inputs, **gen_kwargs)
    outputs = outputs[:, inputs['input_ids'].shape[1]:]
    print(tokenizer.decode(outputs[0]))
```

The code above is from https://modelscope.cn/models/ZhipuAI/glm-4v-9b

Error message:

```
index out of bounds: the len is 1 but the index is 1
  File "/home/xxx/.conda/envs/llm310/lib/python3.10/site-packages/tiktoken/core.py", line 120, in encode
    return self._core_bpe.encode(text, allowed_special)
  File "/home/xxx/.cache/huggingface/modules/transformers_modules/glm-4v-9b/tokenization_chatglm.py", line 140, in build_single_message
    role_tokens = [self.convert_tokens_to_ids(f"<|{role}|>")] + self.tokenizer.encode(f"{metadata}\n",
  File "/home/xxx/.cache/huggingface/modules/transformers_modules/glm-4v-9b/tokenization_chatglm.py", line 216, in handle_single_conversation
    input = self.build_single_message(
  File "/home/xxx/.cache/huggingface/modules/transformers_modules/glm-4v-9b/tokenization_chatglm.py", line 236, in apply_chat_template
    result = handle_single_conversation(conversation)
  File "/mnt1/works/mm-llm/glm-4v-9b-Demo/test.py", line 22, in <module>
    inputs = tokenizer.apply_chat_template([{"role": "user", "image": image, "content": query}],
  File "/home/xxx/.conda/envs/llm310/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/xxx/.conda/envs/llm310/lib/python3.10/runpy.py", line 196, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
pyo3_runtime.PanicException: index out of bounds: the len is 1 but the index is 1
```

Expected behavior / 期待表现

I expect the example above to run correctly.

chenyangMl commented 4 days ago

Upgrading tiktoken from 0.4.0 to 0.7.0 resolved the error above (E1: index out of bounds: the len is 1 but the index is 1).

A new error, E2, then appears:

```
Exception occurred: ValueError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
too many values to unpack (expected 2)
  File "/home/xxx/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 562, in forward
    cache_k, cache_v = kv_cache
  File "/home/xxx/.conda/envs/llm310/lib/python3.10/site-packages/accelerate/hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/xxx/.conda/envs/llm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xxx/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 693, in forward
    attention_output, kv_cache = self.self_attention(
  File "/home/xxx/.conda/envs/llm310/lib/python3.10/site-packages/accelerate/hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/xxx/.conda/envs/llm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xxx/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 790, in forward
    layer_ret = layer(
  File "/home/xxx/.conda/envs/llm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xxx/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 1056, in forward
    hidden_states, presents, all_hidden_states, all_self_attentions = self.encoder(
  File "/home/xxx/.conda/envs/llm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xxx/.cache/huggingface/modules/transformers_modules/glm-4v-9b/modeling_chatglm.py", line 1188, in forward
    transformer_outputs = self.transformer(
  File "/home/xxx/.conda/envs/llm310/lib/python3.10/site-packages/accelerate/hooks.py", line 169, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/home/xxx/.conda/envs/llm310/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xxx/.conda/envs/llm310/lib/python3.10/site-packages/transformers/generation/utils.py", line 2651, in _sample
    outputs = self(
  File "/home/xxx/.conda/envs/llm310/lib/python3.10/site-packages/transformers/generation/utils.py", line 1914, in generate
    result = self._sample(
  File "/home/xxx/.conda/envs/llm310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/mnt1/works/mm-llm/glm-4v-9b-Demo/test.py", line 31, in <module>
    outputs = model.generate(**inputs, **gen_kwargs)
  File "/home/xxx/.conda/envs/llm310/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/xxx/.conda/envs/llm310/lib/python3.10/runpy.py", line 196, in _run_module_as_main (Current frame)
    return _run_code(code, main_globals, None,
ValueError: too many values to unpack (expected 2)
```
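For context (my own reading, not confirmed in this thread): the failing line `cache_k, cache_v = kv_cache` in the model's remote code assumes each layer's KV cache arrives as a plain `(key, value)` 2-tuple, the format older transformers releases passed around. Newer releases changed how `past_key_values` is represented, so the unpack can receive a different number of elements. A minimal sketch of that failure mode (illustrative values only, not the real tensors):

```python
# The remote modeling code expects the legacy per-layer cache format:
legacy_kv = ("key_tensor", "value_tensor")
cache_k, cache_v = legacy_kv  # works: exactly two items

# If a newer transformers version hands the layer a differently shaped
# object (hypothetical three-element example), the same unpack fails:
new_style_kv = ("key_tensor", "value_tensor", "extra_state")
try:
    cache_k, cache_v = new_style_kv
except ValueError as e:
    print(e)  # too many values to unpack (expected 2)
```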

zRzRzRzRzRzRzR commented 4 days ago

Downgrade transformers to 4.40.2, and please install strictly according to the requirements file.

chenyangMl commented 1 day ago

> Downgrade transformers to 4.40.2, and please install strictly according to the requirements file.

Thanks for the help. After downgrading to transformers==4.40.2, it runs correctly.

I started directly from ModelScope, where I didn't see a requirements.txt. I suggest adding one there as well, or at least linking to it from the README, to save others from unnecessary version trouble.

https://github.com/THUDM/GLM-4/blob/main/basic_demo/requirements.txt
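For anyone landing here later, a small runtime guard (my own sketch, not part of the repo's demo) that fails fast when the installed versions drift from the ones this thread converged on (transformers 4.40.2, tiktoken >= 0.7.0):

```python
# Check the pinned versions from this thread before running the demo.
from importlib.metadata import version

assert version("transformers") == "4.40.2", (
    f"expected transformers 4.40.2, got {version('transformers')}"
)

tiktoken_version = tuple(int(x) for x in version("tiktoken").split(".")[:2])
assert tiktoken_version >= (0, 7), (
    f"expected tiktoken >= 0.7.0, got {version('tiktoken')}"
)
```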