THUDM / GLM-4

GLM-4 series: Open Multilingual Multimodal Chat LMs | 开源多语言多模态对话模型
Apache License 2.0

openai_api_server.py 500 Internal Server Error #190

Closed · hushuitian closed this issue 2 weeks ago

hushuitian commented 3 weeks ago

System Info / 系統信息

1. Ubuntu 22.04, Python 3.10, CUDA 12.1, 2 T4 GPUs.
2. Updated `tokenization_chatglm.py` in the local model directory `glm-4-9b-chat` to match the version on Hugging Face.
3. Used the latest `openai_api_server.py`, with the modifications needed to make it work with 2 T4 GPUs and the local model directory `glm-4-9b-chat` (a sketch of the kind of change this implies follows this list).
4. Used the latest `openai_api_request.py`.
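The exact edits to `openai_api_server.py` are not shown in the report; the following is only a minimal sketch of the kind of change step 3 implies, assuming the server builds its vLLM engine from `AsyncEngineArgs` (the local path is a placeholder, and `tensor_parallel_size`/`dtype` are the settings that matter for two T4s):

```python
# Sketch only (not the author's actual diff): point the engine at a local
# glm-4-9b-chat directory, split it across 2 T4 GPUs, and use float16
# because T4s do not support bfloat16.
from vllm.engine.arg_utils import AsyncEngineArgs
from vllm.engine.async_llm_engine import AsyncLLMEngine

MODEL_PATH = "/path/to/glm-4-9b-chat"  # hypothetical local model directory

engine_args = AsyncEngineArgs(
    model=MODEL_PATH,
    tokenizer=MODEL_PATH,
    tensor_parallel_size=2,      # two T4 GPUs
    dtype="float16",             # T4 has no bfloat16 support
    trust_remote_code=True,      # needed for tokenization_chatglm.py
    gpu_memory_utilization=0.9,
)
engine = AsyncLLMEngine.from_engine_args(engine_args)
```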

Who can help? / 谁可以帮助到您?

No response

Information / 问题信息

Reproduction / 复现过程

1. `python openai_api_server.py`
2. `python openai_api_request.py` (a minimal request in the spirit of this script is sketched below)
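For reference, a minimal request similar to what `openai_api_request.py` sends, using the `openai` Python client; the port, model name, and tool schema are assumptions for illustration, not taken from the original script. The `get_current_weather` tool matches the function name that appears in the failing prompt below.

```python
# Minimal client-side reproduction sketch (assumed port and model name).
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8000/v1", api_key="EMPTY")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="glm-4",
    messages=[{"role": "user", "content": "What is the weather in Beijing?"}],
    tools=tools,
    stream=False,
)
print(response.choices[0].message)
```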

Expected behavior / 期待表现

It should work as before, but `openai_api_request.py` triggered the error below on the server side:

```
thread '<unnamed>' panicked at 'byte index 2 is not a char boundary; it is inside '你' (bytes 1..4) of ` 你是一个名为 GLM-4 的人工智能助手。你是基于智谱AI训练的语言模型 GLM-4 模型开发的,你的任务是针对用户的问题和要求提供适当的答复和支持。

get_current_weather

{ "name": "get_current_weather",[...]', src/lib.rs:238:54
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
ERROR: Exception in ASGI application
Traceback (most recent call last):
  File "/home/techplus-820/.local/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 411, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/techplus-820/.local/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 69, in __call__
    return await self.app(scope, receive, send)
  File "/home/techplus-820/.local/lib/python3.10/site-packages/fastapi/applications.py", line 1054, in __call__
    await super().__call__(scope, receive, send)
  File "/home/techplus-820/.local/lib/python3.10/site-packages/starlette/applications.py", line 123, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/techplus-820/.local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/home/techplus-820/.local/lib/python3.10/site-packages/starlette/middleware/cors.py", line 85, in __call__
    await self.app(scope, receive, send)
  File "/home/techplus-820/.local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 65, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/home/techplus-820/.local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/techplus-820/.local/lib/python3.10/site-packages/starlette/routing.py", line 756, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/techplus-820/.local/lib/python3.10/site-packages/starlette/routing.py", line 776, in app
    await route.handle(scope, receive, send)
  File "/home/techplus-820/.local/lib/python3.10/site-packages/starlette/routing.py", line 297, in handle
    await self.app(scope, receive, send)
  File "/home/techplus-820/.local/lib/python3.10/site-packages/starlette/routing.py", line 77, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/home/techplus-820/.local/lib/python3.10/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/home/techplus-820/.local/lib/python3.10/site-packages/starlette/routing.py", line 72, in app
    response = await func(request)
  File "/home/techplus-820/.local/lib/python3.10/site-packages/fastapi/routing.py", line 278, in app
    raw_response = await run_endpoint_function(
  File "/home/techplus-820/.local/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/techplus-820/w/GLM-4/basic_demo/openai_api_server.py", line 381, in create_chat_completion
    async for response in generate_stream_glm4(gen_params):
  File "/home/techplus-820/w/GLM-4/basic_demo/openai_api_server.py", line 219, in generate_stream_glm4
    async for output in engine.generate(inputs=inputs, sampling_params=sampling_params, request_id=f"{time.time()}"):
  File "/home/techplus-820/w/vllm-0.4.3/vllm/engine/async_llm_engine.py", line 662, in generate
    async for output in self._process_request(
  File "/home/techplus-820/w/vllm-0.4.3/vllm/engine/async_llm_engine.py", line 756, in _process_request
    stream = await self.add_request(
  File "/home/techplus-820/w/vllm-0.4.3/vllm/engine/async_llm_engine.py", line 579, in add_request
    processed_inputs = await self.engine.process_model_inputs_async(
  File "/home/techplus-820/w/vllm-0.4.3/vllm/engine/async_llm_engine.py", line 261, in process_model_inputs_async
    prompt_token_ids = await tokenizer.encode_async(
  File "/home/techplus-820/w/vllm-0.4.3/vllm/transformers_utils/tokenizer_group/tokenizer_group.py", line 64, in encode_async
    ret = tokenizer.encode(prompt)
  File "/home/techplus-820/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2629, in encode
    encoded_inputs = self.encode_plus(
  File "/home/techplus-820/.local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3037, in encode_plus
    return self._encode_plus(
  File "/home/techplus-820/.local/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 719, in _encode_plus
    first_ids = get_input_ids(text)
  File "/home/techplus-820/.local/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 686, in get_input_ids
    tokens = self.tokenize(text, **kwargs)
  File "/home/techplus-820/.local/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 617, in tokenize
    tokenized_text.extend(self._tokenize(token))
  File "/home/techplus-820/.cache/huggingface/modules/transformers_modules/glm-4-9b-chat/tokenization_chatglm.py", line 88, in _tokenize
    ids = self.tokenizer.encode(text)
  File "/home/techplus-820/.local/lib/python3.10/site-packages/tiktoken/core.py", line 120, in encode
    return self._core_bpe.encode(text, allowed_special)
pyo3_runtime.PanicException: byte index 2 is not a char boundary; it is inside '你' (bytes 1..4) of ` 你是一个名为 GLM-4 的人工智能助手。你是基于智谱AI训练的语言模型 GLM-4 模型开发的,你的任务是针对用户的问题和要求提供适当的答复和支持。

get_current_weather

{ "name": "get_current_weather",`[...]
INFO: 127.0.0.1:43990 - "POST /v1/chat/completions HTTP/1.1" 500 Internal Server Error
```

hushuitian commented 3 weeks ago

A conda env with vllm 0.4.3 is activated and used during the test.

hushuitian commented 2 weeks ago

After (1) re-pulling the local model directory glm-4-9b-chat from Hugging Face, (2) re-pulling GLM-4 from GitHub, and (3) running `pip install -r requirements.txt` in the GLM-4/basic_demo directory, the bug is gone, so I will close this issue.
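As a quick sanity check after re-pulling (a suggestion, not part of the original report), the refreshed tokenizer can be loaded directly and asked to encode the Chinese system prompt that previously triggered the panic; the path below is a placeholder.

```python
# Sanity check: the re-pulled glm-4-9b-chat tokenizer should now encode the
# Chinese system prompt without the tiktoken char-boundary panic.
from transformers import AutoTokenizer

MODEL_PATH = "/path/to/glm-4-9b-chat"  # hypothetical local model directory

tokenizer = AutoTokenizer.from_pretrained(MODEL_PATH, trust_remote_code=True)
ids = tokenizer.encode("你是一个名为 GLM-4 的人工智能助手。")
print(len(ids), "tokens")  # a token count instead of a PanicException
```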