I executed "TRUST_REMOTE_CODE=True openllm start /usr1/models/chatglm-6b" and model was loaded successfully, swagger was available, but got error using v1/chat/compelitioins,
Traceback (most recent call last):
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 758, in __call__
await self.middleware_stack(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 778, in app
await route.handle(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 299, in handle
await self.app(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 79, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 74, in app
response = await func(request)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/openllm/entrypoints/openai.py", line 159, in chat_completions
prompt = llm.tokenizer.apply_chat_template(
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/openllm/_llm.py", line 411, in tokenizer
self.__llm_tokenizer__ = openllm.serialisation.load_tokenizer(self, **self.llm_parameters[-1])
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/openllm/serialisation/__init__.py", line 33, in load_tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained(bentomodel_fs.getsyspath('/'), trust_remote_code=llm.trust_remote_code, **tokenizer_attrs)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 810, in from_pretrained
return tokenizer_class.from_pretrained(
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2048, in from_pretrained
return cls._from_pretrained(
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2287, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/tokenization_chatglm.py", line 196, in __init__
super().__init__(
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 367, in __init__
self._add_tokens(
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
current_vocab = self.get_vocab().copy()
File "/root/.cache/huggingface/modules/transformers_modules/tokenization_chatglm.py", line 248, in get_vocab
vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
File "/root/.cache/huggingface/modules/transformers_modules/tokenization_chatglm.py", line 244, in vocab_size
return self.sp_tokenizer.num_tokens
AttributeError: 'ChatGLMTokenizer' object has no attribute 'sp_tokenizer'
It seems that the transformers version required by openllm and the transformers version required by chatglm-6b are not compatible.
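For reference, the same failure can likely be reproduced outside OpenLLM with transformers alone; a minimal sketch, assuming transformers 4.38.2 and the local model path from the traceback above:

# Minimal repro sketch outside OpenLLM (assumes transformers 4.38.2 and the
# chatglm-6b files at /usr1/models/chatglm-6b, as in the traceback above).
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "/usr1/models/chatglm-6b",
    trust_remote_code=True,  # loads the model's bundled tokenization_chatglm.py
)
# Expected failure under transformers 4.38.2:
# AttributeError: 'ChatGLMTokenizer' object has no attribute 'sp_tokenizer'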
It is recommended to specify the backend explicitly. Cascading backend might lead to unexpected behaviour.
vLLM is not available. Note that PyTorch backend is not as performant as vLLM and you should always consider using vLLM for production.
🚀Tip: run 'openllm build /usr1/models/chatglm-6b --backend pt --serialization legacy' to create a BentoLLM for '/usr1/models/chatglm-6b'
2024-03-05T22:25:42+0800 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "_service:svc" can be accessed at http://localhost:3000/metrics.
2024-03-05T22:25:43+0800 [INFO] [cli] Starting production HTTP BentoServer from "_service:svc" listening on http://0.0.0.0:3000 (Press CTRL+C to quit)
2024-03-05T22:25:49+0800 [WARNING] [runner:llm-chatglm-runner:1] OpenLLM failed to determine compatible Auto classes to load /usr1/models/chatglm-6b. Falling back to 'AutoModel'.
Tip: Make sure to specify 'AutoModelForCausalLM' or 'AutoModelForSeq2SeqLM' in your 'config.auto_map'. If your model type is yet to be supported, please file an issues on our GitHub tracker.
2024-03-05T22:25:51+0800 [INFO] [api_server:llm-chatglm-service:16] 10.143.178.153:49909 (scheme=http,method=POST,path=/v1/chat/completions,type=application/jsonl;charset=utf-8,length=620) (status=500,type=text/plain; charset=utf-8,length=3839) 58.546ms (trace=ccedf8ded80e4d3c5f325e5b67a3562a,span=5478b343dbfe4ef0,sampled=1,service.name=llm-chatglm-service)
2024-03-05T22:25:51+0800 [ERROR] [api_server:llm-chatglm-service:16] Exception in ASGI application
Traceback (most recent call last):
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/bentoml/_internal/server/http/traffic.py", line 26, in __call__
await self.app(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/bentoml/_internal/server/http/instruments.py", line 135, in __call__
await self.app(scope, receive, wrapped_send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 596, in __call__
await self.app(scope, otel_receive, otel_send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/bentoml/_internal/server/http/access.py", line 126, in __call__
await self.app(scope, receive, wrapped_send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 758, in __call__
await self.middleware_stack(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 778, in app
await route.handle(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 487, in handle
await self.app(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 758, in __call__
await self.middleware_stack(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 778, in app
await route.handle(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 299, in handle
await self.app(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 79, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 74, in app
response = await func(request)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/openllm/entrypoints/openai.py", line 159, in chat_completions
prompt = llm.tokenizer.apply_chat_template(
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/openllm/_llm.py", line 411, in tokenizer
self.__llm_tokenizer__ = openllm.serialisation.load_tokenizer(self, **self.llm_parameters[-1])
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/openllm/serialisation/__init__.py", line 33, in load_tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained(bentomodel_fs.getsyspath('/'), trust_remote_code=llm.trust_remote_code, **tokenizer_attrs)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 810, in from_pretrained
return tokenizer_class.from_pretrained(
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2048, in from_pretrained
return cls._from_pretrained(
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2287, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/tokenization_chatglm.py", line 196, in __init__
super().__init__(
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 367, in __init__
self._add_tokens(
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
current_vocab = self.get_vocab().copy()
File "/root/.cache/huggingface/modules/transformers_modules/tokenization_chatglm.py", line 248, in get_vocab
vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
File "/root/.cache/huggingface/modules/transformers_modules/tokenization_chatglm.py", line 244, in vocab_size
return self.sp_tokenizer.num_tokens
AttributeError: 'ChatGLMTokenizer' object has no attribute 'sp_tokenizer'
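Reading the traceback, this looks like an initialization-order problem: under transformers 4.38.2, the base tokenizer __init__ calls _add_tokens(), which calls get_vocab(), before the remote ChatGLMTokenizer.__init__ has assigned self.sp_tokenizer. A simplified, hypothetical sketch of the same pattern (not the actual ChatGLM code):

# Simplified illustration of the ordering problem shown in the traceback above.
class FakeBaseTokenizer:
    def __init__(self):
        # newer transformers touches the vocab while still inside __init__
        self.get_vocab()

class FakeChatGLMTokenizer(FakeBaseTokenizer):
    def __init__(self):
        super().__init__()            # get_vocab() runs here...
        self.sp_tokenizer = object()  # ...but sp_tokenizer is only assigned here

    def get_vocab(self):
        # mirrors tokenization_chatglm.py, which reads self.sp_tokenizer here
        return {"num_tokens": self.sp_tokenizer}

FakeChatGLMTokenizer()  # AttributeError: ... object has no attribute 'sp_tokenizer'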
To reproduce
TRUST_REMOTE_CODE=True openllm start /usr1/models/chatglm-6b
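Then call the OpenAI-compatible endpoint; a minimal client sketch (the model name is a placeholder, the address and path match the server log above):

# Hypothetical client call that triggers the 500 error logged above.
import requests

resp = requests.post(
    "http://localhost:3000/v1/chat/completions",
    json={
        "model": "chatglm-6b",  # placeholder; use whatever model id the server exposes
        "messages": [{"role": "user", "content": "Hello"}],
    },
)
print(resp.status_code)  # 500
print(resp.text)         # contains the ChatGLMTokenizer traceback shown above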
Logs

See the full traceback and server logs above.
Environment
Name: bentoml
Version: 1.1.11
Summary: BentoML: Build Production-Grade AI Applications
Home-page: None
Author: None
Author-email: BentoML Team contact@bentoml.com
License: Apache-2.0
Location: /opt/buildtools/python-3.9.2/lib/python3.9/site-packages
Requires: prometheus-client, python-json-logger, pathspec, fs, rich, numpy, packaging, opentelemetry-instrumentation, httpx, opentelemetry-util-http, click-option-group, psutil, opentelemetry-api, opentelemetry-instrumentation-aiohttp-client, opentelemetry-semantic-conventions, jinja2, schema, deepmerge, attrs, opentelemetry-sdk, python-dateutil, simple-di, cloudpickle, pip-requirements-parser, uvicorn, requests, aiohttp, watchfiles, circus, python-multipart, inflection, click, pip-tools, pyyaml, starlette, cattrs, nvidia-ml-py, opentelemetry-instrumentation-asgi
Required-by: openllm

Name: transformers
Version: 4.38.2
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /opt/buildtools/python-3.9.2/lib/python3.9/site-packages
Requires: requests, numpy, regex, huggingface-hub, filelock, safetensors, packaging, tokenizers, pyyaml, tqdm
Required-by: optimum, openllm
Python 3.9.2
System information (Optional)
No response