bentoml / OpenLLM

Run any open-source LLM, such as Llama or Gemma, as an OpenAI-compatible API endpoint in the cloud.
https://bentoml.com
Apache License 2.0

bug: start chatglm-6b locally err #926

Closed · zhangxinyang97 closed this 3 months ago

zhangxinyang97 commented 8 months ago

Describe the bug

I executed "TRUST_REMOTE_CODE=True openllm start /usr1/models/chatglm-6b"; the model loaded successfully and Swagger was available, but calling /v1/chat/completions returned a 500 error with the traceback below.
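
For reference, the request was along these lines (a minimal sketch, not my exact client code; the model name and payload are illustrative, and the port matches the server logs):

import requests

# Hypothetical minimal chat request against the local OpenLLM server.
resp = requests.post(
    "http://localhost:3000/v1/chat/completions",
    json={
        "model": "chatglm-6b",  # assumption: the served model id
        "messages": [{"role": "user", "content": "Hello"}],
    },
    timeout=60,
)
print(resp.status_code)  # 500; the server log shows the traceback below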

Traceback (most recent call last):
  File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/middleware/errors.py", line 164, in __call__
    await self.app(scope, receive, _send)
  File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
    await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
  File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 758, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 778, in app
    await route.handle(scope, receive, send)
  File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 299, in handle
    await self.app(scope, receive, send)
  File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 79, in app
    await wrap_app_handling_exceptions(app, request)(scope, receive, send)
  File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
    raise exc
  File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
    await app(scope, receive, sender)
  File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 74, in app
    response = await func(request)
  File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/openllm/entrypoints/openai.py", line 159, in chat_completions
    prompt = llm.tokenizer.apply_chat_template(
  File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/openllm/_llm.py", line 411, in tokenizer
    self.__llm_tokenizer__ = openllm.serialisation.load_tokenizer(self, **self.llm_parameters[-1])
  File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/openllm/serialisation/__init__.py", line 33, in load_tokenizer
    tokenizer = transformers.AutoTokenizer.from_pretrained(bentomodel_fs.getsyspath('/'), trust_remote_code=llm.trust_remote_code, **tokenizer_attrs)
  File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 810, in from_pretrained
    return tokenizer_class.from_pretrained(
  File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2048, in from_pretrained
    return cls._from_pretrained(
  File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2287, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
  File "/root/.cache/huggingface/modules/transformers_modules/tokenization_chatglm.py", line 196, in __init__
    super().__init__(
  File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 367, in __init__
    self._add_tokens(
  File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
    current_vocab = self.get_vocab().copy()
  File "/root/.cache/huggingface/modules/transformers_modules/tokenization_chatglm.py", line 248, in get_vocab
    vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
  File "/root/.cache/huggingface/modules/transformers_modules/tokenization_chatglm.py", line 244, in vocab_size
    return self.sp_tokenizer.num_tokens
AttributeError: 'ChatGLMTokenizer' object has no attribute 'sp_tokenizer'

It seems that the transformers version required by OpenLLM and the transformers version expected by chatglm-6b's bundled tokenizer code are not compatible.
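
The same failure should be reproducible without OpenLLM, which would confirm a transformers/ChatGLM incompatibility rather than an OpenLLM bug (a minimal sketch, assuming the same model directory and the transformers 4.38.2 listed under Environment):

from transformers import AutoTokenizer

# On recent transformers releases, the base PreTrainedTokenizer.__init__
# calls _add_tokens() -> get_vocab() before ChatGLM-6B's custom
# tokenization_chatglm.py has assigned self.sp_tokenizer (it only does so
# after super().__init__()), so this should raise the same AttributeError.
tok = AutoTokenizer.from_pretrained(
    "/usr1/models/chatglm-6b",
    trust_remote_code=True,
)

If that reproduces, a commonly reported workaround is pinning an older transformers release (e.g. a 4.33.x build, from before the base tokenizer started building the vocab inside __init__) or updating the model directory's tokenization_chatglm.py to the newer file published upstream; I have not verified either against this OpenLLM version.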

To reproduce

TRUST_REMOTE_CODE=True openllm start /usr1/models/chatglm-6b

Logs

It is recommended to specify the backend explicitly. Cascading backend might lead to unexpected behaviour.
vLLM is not available. Note that PyTorch backend is not as performant as vLLM and you should always consider using vLLM for production.
🚀Tip: run 'openllm build /usr1/models/chatglm-6b --backend pt --serialization legacy' to create a BentoLLM for '/usr1/models/chatglm-6b'
2024-03-05T22:25:42+0800 [INFO] [cli] Prometheus metrics for HTTP BentoServer from "_service:svc" can be accessed at http://localhost:3000/metrics.
2024-03-05T22:25:43+0800 [INFO] [cli] Starting production HTTP BentoServer from "_service:svc" listening on http://0.0.0.0:3000 (Press CTRL+C to quit)
2024-03-05T22:25:49+0800 [WARNING] [runner:llm-chatglm-runner:1] OpenLLM failed to determine compatible Auto classes to load /usr1/models/chatglm-6b. Falling back to 'AutoModel'.
Tip: Make sure to specify 'AutoModelForCausalLM' or 'AutoModelForSeq2SeqLM' in your 'config.auto_map'. If your model type is yet to be supported, please file an issue on our GitHub tracker.
2024-03-05T22:25:51+0800 [INFO] [api_server:llm-chatglm-service:16] 10.143.178.153:49909 (scheme=http,method=POST,path=/v1/chat/completions,type=application/jsonl;charset=utf-8,length=620) (status=500,type=text/plain; charset=utf-8,length=3839) 58.546ms (trace=ccedf8ded80e4d3c5f325e5b67a3562a,span=5478b343dbfe4ef0,sampled=1,service.name=llm-chatglm-service)
2024-03-05T22:25:51+0800 [ERROR] [api_server:llm-chatglm-service:16] Exception in ASGI application
Traceback (most recent call last):
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/bentoml/_internal/server/http/traffic.py", line 26, in __call__
await self.app(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/bentoml/_internal/server/http/instruments.py", line 135, in __call__
await self.app(scope, receive, wrapped_send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/opentelemetry/instrumentation/asgi/__init__.py", line 596, in __call__
await self.app(scope, otel_receive, otel_send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/bentoml/_internal/server/http/access.py", line 126, in __call__
await self.app(scope, receive, wrapped_send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 758, in __call__
await self.middleware_stack(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 778, in app
await route.handle(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 487, in handle
await self.app(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/applications.py", line 123, in __call__
await self.middleware_stack(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/middleware/errors.py", line 186, in __call__
raise exc
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/middleware/errors.py", line 164, in __call__
await self.app(scope, receive, _send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 758, in __call__
await self.middleware_stack(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 778, in app
await route.handle(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 299, in handle
await self.app(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 79, in app
await wrap_app_handling_exceptions(app, request)(scope, receive, send)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
raise exc
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
await app(scope, receive, sender)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/routing.py", line 74, in app
response = await func(request)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/openllm/entrypoints/openai.py", line 159, in chat_completions
prompt = llm.tokenizer.apply_chat_template(
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/openllm/_llm.py", line 411, in tokenizer
self.__llm_tokenizer__ = openllm.serialisation.load_tokenizer(self, **self.llm_parameters[-1])
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/openllm/serialisation/__init__.py", line 33, in load_tokenizer
tokenizer = transformers.AutoTokenizer.from_pretrained(bentomodel_fs.getsyspath('/'), trust_remote_code=llm.trust_remote_code, **tokenizer_attrs)
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/models/auto/tokenization_auto.py", line 810, in from_pretrained
return tokenizer_class.from_pretrained(
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2048, in from_pretrained
return cls._from_pretrained(
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2287, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/tokenization_chatglm.py", line 196, in __init__
super().__init__(
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 367, in __init__
self._add_tokens(
File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/transformers/tokenization_utils.py", line 467, in _add_tokens
current_vocab = self.get_vocab().copy()
File "/root/.cache/huggingface/modules/transformers_modules/tokenization_chatglm.py", line 248, in get_vocab
vocab = {self._convert_id_to_token(i): i for i in range(self.vocab_size)}
File "/root/.cache/huggingface/modules/transformers_modules/tokenization_chatglm.py", line 244, in vocab_size
return self.sp_tokenizer.num_tokens
AttributeError: 'ChatGLMTokenizer' object has no attribute 'sp_tokenizer'

Environment

Name: bentoml
Version: 1.1.11
Summary: BentoML: Build Production-Grade AI Applications
Home-page: None
Author: None
Author-email: BentoML Team contact@bentoml.com
License: Apache-2.0
Location: /opt/buildtools/python-3.9.2/lib/python3.9/site-packages
Requires: prometheus-client, python-json-logger, pathspec, fs, rich, numpy, packaging, opentelemetry-instrumentation, httpx, opentelemetry-util-http, click-option-group, psutil, opentelemetry-api, opentelemetry-instrumentation-aiohttp-client, opentelemetry-semantic-conventions, jinja2, schema, deepmerge, attrs, opentelemetry-sdk, python-dateutil, simple-di, cloudpickle, pip-requirements-parser, uvicorn, requests, aiohttp, watchfiles, circus, python-multipart, inflection, click, pip-tools, pyyaml, starlette, cattrs, nvidia-ml-py, opentelemetry-instrumentation-asgi
Required-by: openllm

Name: transformers
Version: 4.38.2
Summary: State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow
Home-page: https://github.com/huggingface/transformers
Author: The Hugging Face team (past and future) with the help of all our contributors (https://github.com/huggingface/transformers/graphs/contributors)
Author-email: transformers@huggingface.co
License: Apache 2.0 License
Location: /opt/buildtools/python-3.9.2/lib/python3.9/site-packages
Requires: requests, numpy, regex, huggingface-hub, filelock, safetensors, packaging, tokenizers, pyyaml, tqdm
Required-by: optimum, openllm

Python 3.9.2

System information (Optional)

No response

bojiang commented 3 months ago

Closing in favor of OpenLLM 0.6.