Closed lovivi closed 2 months ago
Running the chatglm3-6b model fails with the following error log:
File "/root/miniconda3/envs/openllm/lib/python3.10/site-packages/starlette/routing.py", line 705, in lifespan
async with self.lifespan_context(app) as maybe_state:
File "/root/miniconda3/envs/openllm/lib/python3.10/contextlib.py", line 199, in __aenter__
return await anext(self.gen)
File "/root/miniconda3/envs/openllm/lib/python3.10/site-packages/bentoml/_internal/server/base_app.py", line 75, in lifespan
on_startup()
File "/root/miniconda3/envs/openllm/lib/python3.10/site-packages/bentoml/_internal/runner/runner.py", line 317, in init_local
raise e
File "/root/miniconda3/envs/openllm/lib/python3.10/site-packages/bentoml/_internal/runner/runner.py", line 307, in init_local
self._set_handle(LocalRunnerRef)
File "/root/miniconda3/envs/openllm/lib/python3.10/site-packages/bentoml/_internal/runner/runner.py", line 150, in _set_handle
runner_handle = handle_class(self, *args, **kwargs)
File "/root/miniconda3/envs/openllm/lib/python3.10/site-packages/bentoml/_internal/runner/runner_handle/local.py", line 27, in __init__
self._runnable = runner.runnable_class(**runner.runnable_init_params) # type: ignore
File "/root/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/_runners.py", line 172, in __init__
self.llm, self.config, self.model, self.tokenizer = llm, llm.config, llm.model, llm.tokenizer
File "/root/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/_llm.py", line 326, in tokenizer
if self.__llm_tokenizer__ is None: self.__llm_tokenizer__ = openllm.serialisation.load_tokenizer(self, **self.llm_parameters[-1])
File "/root/miniconda3/envs/openllm/lib/python3.10/site-packages/openllm/serialisation/__init__.py", line 44, in load_tokenizer
tokenizer = AutoTokenizer.from_pretrained(
File "/root/miniconda3/envs/openllm/lib/python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 755, in from_pretrained
return tokenizer_class.from_pretrained(pretrained_model_name_or_path, *inputs, **kwargs)
File "/root/miniconda3/envs/openllm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2024, in from_pretrained
return cls._from_pretrained(
File "/root/miniconda3/envs/openllm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2256, in _from_pretrained
tokenizer = cls(*init_inputs, **init_kwargs)
File "/root/.cache/huggingface/modules/transformers_modules/tokenization_chatglm.py", line 108, in __init__
super().__init__(padding_side=padding_side, clean_up_tokenization_spaces=clean_up_tokenization_spaces,
File "/root/miniconda3/envs/openllm/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 363, in __init__
super().__init__(**kwargs)
File "/root/miniconda3/envs/openllm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 1604, in __init__
super().__init__(**kwargs)
File "/root/miniconda3/envs/openllm/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 861, in __init__
setattr(self, key, value)
AttributeError: can't set attribute 'eos_token'
2023-11-29T18:25:36+0800 [ERROR] [runner:llm-chatglm-runner:1] Application startup failed. Exiting.
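One possible reading of the traceback above, offered as a sketch rather than a confirmed diagnosis: the last frame is a plain `setattr(self, key, value)` in the transformers base class, and the key is `eos_token`. That error is what Python raises when the attribute is a property without a setter. The example below reproduces the mechanism with hypothetical stand-in classes (`BaseTokenizer`, `ReadOnlyEosTokenizer`). It does not import transformers, and the assumption that the model's `tokenization_chatglm.py` defines `eos_token` as a read-only property has not been verified against the file.

```python
class BaseTokenizer:
    """Hypothetical stand-in for a base class that stores keyword arguments via setattr."""

    def __init__(self, **kwargs):
        for key, value in kwargs.items():
            # Same pattern as the last frame of the traceback.
            setattr(self, key, value)


class ReadOnlyEosTokenizer(BaseTokenizer):
    """Hypothetical subclass that exposes eos_token as a property with no setter."""

    @property
    def eos_token(self):
        return "</s>"


def try_init(**kwargs):
    """Return the exception class name raised during construction, or None on success."""
    try:
        ReadOnlyEosTokenizer(**kwargs)
    except AttributeError as exc:
        return type(exc).__name__
    return None


if __name__ == "__main__":
    # Passing eos_token hits the read-only property and fails.
    print(try_init(eos_token="</s>"))
    # Any other keyword becomes an ordinary instance attribute.
    print(try_init(padding_side="left"))
```

If this is the cause, the mismatch is between the tokenizer code shipped with the model and the installed transformers version, so comparing those two is a reasonable first step.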
I have also encountered the same problem.
env:
Fri Jan 5 16:28:57 2024
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.223.02 Driver Version: 470.223.02 CUDA Version: 11.4 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 NVIDIA A100-PCI... Off | 00000000:61:00.0 Off | 0 |
| N/A 31C P0 36W / 250W | 12780MiB / 40536MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 NVIDIA A100-PCI... Off | 00000000:DB:00.0 Off | 0 |
| N/A 35C P0 31W / 250W | 3MiB / 40536MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 142549 C ...nda3/envs/chat/bin/python 12777MiB |
+-----------------------------------------------------------------------------+
python: 3.10 openllm: 0.4.41
My start command is:
cmd="env CUDA_VISIBLE_DEVICES=0,1 OPENBLAS_NUM_THREADS=1 TRUST_REMOTE_CODE=True openllm start /data/models/chatglm3-6b --backend pt -p 3333"
nohup $cmd > chatglm.log 2>&1 &
pip list:
pip list
Package Version
-------------------------------------------- ---------------
accelerate 0.25.0
aiohttp 3.9.1
aioprometheus 23.12.0
aiosignal 1.3.1
annotated-types 0.6.0
anyio 4.2.0
appdirs 1.4.4
asgiref 3.7.2
asttokens 2.4.1
async-timeout 4.0.3
attrs 23.2.0
beautifulsoup4 4.12.2
bentoml 1.1.11
bitsandbytes 0.41.3.post2
build 0.10.0
cattrs 23.1.2
certifi 2023.11.17
charset-normalizer 3.3.2
circus 0.18.0
click 8.1.7
click-option-group 0.5.6
cloudpickle 3.0.0
cmake 3.28.1
coloredlogs 15.0.1
contextlib2 21.6.0
cpm-kernels 1.0.11
cuda-python 12.3.0
dashscope 1.13.6
dataclasses-json 0.6.3
datasets 2.16.1
decorator 5.1.1
deepmerge 1.1.1
Deprecated 1.2.14
dill 0.3.7
distlib 0.3.8
distro 1.9.0
docarray 0.40.0
einops 0.7.0
exceptiongroup 1.2.0
executing 2.0.1
fastapi 0.108.0
fastcore 1.5.29
filelock 3.13.1
filetype 1.2.0
frozenlist 1.4.1
fs 2.4.16
fsspec 2023.10.0
ghapi 1.0.4
greenlet 3.0.3
grpcio 1.60.0
h11 0.14.0
html2text 2020.1.16
httpcore 1.0.2
httptools 0.6.1
httpx 0.26.0
huggingface-hub 0.20.1
humanfriendly 10.0
idna 3.6
importlib-metadata 6.11.0
inflection 0.5.1
ipython 8.19.0
jedi 0.19.1
Jinja2 3.1.2
joblib 1.3.2
jsonpatch 1.33
jsonpointer 2.4
jsonschema 4.20.0
jsonschema-specifications 2023.12.1
langchain 0.0.354
langchain-community 0.0.8
langchain-core 0.1.6
langsmith 0.0.77
lit 17.0.6
llama-hub 0.0.66
llama-index 0.9.25
loguru 0.7.2
markdown-it-py 3.0.0
MarkupSafe 2.1.3
marshmallow 3.20.1
matplotlib-inline 0.1.6
mdurl 0.1.2
mpmath 1.3.0
msgpack 1.0.7
multidict 6.0.4
multiprocess 0.70.15
mypy-extensions 1.0.0
nest-asyncio 1.5.8
networkx 3.2.1
ninja 1.11.1.1
nltk 3.8.1
numpy 1.26.3
nvidia-cublas-cu12 12.1.3.1
nvidia-cuda-cupti-cu12 12.1.105
nvidia-cuda-nvrtc-cu12 12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12 8.9.2.26
nvidia-cufft-cu12 11.0.2.54
nvidia-curand-cu12 10.3.2.106
nvidia-cusolver-cu12 11.4.5.107
nvidia-cusparse-cu12 12.1.0.106
nvidia-ml-py 11.525.150
nvidia-nccl-cu12 2.18.1
nvidia-nvjitlink-cu12 12.3.101
nvidia-nvtx-cu12 12.1.105
openai 1.6.1
openllm 0.4.41
openllm-client 0.4.41
openllm-core 0.4.41
opentelemetry-api 1.20.0
opentelemetry-instrumentation 0.41b0
opentelemetry-instrumentation-aiohttp-client 0.41b0
opentelemetry-instrumentation-asgi 0.41b0
opentelemetry-sdk 1.20.0
opentelemetry-semantic-conventions 0.41b0
opentelemetry-util-http 0.41b0
optimum 1.16.1
orjson 3.9.10
packaging 23.2
pandas 2.1.4
parso 0.8.3
pathspec 0.12.1
pexpect 4.9.0
pillow 10.2.0
pip 23.3.2
pip-requirements-parser 32.0.1
pip-tools 7.3.0
platformdirs 4.1.0
prometheus-client 0.19.0
prompt-toolkit 3.0.43
protobuf 4.25.1
psutil 5.9.7
ptyprocess 0.7.0
pure-eval 0.2.2
pyaml 23.12.0
pyarrow 14.0.2
pyarrow-hotfix 0.6
pydantic 1.10.13
pydantic_core 2.14.6
Pygments 2.17.2
pyparsing 3.1.1
pyproject_hooks 1.0.0
python-dateutil 2.8.2
python-dotenv 1.0.0
python-json-logger 2.0.7
python-multipart 0.0.6
pytz 2023.3.post1
PyYAML 6.0.1
pyzmq 25.1.2
quantile-python 1.1
ray 2.6.0
referencing 0.32.0
regex 2023.12.25
requests 2.31.0
retrying 1.3.4
rich 13.7.0
rpds-py 0.16.2
safetensors 0.4.1
schema 0.7.5
scipy 1.11.4
sentencepiece 0.1.99
setuptools 69.0.3
simple-di 0.1.5
six 1.16.0
sniffio 1.3.0
soupsieve 2.5
SQLAlchemy 2.0.25
stack-data 0.6.3
starlette 0.32.0.post1
sympy 1.12
tenacity 8.2.3
tiktoken 0.5.2
tokenizers 0.15.0
tomli 2.0.1
torch 2.0.1+cu117
torchaudio 2.0.2+cu117
torchvision 0.15.2+cu117
tornado 6.4
tqdm 4.66.1
traitlets 5.14.1
transformers 4.36.2
transformers-stream-generator 0.0.4
triton 2.0.0
types-requests 2.31.0.20231231
typing_extensions 4.9.0
typing-inspect 0.9.0
tzdata 2023.4
urllib3 2.1.0
uvicorn 0.25.0
uvloop 0.19.0
virtualenv 20.25.0
vllm 0.2.6
watchfiles 0.21.0
wcwidth 0.2.12
websockets 12.0
wheel 0.42.0
wrapt 1.16.0
xformers 0.0.23.post1
xxhash 3.4.1
yarl 1.9.4
zipp 3.17.0
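Only a few rows of a full listing like this are likely to matter when comparing environments. As a convenience, here is a minimal sketch that prints just the versions of the packages that appear in the traceback. The helper name `collect_versions` and the chosen package list are illustrative assumptions; `importlib.metadata` is part of the Python standard library.

```python
from importlib import metadata


def collect_versions(names):
    """Map each distribution name to its installed version, or None if not installed."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Packages that show up in the traceback, plus torch and vllm for the backend.
    packages = ["openllm", "bentoml", "transformers", "tokenizers", "torch", "vllm"]
    for name, version in collect_versions(packages).items():
        print(f"{name}: {version or 'not installed'}")
```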
Closing this for OpenLLM 0.6.
Describe the bug
When I run TRUST_REMOTE_CODE=True openllm start /NAS/user/songjie/software/llm/chatglm3-6b on a P40 card, the model does not load properly (it builds normally with either the pt or the vllm backend, but every request returns an error).
The error is as follows:
To reproduce
TRUST_REMOTE_CODE=True openllm start /NAS/user/songjie/software/llm/chatglm3-6b
Logs
Environment
Driver Version: 535.129.03
cuda-python 12.3.0 pypi_0 pypi
cudatoolkit 11.8.0 h6a678d5_0
python 3.10.13 h955ad1f_0
openllm 0.4.31 pypi_0 pypi
openllm-client 0.4.31 pypi_0 pypi
openllm-core 0.4.31 pypi_0 pypi
vllm 0.2.2
torch 2.1.0
System information (Optional)
No response