Closed seanbenhur closed 11 months ago
You need to set DTYPE=float32 for this.
Maybe I can set the dtype automatically when there is no GPU.
Currently the behaviour always defaults to bfloat16, which requires a GPU.
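For example, on a CPU-only machine you can set the dtype before starting the server (a minimal sketch reusing the flan-t5 command reported later in this thread; adjust the model and port to your setup):
DTYPE=float32 openllm start google/flan-t5-small --port 3000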
Closed because this is not a bug. I will update the README about this.
I got the same error here when starting ChatGLM from a local model, and setting DTYPE=float32 doesn't work for me.
| Traceback (most recent call last):
| File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/responses.py", line 261, in wrap
| await func()
| File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/starlette/responses.py", line 250, in stream_response
| async for chunk in self.body_iterator:
| File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/bentoml/_internal/server/runner_app.py", line 373, in stream_encoder
| async for p in payload:
| File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/bentoml/_internal/server/runner_app.py", line 214, in inner
| async for data in ret:
| File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/openllm/_runners.py", line 222, in generate_iterator
| out = self.model(input_ids=start_ids, use_cache=True)
| File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
| return forward_call(*args, **kwargs)
| File "/root/.cache/huggingface/modules/transformers_modules/7d451dccbae3196be9b8efcdffe6a47c8c028687/modeling_chatglm.py", line 1190, in forward
| transformer_outputs = self.transformer(
| File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
| return forward_call(*args, **kwargs)
| File "/root/.cache/huggingface/modules/transformers_modules/7d451dccbae3196be9b8efcdffe6a47c8c028687/modeling_chatglm.py", line 996, in forward
| layer_ret = layer(
| File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
| return forward_call(*args, **kwargs)
| File "/root/.cache/huggingface/modules/transformers_modules/7d451dccbae3196be9b8efcdffe6a47c8c028687/modeling_chatglm.py", line 624, in forward
| attention_input = self.input_layernorm(hidden_states)
| File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
| return self._call_impl(*args, **kwargs)
| File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
| return forward_call(*args, **kwargs)
| File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/torch/nn/modules/normalization.py", line 201, in forward
| return F.layer_norm(
| File "/opt/buildtools/python-3.9.2/lib/python3.9/site-packages/torch/nn/functional.py", line 2546, in layer_norm
| return torch.layer_norm(input, normalized_shape, weight, bias, eps, torch.backends.cudnn.enabled)
| RuntimeError: "LayerNormKernelImpl" not implemented for 'Half'
cmd: DTYPE=float32 TRUST_REMOTE_CODE=True openllm start /usr1/models/chatglm-6b --backend pt
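For context, the underlying failure can be reproduced outside OpenLLM: on many PyTorch CPU builds, LayerNorm has no half-precision kernel. A minimal sketch, assuming only torch is installed; whether the error is raised depends on the PyTorch version:

import torch

# LayerNorm cast to float16 to mimic a half-precision model running on CPU
ln = torch.nn.LayerNorm(4).half()
x = torch.randn(2, 4, dtype=torch.float16)

try:
    ln(x)  # on older CPU builds this raises the same error as in the traceback above
except RuntimeError as e:
    print(e)  # e.g. "LayerNormKernelImpl" not implemented for 'Half'

# Casting the module and input back to float32 works on CPU
print(ln.float()(x.float()).shape)  # torch.Size([2, 4])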
Describe the bug
I am hosting a flan-t5 model on CPU and I am getting the above error.
To reproduce
openllm start google/flan-t5-small --port 3000 --do-not-track --api_workers 17
Logs
Environment
openllm==0.4.40
System information (Optional)
AWS Instance t2.large