ENV
NVIDIA-SMI 515.76, Driver Version: 515.76, CUDA Version: 11.7
torch 2.1.0
Anaconda env
Python 3.10.13
[Followed the README in a brand-new env]
Reproduce
Follow the README, then run:
streamlit ...... react_web_demo.py
The web demo works fine. The GPT-3.5 API works fine.
InternLM loads fine.
But when I chat with InternLM, it crashes and prints the traceback below. (I'm using a local HF model path, but the same issue occurs with the remote model string internlm/internlm-chat-7b-v1_1.)
You can now view your Streamlit app in your browser.
Network URL: http://...:8501
External URL: http://...:8501
/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:09<00:00, 1.22s/it]
2023-10-20 09:54:30.730 Uncaught app exception
Traceback (most recent call last):
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 541, in _run_script
exec(code, module.__dict__)
File "/data2/.data/xxx/LLM/lagent/examples/react_web_demo.py", line 217, in <module>
main()
File "/data2/.data/xxx/LLM/lagent/examples/react_web_demo.py", line 207, in main
agent_return = st.session_state['chatbot'].chat(user_input)
File "/data2/.data/xxx/LLM/lagent/lagent/agents/react.py", line 224, in chat
response = self._llm.generate_from_template(prompt, 512)
File "/data2/.data/xxx/LLM/lagent/lagent/llms/huggingface.py", line 125, in generate_from_template
response = self.generate(inputs, max_out_len=max_out_len, **kwargs)
File "/data2/.data/xxx/LLM/lagent/lagent/llms/huggingface.py", line 102, in generate
outputs = self.model.generate(
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/transformers/generation/utils.py", line 1606, in generate
return self.greedy_search(
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/transformers/generation/utils.py", line 2454, in greedy_search
outputs = self(
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data2/.data/xxx/.cache/huggingface/modules/transformers_modules/internlm-chat-7b-v1_1/modeling_internlm.py", line 692, in forward
outputs = self.model(
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data2/.data/xxx/.cache/huggingface/modules/transformers_modules/internlm-chat-7b-v1_1/modeling_internlm.py", line 580, in forward
layer_outputs = decoder_layer(
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data2/.data/xxx/.cache/huggingface/modules/transformers_modules/internlm-chat-7b-v1_1/modeling_internlm.py", line 294, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data2/.data/xxx/.cache/huggingface/modules/transformers_modules/internlm-chat-7b-v1_1/modeling_internlm.py", line 198, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
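For the record, the same failure reproduces outside Streamlit. Below is a minimal sketch of what I believe the failing path is (plain transformers rather than lagent's loader; the model path is the same one used above): with a PyTorch build that cannot initialize CUDA, the fp16 weights stay on the CPU and generate() dies in F.linear exactly as in the traceback.

```python
# Minimal sketch (my assumption of the failing path, not lagent's exact code):
# if torch.cuda.is_available() is False, the float16 model stays on the CPU,
# and generate() hits the unimplemented half-precision matmul.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "internlm/internlm-chat-7b-v1_1"
tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.float16,  # half precision; CPU matmul kernels don't support it
    trust_remote_code=True,
)

inputs = tok("hello", return_tensors="pt")
model.generate(**inputs, max_new_tokens=16)
# RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
```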
It seems the computation is running on the CPU: CUDA initialization failed, and the warning is telling you the NVIDIA driver is too old. torch 2.1.0 wheels are built against CUDA 11.8/12.1, while driver 515.76 only supports CUDA up to 11.7, so the fp16 model silently falls back to the CPU, where half-precision addmm is not implemented. You might need to make PyTorch compatible with your CUDA driver first, e.g. install a cu117 build (such as torch 2.0.1+cu117) or upgrade the driver.
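You can confirm the mismatch before reloading the model with a quick diagnostic sketch (plain PyTorch, nothing lagent-specific):

```python
# Diagnostic sketch: check which CUDA version the installed wheel targets
# and whether it can actually initialize the GPU.
import torch

print(torch.__version__)          # 2.1.0
print(torch.version.cuda)         # CUDA the wheel was built with (11.8 or 12.1 for torch 2.1.0)
print(torch.cuda.is_available())  # False here -> the model will load on CPU
```

If is_available() prints False, lagent will run the model on the CPU regardless of how you configure it.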