ENV
NVIDIA-SMI 515.76, Driver Version: 515.76, CUDA Version: 11.7
torch 2.1.0
Anaconda env
Python 3.10.13
[Followed the README in a brand-new env]
Reproduce
Follow the README, then run:
streamlit ...... react_web_demo.py
The web demo works fine. The GPT-3.5 API works fine.
InternLM loads fine.
But when I chat with InternLM, it crashes and prints the traceback below. (I'm using a local HF model path, but the same issue occurs with the remote model string internlm/internlm-chat-7b-v1_1.)
You can now view your Streamlit app in your browser.
Network URL: http://...:8501
External URL: http://...:8501
/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
return torch._C._cuda_getDeviceCount() > 0
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:09<00:00, 1.22s/it]
2023-10-20 09:54:30.730 Uncaught app exception
Traceback (most recent call last):
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 541, in _run_script
exec(code, module.__dict__)
File "/data2/.data/xxx/LLM/lagent/examples/react_web_demo.py", line 217, in <module>
main()
File "/data2/.data/xxx/LLM/lagent/examples/react_web_demo.py", line 207, in main
agent_return = st.session_state['chatbot'].chat(user_input)
File "/data2/.data/xxx/LLM/lagent/lagent/agents/react.py", line 224, in chat
response = self._llm.generate_from_template(prompt, 512)
File "/data2/.data/xxx/LLM/lagent/lagent/llms/huggingface.py", line 125, in generate_from_template
response = self.generate(inputs, max_out_len=max_out_len, **kwargs)
File "/data2/.data/xxx/LLM/lagent/lagent/llms/huggingface.py", line 102, in generate
outputs = self.model.generate(
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/transformers/generation/utils.py", line 1606, in generate
return self.greedy_search(
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/transformers/generation/utils.py", line 2454, in greedy_search
outputs = self(
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data2/.data/xxx/.cache/huggingface/modules/transformers_modules/internlm-chat-7b-v1_1/modeling_internlm.py", line 692, in forward
outputs = self.model(
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data2/.data/xxx/.cache/huggingface/modules/transformers_modules/internlm-chat-7b-v1_1/modeling_internlm.py", line 580, in forward
layer_outputs = decoder_layer(
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data2/.data/xxx/.cache/huggingface/modules/transformers_modules/internlm-chat-7b-v1_1/modeling_internlm.py", line 294, in forward
hidden_states, self_attn_weights, present_key_value = self.self_attn(
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data2/.data/xxx/.cache/huggingface/modules/transformers_modules/internlm-chat-7b-v1_1/modeling_internlm.py", line 198, in forward
query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
return F.linear(input, self.weight, self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
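For the record, the same failure reproduces outside Streamlit. Below is a minimal sketch of what I believe the failing path is (plain transformers rather than lagent's loader; the model path is the same one used above): with a PyTorch build that cannot initialize CUDA, the fp16 weights stay on the CPU and generate() dies in F.linear exactly as in the traceback.

```python
# Minimal sketch (my assumption of the failing path, not lagent's exact code):
# if torch.cuda.is_available() is False, the float16 model stays on the CPU,
# and generate() hits the unimplemented half-precision matmul.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

path = "internlm/internlm-chat-7b-v1_1"
tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    path,
    torch_dtype=torch.float16,  # half precision; CPU matmul kernels don't support it
    trust_remote_code=True,
)

inputs = tok("hello", return_tensors="pt")
model.generate(**inputs, max_new_tokens=16)
# RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
```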
It seems the computation is running on the CPU: CUDA initialization failed, and the warning is telling you the NVIDIA driver is too old. torch 2.1.0 wheels are built against CUDA 11.8/12.1, while driver 515.76 only supports CUDA up to 11.7, so the fp16 model silently falls back to the CPU, where half-precision addmm is not implemented. You might need to make PyTorch compatible with your CUDA driver first, e.g. install a cu117 build (such as torch 2.0.1+cu117) or upgrade the driver.
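You can confirm the mismatch before reloading the model with a quick diagnostic sketch (plain PyTorch, nothing lagent-specific):

```python
# Diagnostic sketch: check which CUDA version the installed wheel targets
# and whether it can actually initialize the GPU.
import torch

print(torch.__version__)          # 2.1.0
print(torch.version.cuda)         # CUDA the wheel was built with (11.8 or 12.1 for torch 2.1.0)
print(torch.cuda.is_available())  # False here -> the model will load on CPU
```

If is_available() prints False, lagent will run the model on the CPU regardless of how you configure it.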