InternLM / lagent

A lightweight framework for building LLM-based agents
Apache License 2.0

RuntimeError: "addmm_impl_cpu_" not implemented for 'Half' #51

Closed maxchiron closed 1 year ago

maxchiron commented 1 year ago

ENV

Reproduce

  1. Follow the readme
  2. streamlit ...... react_web_demo.py
  3. The web demo works fine; the GPT-3.5 API works fine.
  4. InternLM loads fine.
  5. But when chatting with InternLM, it crashes and prints the following. (I'm using a local HF model path; I also tested the remote model string internlm/internlm-chat-7b-v1_1. Same issue with both the local path and the remote model string.)

You can now view your Streamlit app in your browser.

Network URL: http://...:8501 External URL: http://...:8501

/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/cuda/__init__.py:138: UserWarning: CUDA initialization: The NVIDIA driver on your system is too old (found version 11070). Please update your GPU driver by downloading and installing a new version from the URL: http://www.nvidia.com/Download/index.aspx Alternatively, go to: https://pytorch.org to install a PyTorch version that has been compiled with your version of the CUDA driver. (Triggered internally at ../c10/cuda/CUDAFunctions.cpp:108.)
  return torch._C._cuda_getDeviceCount() > 0
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 8/8 [00:09<00:00,  1.22s/it]
2023-10-20 09:54:30.730 Uncaught app exception
Traceback (most recent call last):
  File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/streamlit/runtime/scriptrunner/script_runner.py", line 541, in _run_script
    exec(code, module.__dict__)
  File "/data2/.data/xxx/LLM/lagent/examples/react_web_demo.py", line 217, in <module>
    main()
  File "/data2/.data/xxx/LLM/lagent/examples/react_web_demo.py", line 207, in main
    agent_return = st.session_state['chatbot'].chat(user_input)
  File "/data2/.data/xxx/LLM/lagent/lagent/agents/react.py", line 224, in chat
    response = self._llm.generate_from_template(prompt, 512)
  File "/data2/.data/xxx/LLM/lagent/lagent/llms/huggingface.py", line 125, in generate_from_template
    response = self.generate(inputs, max_out_len=max_out_len, **kwargs)
  File "/data2/.data/xxx/LLM/lagent/lagent/llms/huggingface.py", line 102, in generate
    outputs = self.model.generate(
  File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/transformers/generation/utils.py", line 1606, in generate
    return self.greedy_search(
  File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/transformers/generation/utils.py", line 2454, in greedy_search
    outputs = self(
  File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data2/.data/xxx/.cache/huggingface/modules/transformers_modules/internlm-chat-7b-v1_1/modeling_internlm.py", line 692, in forward
    outputs = self.model(
  File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data2/.data/xxx/.cache/huggingface/modules/transformers_modules/internlm-chat-7b-v1_1/modeling_internlm.py", line 580, in forward
    layer_outputs = decoder_layer(
  File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data2/.data/xxx/.cache/huggingface/modules/transformers_modules/internlm-chat-7b-v1_1/modeling_internlm.py", line 294, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data2/.data/xxx/.cache/huggingface/modules/transformers_modules/internlm-chat-7b-v1_1/modeling_internlm.py", line 198, in forward
    query_states = self.q_proj(hidden_states).view(bsz, q_len, self.num_heads, self.head_dim).transpose(1, 2)
  File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/data2/.data/xxx/.conda/envs/lagent/lib/python3.10/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
ZwwWayne commented 1 year ago

It seems the computation is running on the CPU, and you got a warning saying the NVIDIA driver is too old. You might need to install a PyTorch build that is compatible with your CUDA driver first.
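For context, the `"addmm_impl_cpu_" not implemented for 'Half'` error typically means a float16 (Half) matrix multiply was dispatched to the CPU, which older PyTorch CPU backends do not support. Since CUDA failed to initialize (see the driver warning in the log), the fp16 model silently fell back to CPU. A minimal sketch of the failure mode and a float32 workaround, assuming a recent PyTorch install (the layer and tensor names here are illustrative, not from lagent):

```python
import torch

# Check whether PyTorch can actually see the GPU. If this prints False,
# an fp16 model will run on CPU, which triggers the Half/addmm error on
# PyTorch builds from this era.
print("CUDA available:", torch.cuda.is_available())

# Reproduce the failure mode: a Half-precision linear layer on CPU.
layer = torch.nn.Linear(4, 4).half()
x = torch.randn(1, 4, dtype=torch.float16)
try:
    layer(x)
    print("this PyTorch build supports Half matmul on CPU")
except RuntimeError as e:
    # Older builds raise: "addmm_impl_cpu_" not implemented for 'Half'
    print("CPU half matmul failed:", e)

# Workaround while the driver is being fixed: cast to float32 for CPU
# inference (e.g. load the model with torch_dtype=torch.float32).
out = layer.float()(x.float())
print(out.shape)
```

The proper fix, as noted above, is to update the NVIDIA driver or install a PyTorch wheel built against a CUDA version your driver supports, so the model runs in fp16 on the GPU.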