IEIT-Yuan / Yuan-2.0

Yuan 2.0 Large Language Model

FastChat deployment error: TypeError: forward() missing 1 required positional argument: 'position_ids' #123

Closed maxoyed closed 7 months ago

maxoyed commented 7 months ago

Reproduction steps

Following the tutorial published on the official WeChat account, I deployed the model with FastChat. The model was downloaded from ModelScope via git: Yuan2-2B-Janus-hf.

Link to the official WeChat account article: 源2.0适配FastChat框架!企业快速本地化部署大模型对话平台 ("Yuan 2.0 now supports the FastChat framework! Rapid on-premises deployment of an LLM chat platform for enterprises")

Steps

cd /opt/project/fast-chat # enter the working directory
python -m venv .venv # create a virtual environment
source .venv/bin/activate # activate the virtual environment
pip install "fschat[model_worker,webui]" # install FastChat
git clone https://www.modelscope.cn/YuanLLM/Yuan2-2B-Janus-hf.git # download the model files
python -m fastchat.serve.cli --model-path /opt/project/fast-chat/Yuan2-2B-Janus-hf # start the FastChat CLI

Error message

$ python -m fastchat.serve.cli --model-path /opt/project/fast-chat/Yuan2-2B-Janus-hf
2024-03-01 23:28:08,304 - modelscope - INFO - PyTorch version 2.2.1 Found.
2024-03-01 23:28:08,305 - modelscope - INFO - Loading ast index from /home/maxoyed/.cache/modelscope/ast_indexer
2024-03-01 23:28:08,305 - modelscope - INFO - No valid ast index found from /home/maxoyed/.cache/modelscope/ast_indexer, generating ast index from prebuilt!
2024-03-01 23:28:08,339 - modelscope - INFO - Loading done! Current index file version is 1.12.0, with md5 30f9d6887bb264aa0df846abe2df639b and a total number of 964 components indexed
You are using the default legacy behaviour of the <class 'transformers.models.llama.tokenization_llama.LlamaTokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thoroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
user: hello
assistant: /opt/project/fast-chat/.venv/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:410: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.7` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/opt/project/fast-chat/.venv/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:415: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/opt/project/fast-chat/.venv/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:427: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `1` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
  warnings.warn(
/opt/project/fast-chat/.venv/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:410: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.7` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
  warnings.warn(
/opt/project/fast-chat/.venv/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:415: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.0` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
  warnings.warn(
/opt/project/fast-chat/.venv/lib/python3.9/site-packages/transformers/generation/configuration_utils.py:427: UserWarning: `do_sample` is set to `False`. However, `top_k` is set to `1` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_k`.
  warnings.warn(
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/home/maxoyed/.pyenv/versions/3.9.18/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/home/maxoyed/.pyenv/versions/3.9.18/lib/python3.9/threading.py", line 917, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/project/fast-chat/.venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/project/fast-chat/.venv/lib/python3.9/site-packages/transformers/generation/utils.py", line 1544, in generate
    return self.greedy_search(
  File "/opt/project/fast-chat/.venv/lib/python3.9/site-packages/transformers/generation/utils.py", line 2404, in greedy_search
    outputs = self(
  File "/opt/project/fast-chat/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/project/fast-chat/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/maxoyed/.cache/huggingface/modules/transformers_modules/Yuan2-2B-Janus-hf/yuan_hf_model.py", line 936, in forward
    outputs = self.model(
  File "/opt/project/fast-chat/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/project/fast-chat/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/maxoyed/.cache/huggingface/modules/transformers_modules/Yuan2-2B-Janus-hf/yuan_hf_model.py", line 766, in forward
    layer_outputs = decoder_layer(
  File "/opt/project/fast-chat/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/project/fast-chat/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/maxoyed/.cache/huggingface/modules/transformers_modules/Yuan2-2B-Janus-hf/yuan_hf_model.py", line 427, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/opt/project/fast-chat/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/project/fast-chat/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/maxoyed/.cache/huggingface/modules/transformers_modules/Yuan2-2B-Janus-hf/yuan_hf_model.py", line 310, in forward
    cos, sin = self.rotary_emb(value_states, seq_len=kv_seq_len)
  File "/opt/project/fast-chat/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/project/fast-chat/.venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/project/fast-chat/.venv/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
TypeError: forward() missing 1 required positional argument: 'position_ids'

Version info

cauwulixuan commented 7 months ago

This problem is caused by the transformers library changing the LlamaRotaryEmbedding.forward() method in version 4.38 and later, so there are currently two solutions:

  1. Downgrade transformers to a version below 4.38
  2. We update the model so that it works with newer versions of the transformers library

You can try option 1 first to work around the issue, as sketched below. For option 2 we need some time to evaluate it and consider version compatibility. Thanks for reporting the problem.
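A minimal sketch of option 1, assuming the same virtual environment and pip setup as in the reproduction steps above (the pin to 4.37.2 is just one example of a release below 4.38):

source .venv/bin/activate # reuse the virtual environment created above
pip install "transformers==4.37.2" # any release below 4.38 predates the LlamaRotaryEmbedding.forward() change
python -c "import transformers; print(transformers.__version__)" # confirm the downgrade took effect
python -m fastchat.serve.cli --model-path /opt/project/fast-chat/Yuan2-2B-Janus-hf # retry the CLI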

Shawn-IEITSystems commented 7 months ago

@maxoyed Has the problem been resolved?

maxoyed commented 7 months ago

@maxoyed Has the problem been resolved?

Downgrading transformers to 4.37.2 solved it.