InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Bug] xcomposer 4khd lora weight error in lmdeploy #1747

Closed ztfmars closed 3 days ago

ztfmars commented 3 weeks ago

Checklist

Describe the bug

I have fine-tuned xcomposer with finetune_lora.sh and merged the adapter with the pretrained "internlm-xcomposer2-4khd". I have tested the merged weight on the xcomposer-4khd gradio demo - link - and it works well.

But when I use the same weight with the lmdeploy gradio demo, an error occurs (see the error traceback below).

I have tried the pretrained weight from Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b, and it works well in the lmdeploy gradio demo. I am really confused about how to get my new LoRA weights to work.

My code is listed below:

########### gradio_demo_lmdeploy.py
import gradio as gr
from lmdeploy import pipeline, ChatTemplateConfig

xcomposer_4khd_model = "/home/fusionai/project/internllm_demo/xcomposer_test/merge_tools/4khd_3e_logic_50repeated_15seman_2format"
# pre_xcomposer_4khd_model = '/home/fusionai/.cache/modelscope/hub/Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b'

########### pipeline
pipe = pipeline(xcomposer_4khd_model,
                chat_template_config=ChatTemplateConfig(model_name='internlm-xcomposer2-4khd'))

def model(image, text):
    if image is None:
        return [(text, "请上传一张图片。")]
    else:
        response = pipe((text, image)).text
        return [(text, response)]

demo = gr.Interface(fn=model, inputs=[gr.Image(type="pil"), gr.Textbox()], outputs=gr.Chatbot())
demo.launch(server_name='0.0.0.0', server_port=6006, show_error=True)

My merge code for merging the xcomposer LLM and the LoRA adapter:

######### merge_peft_adapter.py
from dataclasses import dataclass, field
from typing import Optional

import torch
from peft import PeftConfig, PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, HfArgumentParser

@dataclass
class ScriptArguments:
    """
    The input names representing the Adapter and Base model fine-tuned with PEFT, and the output name representing the
    merged model.
    """

    adapter_model_name: Optional[str] = field(default=None, metadata={"help": "the adapter name"})
    base_model_name: Optional[str] = field(default=None, metadata={"help": "the base model name"})
    output_name: Optional[str] = field(default=None, metadata={"help": "the merged model name"})

parser = HfArgumentParser(ScriptArguments)
script_args = parser.parse_args_into_dataclasses()[0]
assert script_args.adapter_model_name is not None, "please provide the name of the Adapter you would like to merge"
assert script_args.base_model_name is not None, "please provide the name of the Base model"
assert script_args.output_name is not None, "please provide the output name of the merged model"

peft_config = PeftConfig.from_pretrained(script_args.adapter_model_name)
model = AutoModelForCausalLM.from_pretrained(
    script_args.base_model_name, return_dict=True, torch_dtype=torch.bfloat16, trust_remote_code=True
)

tokenizer = AutoTokenizer.from_pretrained(script_args.base_model_name, trust_remote_code=True)

# Load the PEFT model
model = PeftModel.from_pretrained(model, script_args.adapter_model_name)
model.eval()

model = model.merge_and_unload()

model.save_pretrained(f"{script_args.output_name}")
tokenizer.save_pretrained(f"{script_args.output_name}")

python3 merge_peft_adapter.py \
    --adapter_model_name=/home/fusionai/project/internllm_demo/xcomposer_test/train/4khd_3e_mixed_all \
    --base_model_name=/home/fusionai/.cache/modelscope/hub/Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b \
    --output_name=4khd_3e_mixed_all

Looking forward to your reply!

Reproduction

python3 merge_peft_adapter.py \
    --adapter_model_name=/home/fusionai/project/internllm_demo/xcomposer_test/train/4khd_3e_mixed_all \
    --base_model_name=/home/fusionai/.cache/modelscope/hub/Shanghai_AI_Laboratory/internlm-xcomposer2-4khd-7b \
    --output_name=4khd_3e_mixed_all

python gradio_demo_lmdeploy.py

Environment

sys.platform: linux
Python: 3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0]
CUDA available: True
MUSA available: False
numpy_random_seed: 2147483648
GPU 0,1,2,3,4,5: NVIDIA A800 80GB PCIe
CUDA_HOME: /usr/local/cuda-11.7
NVCC: Cuda compilation tools, release 11.7, V11.7.99
GCC: gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
PyTorch: 2.1.2+cu121
PyTorch compiling details: PyTorch built with:
  - GCC 9.3
  - C++ Version: 201703
  - Intel(R) oneAPI Math Kernel Library Version 2022.2-Product Build 20220804 for Intel(R) 64 architecture applications
  - Intel(R) MKL-DNN v3.1.1 (Git Hash 64f6bcbcbab628e96f33a62c3e975f8535a7bde4)
  - OpenMP 201511 (a.k.a. OpenMP 4.5)
  - LAPACK is enabled (usually provided by MKL)
  - NNPACK is enabled
  - CPU capability usage: AVX2
  - CUDA Runtime 12.1
  - NVCC architecture flags: -gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_70,code=sm_70;-gencode;arch=compute_75,code=sm_75;-gencode;arch=compute_80,code=sm_80;-gencode;arch=compute_86,code=sm_86;-gencode;arch=compute_90,code=sm_90
  - CuDNN 8.9.2
  - Magma 2.6.1
  - Build settings: BLAS_INFO=mkl, BUILD_TYPE=Release, CUDA_VERSION=12.1, CUDNN_VERSION=8.9.2, CXX_COMPILER=/opt/rh/devtoolset-9/root/usr/bin/c++, CXX_FLAGS= -D_GLIBCXX_USE_CXX11_ABI=0 -fabi-version=11 -fvisibility-inlines-hidden -DUSE_PTHREADPOOL -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOROCTRACER -DUSE_FBGEMM -DUSE_QNNPACK -DUSE_PYTORCH_QNNPACK -DUSE_XNNPACK -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Werror=return-type -Werror=non-virtual-dtor -Werror=bool-operation -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-invalid-partial-specialization -Wno-unused-private-field -Wno-aligned-allocation-unavailable -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Werror=format -Werror=cast-function-type -Wno-stringop-overflow, LAPACK_INFO=mkl, PERF_WITH_AVX=1, PERF_WITH_AVX2=1, PERF_WITH_AVX512=1, TORCH_DISABLE_GPU_ASSERTS=ON, TORCH_VERSION=2.1.2, USE_CUDA=ON, USE_CUDNN=ON, USE_EXCEPTION_PTR=1, USE_GFLAGS=OFF, USE_GLOG=OFF, USE_MKL=ON, USE_MKLDNN=ON, USE_MPI=OFF, USE_NCCL=1, USE_NNPACK=ON, USE_OPENMP=ON, USE_ROCM=OFF, 

TorchVision: 0.16.2+cu121
LMDeploy: 0.4.2+
transformers: 4.37.2
gradio: 4.16.0
fastapi: 0.111.0
pydantic: 2.7.1
triton: 2.1.0

Error traceback

(llama3_delpoy) fusionai@train68:~/project/internllm_demo/llama3/llama3-ft$ python gradio_llava.py 
Set max length to 16384
Dummy Resized
[WARNING] gemm_config.in is not found; using default GEMM algo                                                                           
Running on local URL:  http://0.0.0.0:6006

To create a public link, set `share=True` in `launch()`.
Exception in thread Thread-2 (_work_thread):
Traceback (most recent call last):
  File "/home/fusionai/anaconda3/envs/llama3_delpoy/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/home/fusionai/anaconda3/envs/llama3_delpoy/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/fusionai/anaconda3/envs/llama3_delpoy/lib/python3.10/site-packages/lmdeploy/vl/engine.py", line 82, in _work_thread
    self.loop.run_until_complete(self._forward_loop())
  File "/home/fusionai/anaconda3/envs/llama3_delpoy/lib/python3.10/asyncio/base_events.py", line 649, in run_until_complete
    return future.result()
  File "/home/fusionai/anaconda3/envs/llama3_delpoy/lib/python3.10/site-packages/lmdeploy/vl/engine.py", line 98, in _forward_loop
    outputs = self.forward(inputs)
  File "/home/fusionai/anaconda3/envs/llama3_delpoy/lib/python3.10/site-packages/lmdeploy/vl/engine.py", line 106, in forward
    outputs = self.model.forward(inputs)
  File "/home/fusionai/anaconda3/envs/llama3_delpoy/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/fusionai/anaconda3/envs/llama3_delpoy/lib/python3.10/site-packages/lmdeploy/vl/model/xcomposer2.py", line 135, in forward
    return self._forward_func(images)
  File "/home/fusionai/anaconda3/envs/llama3_delpoy/lib/python3.10/site-packages/lmdeploy/vl/model/xcomposer2.py", line 111, in _forward_7b
    outputs = self.model.vit(outputs)
  File "/home/fusionai/anaconda3/envs/llama3_delpoy/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/fusionai/anaconda3/envs/llama3_delpoy/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
TypeError: CLIPVisionTower.forward() missing 2 required positional arguments: 'glb_GN' and 'sub_GN'
lvhan028 commented 3 weeks ago

Hi @ztfmars, would you mind sharing your LoRA model with us? We are celebrating the Dragon Boat Festival and will get back to you next Tuesday.

irexyc commented 3 weeks ago

Is the preprocessing of the vision part the same as xcomposer 4khd? If not, it cannot be handled directly by LMDeploy.

We reuse the preprocessing of the vision part of the VLM model. For xcomposer-4khd, the vision forward requires the glb_GN and sub_GN weights. You can check whether the index.json of your merged weights is consistent with the original xcomposer 4khd.

https://huggingface.co/internlm/internlm-xcomposer2-4khd-7b/blob/main/build_mlp.py#L76 https://huggingface.co/internlm/internlm-xcomposer2-4khd-7b/blob/main/pytorch_model.bin.index.json#L553-L554
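For reference, a minimal sketch of that check. It assumes the merged checkpoint is sharded and therefore has a pytorch_model.bin.index.json, and the path below is the hypothetical output directory from merge_peft_adapter.py:

import json

# Hypothetical path: the output directory produced by merge_peft_adapter.py.
index_path = "4khd_3e_mixed_all/pytorch_model.bin.index.json"

with open(index_path) as f:
    weight_map = json.load(f)["weight_map"]

# The 4khd vision forward needs the glb_GN / sub_GN weights, so keys containing
# these names should still appear after merging, as in the original checkpoint.
for name in ("glb_GN", "sub_GN"):
    hits = [k for k in weight_map if name in k]
    print(name, "->", hits if hits else "MISSING")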

ztfmars commented 3 weeks ago

fine-tune xcomposer with finetune_lora.sh and merge the adapter with the pretrained "internlm-xcomposer2-4khd"

The weights have been uploaded. Merged LoRA: https://openxlab.org.cn/models/detail/ztfmars/xcomposer_lora_3e Full training: https://openxlab.org.cn/models/detail/ztfmars/nuclear_blueprint_assistant

4khd gradio demo

Looking at the config, there doesn't seem to be much difference; both have glb_GN and sub_GN. Please help take a look. I hope this trained version can be supported for deployment, or please suggest some modifications. Thanks @lvhan028 @irexyc

irexyc commented 3 weeks ago

I suspect the cause may be here: https://github.com/InternLM/lmdeploy/blob/v0.4.2/lmdeploy/vl/model/xcomposer2.py#L63-L71

In the xcomposer 4khd config on Hugging Face, architectures is InternLM2ForCausalLM. LMDeploy 0.4.2 uses this field to decide whether to use _forward_7b or _forward_4khd_7b. From your log, _forward_7b was used.

You can try changing architectures in config.json to InternLM2ForCausalLM (I guess yours may currently be InternLMXComposer2ForCausalLM).
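A minimal sketch of that edit, assuming the hypothetical merged-model directory produced by merge_peft_adapter.py (back up config.json before changing it):

import json

config_path = "4khd_3e_mixed_all/config.json"  # hypothetical merged-model directory

with open(config_path) as f:
    config = json.load(f)

print("before:", config.get("architectures"))
# Per the suggestion above, make the field match the original
# internlm-xcomposer2-4khd-7b config so LMDeploy 0.4.2 picks _forward_4khd_7b.
config["architectures"] = ["InternLM2ForCausalLM"]

with open(config_path, "w") as f:
    json.dump(config, f, indent=2, ensure_ascii=False)

print("after:", config["architectures"])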

lzcchl commented 2 weeks ago

Thanks, modifying this works; the "missing 2 required positional arguments: 'glb_GN' and 'sub_GN'" error no longer occurs.

I'd like to ask another question: why does the config.json generated this way by my code show "attn_implementation": "eager" instead of "attn_implementation": "flash_attention_2"?

I installed flash_attn-2.5.6+cu118torch2.1cxx11abiFALSE-cp38-cp38-linux_x86_64.whl on top of the openmmlab/lmdeploy:v0.4.2 image, and import flash_attn raises no error inside the container. How should I fix this?

irexyc commented 2 weeks ago

why does the config.json generated this way by my code show "attn_implementation": "eager"

Do you mean the config.json generated by save_pretrained? That does not seem to be related to us.

BTW, when running xcomposer2 inference with LMDeploy, the LLM part uses the TurboMind engine, which ignores the value of attn_implementation.

lzcchl commented 2 weeks ago

A follow-up question then: I see that when deploying a service with lmdeploy, either the TurboMind or the PyTorch engine can be selected via the --backend {pytorch,turbomind} argument.

If I use pytorch as the backend, would attn_implementation have any effect there?

irexyc commented 2 weeks ago

Not all models support both backends. For VLM models, most are only supported by the TurboMind backend. The PyTorch backend currently supports cogvlm and llava.

For details, see SUPPORTED_ARCHS in the two files below. The PyTorch engine also ignores the value of attn_implementation.

https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/pytorch/supported_models.py https://github.com/InternLM/lmdeploy/blob/main/lmdeploy/turbomind/supported_models.py
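As an illustration of explicit backend selection with the 0.4.x Python API (a sketch, not from this thread; the model directory and image path below are hypothetical):

from lmdeploy import pipeline, ChatTemplateConfig, TurbomindEngineConfig
from lmdeploy.vl import load_image

# xcomposer2 runs on the TurboMind backend; neither engine reads
# attn_implementation from config.json.
pipe = pipeline(
    "4khd_3e_mixed_all",  # hypothetical merged checkpoint directory
    backend_config=TurbomindEngineConfig(session_len=16384),
    chat_template_config=ChatTemplateConfig(model_name='internlm-xcomposer2-4khd'),
)

image = load_image("example.jpg")  # hypothetical local image
print(pipe(("describe this image", image)).text)

# For an API server, the backend is chosen on the command line, e.g.:
#   lmdeploy serve api_server 4khd_3e_mixed_all --backend turbomind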

lzcchl commented 2 weeks ago

OK, thanks~

github-actions[bot] commented 1 week ago

This issue is marked as stale because it has been marked as invalid or awaiting response for 7 days without any further response. It will be closed in 5 days if the stale label is not removed or if there is no further response.

github-actions[bot] commented 3 days ago

This issue is closed because it has been stale for 5 days. Please open a new issue if you have similar issues or you have any new updates now.