OpenBMB / MiniCPM-V

MiniCPM-V 2.6: A GPT-4V Level MLLM for Single Image, Multi Image and Video on Your Phone
Apache License 2.0

vLLM: wrong inference results when using AsyncLLMEngine #475

Closed elfisworking closed 2 months ago

elfisworking commented 2 months ago

### 是否已有关于该错误的issue或讨论? | Is there an existing issue / discussion for this?

### 该问题是否在FAQ中有解答? | Is there an existing answer for this in FAQ?

### 当前行为 | Current Behavior

Inference with the MiniCPM-Llama3-V2.5 model works fine with the following script:

```python
from PIL import Image
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

MODEL_NAME = "/model"
# Also available for previous models
# MODEL_NAME = "openbmb/MiniCPM-Llama3-V-2_5"
# MODEL_NAME = "HwwwH/MiniCPM-V-2"

if __name__ == "__main__":
    image = Image.open("/app/fruit_stand.jpg").convert("RGB")
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
    llm = LLM(
        model=MODEL_NAME,
        trust_remote_code=True,
        gpu_memory_utilization=1,
        max_model_len=2048,
        tensor_parallel_size=2,
    )
    messages = [{
        "role": "user",
        # The number of "(<image>./</image>)" placeholders must match the number of images
        "content": "(<image>./</image>)" + "\n这是一张什么图片?"  # "What kind of picture is this?"
    }]
    prompt = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True
    )

    # Single Inference
    inputs = {
        "prompt": prompt,
        "multi_modal_data": {
            "image": image
            # Multi images, the number of images should be equal to that of `(<image>./</image>)`
            # "image": [image, image]
        },
    }
    # Batch inference
    # inputs = [{
    #     "prompt": prompt,
    #     "multi_modal_data": {
    #         "image": image
    #     },
    # } for _ in range(2)]

    # MiniCPM-V 2.6
    # stop_tokens = ['<|im_end|>', '<|endoftext|>']
    # stop_token_ids = [tokenizer.convert_tokens_to_ids(i) for i in stop_tokens]
    # MiniCPM-V 2.0
    # stop_token_ids = [tokenizer.eos_id]
    # MiniCPM-Llama3-V 2.5
    stop_token_ids = [tokenizer.eos_id, tokenizer.eot_id]

    sampling_params = SamplingParams(
        stop_token_ids=stop_token_ids,
        use_beam_search=True,
        temperature=0,
        best_of=3,
        max_tokens=1024
    )

    outputs = llm.generate(inputs, sampling_params=sampling_params)

    print(outputs[0].outputs[0].text)
```

However, when AsyncLLMEngine is used instead of LLM, the inference results are incorrect.

### 期望行为 | Expected Behavior

_No response_

### 复现方法 | Steps To Reproduce

_No response_

### 运行环境 | Environment

```Markdown
- OS: CentOS
- Python: 3.10
- Transformers: 4.44.0
- PyTorch: 2.4.0
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):
- vllm: 0.5.4
```

### 备注 | Anything else?

_No response_

BCWang93 commented 2 months ago

What exactly is the error you are getting?

elfisworking commented 2 months ago

It seems it was a problem with how the prompt was concatenated earlier; the error said the model does not support image recognition. It has been resolved now.

BCWang93 commented 2 months ago

> It seems it was a problem with how the prompt was concatenated earlier; the error said the model does not support image recognition. It has been resolved now.

When you use AsyncLLMEngine, is the argument you pass in also `inputs`, i.e. a dict? From what I can see, the `generate` method of AsyncLLMEngine expects its input to be a prompt.

elfisworking commented 2 months ago

> > It seems it was a problem with how the prompt was concatenated earlier; the error said the model does not support image recognition. It has been resolved now.
>
> When you use AsyncLLMEngine, is the argument you pass in also `inputs`, i.e. a dict? From what I can see, the `generate` method of AsyncLLMEngine expects its input to be a prompt.

That is not really a problem. I have tested it and also dug through the vLLM code. Thanks.
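
For context, below is a minimal sketch (not from this thread) of how the same dict-shaped `inputs` can be passed to `AsyncLLMEngine.generate` in vLLM 0.5.x. The engine arguments, file paths, and the `request_id` value are illustrative assumptions; only the `{"prompt": ..., "multi_modal_data": ...}` shape is carried over from the script above.

```python
import asyncio

from PIL import Image
from transformers import AutoTokenizer
from vllm import AsyncEngineArgs, AsyncLLMEngine, SamplingParams

MODEL_NAME = "/model"  # same local path as in the script above (assumption)


async def main():
    # Build the async engine with roughly the same arguments as the offline LLM above.
    engine = AsyncLLMEngine.from_engine_args(
        AsyncEngineArgs(
            model=MODEL_NAME,
            trust_remote_code=True,
            max_model_len=2048,
        )
    )

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
    image = Image.open("/app/fruit_stand.jpg").convert("RGB")
    messages = [{
        "role": "user",
        "content": "(<image>./</image>)" + "\n这是一张什么图片?"
    }]
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )

    # The same dict shape accepted by LLM.generate (a "prompt" string plus
    # "multi_modal_data") is also a valid input for AsyncLLMEngine.generate.
    inputs = {"prompt": prompt, "multi_modal_data": {"image": image}}
    sampling_params = SamplingParams(
        temperature=0,
        max_tokens=1024,
        stop_token_ids=[tokenizer.eos_id, tokenizer.eot_id],  # MiniCPM-Llama3-V 2.5, as above
    )

    # generate() is an async generator keyed by request_id; each yielded
    # RequestOutput holds the text generated so far, so keep the last one.
    final_output = None
    async for request_output in engine.generate(inputs, sampling_params, request_id="0"):
        final_output = request_output
    print(final_output.outputs[0].text)


asyncio.run(main())
```

The main difference from the offline script is that `generate` here is an async generator rather than a blocking call, so the final `RequestOutput` has to be taken from the stream; the input dict itself keeps the same shape.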