InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

[Feature] Implement CogVLM2 #1622

Closed: isidentical closed this issue 4 months ago

isidentical commented 5 months ago

Motivation

CogVLM2 is currently the SOTA open-source VLM for captioning tasks.

Related resources

No response

Additional context

No response

RunningLeon commented 5 months ago

@isidentical hi, thanks for the information. We will add CogVLM2 support after PR #1502 is merged.

Jayantverma2 commented 5 months ago

any update?

RunningLeon commented 5 months ago

> any update?

Hi, it's in progress. Any updates will be posted to this issue.

RunningLeon commented 5 months ago

@isidentical @Jayantverma2 hi, CogVLM2 models are now supported in PR #1502. Please give it a try if you have time, and feel free to leave comments on the PR. Thanks.

Tushar-ml commented 5 months ago

@RunningLeon Is this the correct way to initialize CogVLM2?

engine = pipeline(model_path, "cogvlm2", log_level="DEBUG")

I have made some changes to config.json:

{
  "architectures": ["CogVLMForCausalLM"],
  "auto_map": {
    "AutoConfig": "configuration_cogvlm.CogVLMConfig",
    "AutoModelForCausalLM": "modeling_cogvlm.CogVLMForCausalLM"
  },
  "vision_config": {
    "dropout_prob": 0.0,
    "hidden_act": "gelu",
    "in_channels": 3,
    "num_hidden_layers": 63,
    "hidden_size": 1792,
    "patch_size": 14,
    "num_heads": 16,
    "intermediate_size": 15360,
    "layer_norm_eps": 1e-06,
    "num_positions": 9217,
    "image_size": 1344
  },
  "hidden_size": 4096,
  "intermediate_size": 14336,
  "num_attention_heads": 32,
  "max_position_embeddings": 8192,
  "rms_norm_eps": 1e-05,
  "template_version": "chat",
  "initializer_range": 0.02,
  "bos_token_id": 128000,
  "eos_token_id": [128001, 128009],
  "pad_token_id": 128002,
  "vocab_size": 128256,
  "num_hidden_layers": 32,
  "hidden_act": "silu",
  "use_cache": true,
  "transformers_version": "4.41.0"
}

But when I run it with this prompt:

prompts = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': prompt},
            {'type': 'image_url', 'image_url': {'url': f'data:image/jpeg;base64,{image}'}}
        ]
    }
]

it generates b''.

RunningLeon commented 5 months ago

@Tushar-ml hi, please follow the examples in the docs: https://lmdeploy.readthedocs.io/en/latest/inference/vl_pipeline.html#vlm-offline-inference-pipeline.

The prompts should look like this:

prompts = [
    {
        'role': 'user',
        'content': [
            {'type': 'text', 'text': 'describe this image'},
            {'type': 'image_url', 'image_url': {'url': 'https://raw.githubusercontent.com/open-mmlab/mmdeploy/main/tests/data/tiger.jpeg'}}
        ]
    }
]
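
For reference, here is a minimal sketch (not from the thread; the model path is a placeholder assumption) of feeding prompts in this format to the VL pipeline:

from lmdeploy import pipeline

# Sketch: run GPT-4V-style messages through the lmdeploy VL pipeline.
# The model path is a placeholder; point it at your local CogVLM2 checkpoint.
pipe = pipeline('THUDM/cogvlm2-llama3-chat-19B')
response = pipe(prompts)  # `prompts` as defined above
print(response)
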
Tushar-ml commented 5 months ago

@RunningLeon are there any docs on how to run CogVLM2? As mentioned in the PR, the tokenizer needs to be applied manually.

pseudotensor commented 5 months ago

Awesome, looking forward to it. I really like lmdeploy because it's much more stable than sglang for these vision models.

RunningLeon commented 5 months ago

> @RunningLeon are there any docs on how to run CogVLM2? As mentioned in the PR, the tokenizer needs to be applied manually.

@Tushar-ml hi, there is no need to do that for CogVLM2, but you do need to for CogVLM (v1).

RunningLeon commented 5 months ago

> Awesome, looking forward to it. I really like lmdeploy because it's much more stable than sglang for these vision models.

@pseudotensor hi, glad to hear that. If possible, please recommend lmdeploy to others interested in deploying LLMs and VLMs. Thanks.

pseudotensor commented 5 months ago

> > Awesome, looking forward to it. I really like lmdeploy because it's much more stable than sglang for these vision models.
>
> @pseudotensor hi, glad to hear that. If possible, please recommend lmdeploy to others interested in deploying LLMs and VLMs. Thanks.

Yes, will gladly do that.

Tushar-ml commented 5 months ago

@RunningLeon I am getting OOM on an A40 GPU (48 GB VRAM). What is the recommended hardware for CogVLM2, given that the model itself is no more than 40 GB?

RunningLeon commented 5 months ago

> @RunningLeon I am getting OOM on an A40 GPU (48 GB VRAM). What is the recommended hardware for CogVLM2, given that the model itself is no more than 40 GB?

@Tushar-ml hi, could you provide your sample code? Normally, you can reduce cache_max_entry_count to shrink the KV cache memory and lower max_prefill_token_num in PytorchEngineConfig:

https://github.com/InternLM/lmdeploy/blob/5a2aaf1dc81e101c282456305546787558e509ff/lmdeploy/messages.py#L202-L230
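
For illustration, a minimal sketch (the values and model path are assumptions, not recommendations from the thread) of passing a trimmed-down PytorchEngineConfig to the pipeline:

from lmdeploy import pipeline, PytorchEngineConfig

# Sketch: shrink the KV cache and the prefill chunk to lower peak GPU memory.
engine_config = PytorchEngineConfig(
    cache_max_entry_count=0.4,   # fraction of free GPU memory reserved for the KV cache
    max_prefill_token_num=4096,  # fewer tokens per prefill iteration reduces peak memory
)
pipe = pipeline('/path/to/cogvlm2-llama3-chat-19B', backend_config=engine_config)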

Tushar-ml commented 5 months ago

Thanks @RunningLeon, I will try this.

GuoXu-booo commented 5 months ago

@RunningLeon Hi! Due to server network limitations, I could not build and install the latest lmdeploy on the server, so I pulled the lmdeploy 0.4.2 image from Docker Hub and ran it. Running CogVLM2 then raised an error:

root@gpu9:~/data/CogVLM2# python cogvlm_demo.py
2024-05-31 01:31:08,920 - lmdeploy - ERROR - TypeError: expected string or bytes-like object
2024-05-31 01:31:08,920 - lmdeploy - ERROR - test failed!
model /root/data/cogvlm2-llama3-chinese-chat-19B/ requires transformers version None but transformers 4.40.2 is installed.

My code:

from lmdeploy import pipeline
from lmdeploy.vl import load_image

model_path = '/root/data/cogvlm2-llama3-chinese-chat-19B/'

pipe = pipeline(model_path)

image = load_image('/root/data/dataset/misumi_data/images/Misumi000006.jpg')
response = pipe(('图中出现的零件是什么?', image))  # "What is the part shown in the image?"
print(response)

I look forward to your reply. Thank you.

RunningLeon commented 5 months ago

@GuoXu-booo hi, CogVLM is supported by the PyTorch engine, so you can simply clone the code from the PR and install it with pip install -e . BTW, you should use the latest code from PR #1502. The environment check fails in your case because there is no transformers_version in your config.json, which has been fixed in the PR:

git clone --recursive -b support-cogvlm-dev https://github.com/RunningLeon/lmdeploy.git
cd lmdeploy 
pip install -e .
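
If cloning the PR is not an option, one possible workaround (an assumption based on the error message, not an official fix) is to add the missing transformers_version field to the local model's config.json, e.g.:

# Hypothetical workaround: add the `transformers_version` key that the
# environment check in lmdeploy 0.4.2 expects to find in config.json.
import json

cfg_path = '/root/data/cogvlm2-llama3-chinese-chat-19B/config.json'
with open(cfg_path) as f:
    cfg = json.load(f)
cfg.setdefault('transformers_version', '4.40.2')  # match the installed transformers
with open(cfg_path, 'w') as f:
    json.dump(cfg, f, indent=2)
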
isidentical commented 5 months ago

@RunningLeon are there any plans to support CogVLM on TurboMind, since it is faster for Llama 3?

RunningLeon commented 5 months ago

> @RunningLeon are there any plans to support CogVLM on TurboMind, since it is faster for Llama 3?

Sorry, no plan yet.