InternLM / lmdeploy

LMDeploy is a toolkit for compressing, deploying, and serving LLMs.
https://lmdeploy.readthedocs.io/en/latest/
Apache License 2.0

Why do video frames in lmdeploy need to be converted into base64 encoding? #2759

Open AmazDeng opened 4 hours ago

AmazDeng commented 4 hours ago

@irexyc @lvhan028 @AllentDan

In my study of the lmdeploy framework, I found that for video inference, the caller must first convert video frames from PIL.Image.Image objects into base64-encoded strings outside the framework. Then, inside the framework, the base64 strings are decoded back into PIL.Image.Image objects. Why doesn't lmdeploy directly accept a List[PIL.Image.Image] as input when doing video inference? This round-trip conversion is quite time-consuming.
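The round trip described above can be sketched as follows. This is a minimal illustration of the cost being questioned, not lmdeploy code; `encode_frame` and `decode_frame` are hypothetical names:

```python
import base64
import io

from PIL import Image


def encode_frame(img: Image.Image) -> str:
    # Serialize the PIL image to JPEG bytes, then base64 (done outside the framework).
    buf = io.BytesIO()
    img.save(buf, format='JPEG')
    return base64.b64encode(buf.getvalue()).decode('utf-8')


def decode_frame(data: str) -> Image.Image:
    # Decode the base64 string back into a PIL image (done inside the framework).
    return Image.open(io.BytesIO(base64.b64decode(data)))


# Every frame pays an encode + decode cost on this round trip.
frame = Image.new('RGB', (336, 336))
restored = decode_frame(encode_frame(frame))
assert restored.size == frame.size
```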

irexyc commented 2 hours ago

For online serving, we followed the OpenAI format, which accepts URLs and base64 data.

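For reference, in the OpenAI chat-completions convention each frame travels as a base64 data URL inside the message payload. A minimal sketch of building one such message (the payload shape follows the OpenAI format; this is illustrative, not an lmdeploy API):

```python
import base64
import io

from PIL import Image


def to_data_url(img: Image.Image) -> str:
    # Embed the frame as a base64 data URL, as the OpenAI image_url field expects.
    buf = io.BytesIO()
    img.save(buf, format='JPEG')
    b64 = base64.b64encode(buf.getvalue()).decode('utf-8')
    return f'data:image/jpeg;base64,{b64}'


frame = Image.new('RGB', (336, 336))
message = {
    'role': 'user',
    'content': [
        {'type': 'text', 'text': 'Describe the video frames.'},
        {'type': 'image_url', 'image_url': {'url': to_data_url(frame)}},
    ],
}
```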
For offline usage (pipeline), you can actually pass messages like:

from PIL import Image
from lmdeploy import pipeline

pipe = pipeline('...')  # path or name of a supported vision-language model
img = Image.open('...')
messages = [
    dict(role='user', content=[
        dict(type='text', text='Describe the images in detail.'),
        dict(type='image_url', image_url=dict(url=img))
    ])
]
pipe(messages)

AmazDeng commented 2 hours ago

Understood, thank you.