deepseek-ai / DeepSeek-VL

DeepSeek-VL: Towards Real-World Vision-Language Understanding
https://huggingface.co/spaces/deepseek-ai/DeepSeek-VL-7B
MIT License
2.08k stars 195 forks

feat: add multiple images (or in-context learning) conversation example #47

Closed StevenLiuWen closed 7 months ago

StevenLiuWen commented 7 months ago
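# Multi-image prompt: each "<image_placeholder>" in "content" is assumed to
# correspond, in order, to one path in "images".
conversation = [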
    {
        "role": "User",
        "content": "<image_placeholder>A dog wearing nothing in the foreground, "
                   "<image_placeholder>a dog wearing a santa hat, "
                   "<image_placeholder>a dog wearing a wizard outfit, and "
                   "<image_placeholder>what's the dog wearing?",
        "images": [
            "images/dog_a.png",
            "images/dog_b.png",
            "images/dog_c.png",
            "images/dog_d.png",
        ],
    },
    {"role": "Assistant", "content": ""}
]
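
A quick sanity check for a conversation like the one above is that every message contains as many `<image_placeholder>` tokens as it lists image paths. The sketch below is only an illustration of that check: `check_image_placeholders` and `IMAGE_PLACEHOLDER` are hypothetical names, not part of the DeepSeek-VL package, and the one-placeholder-per-image pairing is an assumption read off the example rather than a documented guarantee.

```python
# Hypothetical helper, not part of DeepSeek-VL: verifies that each message
# has one "<image_placeholder>" per entry in its "images" list.
IMAGE_PLACEHOLDER = "<image_placeholder>"


def check_image_placeholders(conversation):
    """Return (message_index, n_placeholders, n_images) for each mismatched message."""
    mismatches = []
    for index, message in enumerate(conversation):
        n_placeholders = message.get("content", "").count(IMAGE_PLACEHOLDER)
        n_images = len(message.get("images", []))
        if n_placeholders != n_images:
            mismatches.append((index, n_placeholders, n_images))
    return mismatches


conversation = [
    {
        "role": "User",
        "content": "<image_placeholder>A dog wearing nothing in the foreground, "
                   "<image_placeholder>a dog wearing a santa hat, "
                   "<image_placeholder>a dog wearing a wizard outfit, and "
                   "<image_placeholder>what's the dog wearing?",
        "images": [
            "images/dog_a.png",
            "images/dog_b.png",
            "images/dog_c.png",
            "images/dog_d.png",
        ],
    },
    {"role": "Assistant", "content": ""},
]

# An empty list means every message has matching counts.
print(check_image_placeholders(conversation))
```

Running the check before passing the conversation to the model gives an early, readable report of any message where the counts differ, instead of a failure further down the pipeline.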