GAIR-NLP / anole

Anole: An Open, Autoregressive, and Native Multimodal Model for Interleaved Image-Text Generation
https://huggingface.co/spaces/ethanchern/Anole

Multimodal-in and multimodal-out #18

Open JoyBoy-Su opened 1 month ago

JoyBoy-Su commented 1 month ago

We will implement the script so that the model can take images as input.

JoyBoy-Su commented 1 month ago

We provide a script for multimodal inference: you can follow the instructions to run the script.

Mr-Loevan commented 1 month ago

We provide a script for multimodal inference: you can follow the instructions to run the script.

Thanks for your good work! I tried the multimodal-in-and-out script, but it generates nothing when prompted to generate images. What could be the possible reason?

JoyBoy-Su commented 1 month ago

@Mr-Loevan Hi, can you give us more details? For example, your input.json and the output of your model.

I just tried to use the following input.json for inference:

[
    {
        "type": "text",
        "content": "Draw a picture showing a serene lakeside view at sunrise with mist rising from the water, surrounded by dense pine forests and mountains in the background."
    }
]

The output of the model is as follows:

It is a picturesque scene that reflects the beauty of nature in all its glory. The image captures the early morning hours when the sun rises over the horizon, casting a warm glow over the landscape. The lake surface is mirror-like, creating a reflection of the surrounding trees and mountains. There is a sense of tranquility and peace in the air, as if the area is protected from the hustle and bustle of everyday life.
<img: ./outputs/inference/1.png>

./outputs/inference/1.png: image
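For anyone hitting the same "nothing generated" issue, a common culprit is a malformed input.json. Based only on the format shown above (a JSON list of objects with "type" and "content" keys), here is a minimal sketch of a validator you could run before inference. The function name `load_prompt` and the set of accepted types are assumptions for illustration, not part of the Anole repository's actual API.

```python
import json

# Hypothetical helper (not part of the Anole repo): sanity-check an
# input.json file against the format shown in this thread -- a list of
# {"type": ..., "content": ...} entries.
def load_prompt(path):
    """Load a multimodal input.json and verify its basic structure."""
    with open(path) as f:
        entries = json.load(f)
    if not isinstance(entries, list):
        raise ValueError("input.json must contain a JSON list of entries")
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i} must be a JSON object")
        if "type" not in entry or "content" not in entry:
            raise ValueError(f"entry {i} is missing 'type' or 'content'")
    return entries
```

Running this on your input.json before calling the inference script will surface structural problems (a top-level object instead of a list, missing keys) that might otherwise fail silently.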