GAIR-NLP / anole

Anole: An Open, Autoregressive and Native Multimodal Models for Interleaved Image-Text Generation
https://huggingface.co/spaces/ethanchern/Anole
618 stars 33 forks

bad results! #5

Closed. yyyouy closed this issue 1 month ago

yyyouy commented 1 month ago

Hello, thank you for your work. I have a question regarding the inference results. The results I'm seeing are quite different from what you've displayed on your website. Are the results you've shown from the models hosted on Hugging Face, or are they cherry-picked examples? Alternatively, could these be results generated using the 30B model? Thank you for your clarification.

image

EthanC111 commented 1 month ago

Hi, thank you for your interest in our work! The results displayed on our website are indeed generated by the model we host on Hugging Face. We are working on releasing the 30B model soon and on further improving the current 7B model.

yyyouy commented 1 month ago

I appreciate your response. However, I am unable to replicate the results shown in the image. Could you please provide insight into potential reasons for this discrepancy?

image

EthanC111 commented 1 month ago

Hi, for the current version you may need to generate the image several more times. Please use the following command for image generation:

python text2image.py -i INSTRUCTION [-b BATCH_SIZE] [-s SAVE_DIR]

Thank you!
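
Since the suggestion above is to generate several times, the repeated runs could be scripted. The following is only a minimal sketch, not part of the Anole repository: it assumes `text2image.py` is in the current working directory and accepts the `-i`, `-b`, and `-s` options exactly as shown in the command above. The helper names `build_commands` and `run_all`, the `outputs/run_NN` directory layout, and the default values are hypothetical, and it is an assumption that the script will write into a save directory that does not exist yet.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(instruction, runs=4, batch_size=1, save_root="outputs"):
    """Return one argument list per run, each with its own save directory.

    Hypothetical helper: only the -i, -b and -s options come from the
    command quoted in the thread; everything else is an illustration.
    """
    commands = []
    for run in range(runs):
        # Separate directory per run so later runs do not overwrite earlier ones.
        save_dir = Path(save_root) / f"run_{run:02d}"
        commands.append([
            sys.executable, "text2image.py",
            "-i", instruction,
            "-b", str(batch_size),
            "-s", str(save_dir),
        ])
    return commands


def run_all(commands):
    """Run each command in turn, stopping at the first failure."""
    for argv in commands:
        subprocess.run(argv, check=True)
```

From the repository root, something like `run_all(build_commands("a green anole on a branch", runs=4))` would then invoke the script four times, after which the outputs can be compared by hand.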