Open Epiphqny opened 3 months ago
It seems the training is built upon the official Chameleon checkpoint.
I think the training doc is quite clear. They build a dataset of (text, image_tokens) pairs and train only the output layer that produces the special image tokens (4, 8196).
Dear authors, thank you for your excellent work. I have a question regarding your training methodology, specifically how the training data is used. Examining the code in your GitHub repository (https://github.com/GAIR-NLP/anole/blob/219a9a3c8b2d2b67a9bcf92d341faaa16335b1fe/facilitating_image_generation/train_image_head.py#L19), I noticed that only image tokens appear to be fed into the network. Could you confirm whether my understanding is correct? If so, how does the model learn to generate images that correspond to different text inputs?
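To make the head-only training discussed above concrete, here is a minimal PyTorch sketch of the general idea: freeze the backbone and update only the image-token rows of the output head via a gradient mask. All sizes, the toy backbone, and the `mask_grad` hook are hypothetical stand-ins for illustration, not the actual Anole/Chameleon configuration or code.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy sizes for illustration (not the real Chameleon config).
VOCAB_SIZE = 100        # full vocabulary
IMAGE_TOKEN_START = 90  # assume image tokens occupy ids [90, 100)
HIDDEN = 32

# Toy "backbone" standing in for the pretrained transformer.
backbone = nn.Sequential(nn.Embedding(VOCAB_SIZE, HIDDEN), nn.Linear(HIDDEN, HIDDEN))
lm_head = nn.Linear(HIDDEN, VOCAB_SIZE, bias=False)

# Freeze the backbone: only the output head receives gradients.
for p in backbone.parameters():
    p.requires_grad = False

# Mask gradients so only the image-token rows of the head get updated.
def mask_grad(grad):
    mask = torch.zeros_like(grad)
    mask[IMAGE_TOKEN_START:] = 1.0
    return grad * mask

lm_head.weight.register_hook(mask_grad)

opt = torch.optim.SGD(lm_head.parameters(), lr=0.1)

# One next-token-prediction step on a random (text, image_tokens) sequence.
tokens = torch.randint(0, VOCAB_SIZE, (2, 8))
logits = lm_head(backbone(tokens))
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, VOCAB_SIZE),
    tokens[:, 1:].reshape(-1),
)

before = lm_head.weight.detach().clone()
loss.backward()
opt.step()

# Per-row change in the head: text-token rows stay fixed,
# image-token rows move.
delta = (lm_head.weight.detach() - before).abs().sum(dim=1)
```

Note that even though only image-token rows are trainable, the loss is still computed over full sequences, so the head learns to emit image tokens conditioned on the text prefix that precedes them.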