AILab-CVC / SEED

Official implementation of SEED-LLaMA (ICLR 2024).
https://ailab-cvc.github.io/seed

question about image2text training? #3

Closed: zzchust closed this issue 1 year ago

zzchust commented 1 year ago

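[image]

1. Image -> text: is the input here the image's token IDs or vectors? From what I read, the image representation is passed through an FC layer to map it, and then concatenated with the text embedding?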

geyuying commented 1 year ago

The input is the visual codes (vectors) that the image token IDs index in the visual codebook, followed by an FC layer (so that the dimension matches that of the word embeddings).
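
To make the answer above concrete, here is a minimal sketch of that input path: look up each image token ID in the visual codebook, project the resulting codes with an FC layer to the word-embedding dimension, then concatenate with the text embeddings. It is written in plain NumPy for illustration only and is not the repository's code; all sizes, the random weights, and the names `visual_codebook`, `fc_weight`, `embed_image_tokens`, and `build_image2text_input` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes below are illustrative assumptions, not SEED-LLaMA's real values.
CODEBOOK_SIZE = 16  # number of entries in the visual codebook
CODE_DIM = 4        # dimension of each visual code
WORD_DIM = 8        # dimension of the LLM word embeddings
TEXT_VOCAB = 32     # size of the text vocabulary

# Hypothetical parameters (random here; learned in a real model).
visual_codebook = rng.normal(size=(CODEBOOK_SIZE, CODE_DIM))
fc_weight = rng.normal(size=(CODE_DIM, WORD_DIM))
fc_bias = np.zeros(WORD_DIM)
word_embedding = rng.normal(size=(TEXT_VOCAB, WORD_DIM))


def embed_image_tokens(image_token_ids):
    """Token IDs -> visual codes (codebook lookup) -> FC projection."""
    codes = visual_codebook[image_token_ids]  # (n_img, CODE_DIM)
    return codes @ fc_weight + fc_bias        # (n_img, WORD_DIM)


def build_image2text_input(image_token_ids, text_token_ids):
    """Concatenate projected image codes with text word embeddings."""
    image_part = embed_image_tokens(image_token_ids)  # (n_img, WORD_DIM)
    text_part = word_embedding[text_token_ids]        # (n_txt, WORD_DIM)
    return np.concatenate([image_part, text_part], axis=0)


image_token_ids = np.array([3, 7, 7, 1])
text_token_ids = np.array([5, 9, 2])
inputs = build_image2text_input(image_token_ids, text_token_ids)
print(inputs.shape)  # (7, 8): 4 image positions + 3 text positions
```

Note that the model never consumes the raw integer IDs directly in this sketch: the IDs only select rows of the codebook, and the FC layer exists solely to bring those rows to the same width as the word embeddings so the two sequences can be joined.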