OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Apache License 2.0
2.39k stars 248 forks

VQGAN model version #390

Closed PhoebusSi closed 1 year ago

PhoebusSi commented 1 year ago

Which version of the VQGAN model does OFA use? Could you upload the checkpoint and config.yaml files, or provide a link to them?

PhoebusSi commented 1 year ago

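I used vqgan/last.ckpt and vqgan/model.yaml from the checkpoint zip file you provided, image_gen_large_best.zip. With these, a 256x256 image is encoded into a token sequence of length 32x32=1024 rather than the 16x16=256 stated in the paper. What is going wrong?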

PhoebusSi commented 1 year ago

Alternatively, what is the resolution of the image that this code sequence (length 1024) corresponds to? Is it 256?

logicwong commented 1 year ago

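@PhoebusSi Encoding a 256x256 image directly does indeed give a sequence of length 1024. Pre-training uses image infilling, i.e. recovering the codes for the central part of the image, and it is only the central region (128x128 resolution) that encodes to a length of 256.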
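
The arithmetic behind the two sequence lengths can be checked with a small sketch. It assumes the VQGAN downsamples each spatial side by a factor of 8, which is inferred from the numbers in this thread (256 / 32 = 8) rather than read from model.yaml. The helper names `code_length` and `center_crop_box` are hypothetical and are not part of the OFA codebase.

```python
def code_length(resolution: int, downsample_factor: int = 8) -> int:
    """Number of VQGAN codes for a square image of the given side length.

    downsample_factor=8 is an assumption inferred from the thread
    (a 256x256 image yields a 32x32 code grid).
    """
    side = resolution // downsample_factor
    return side * side


def center_crop_box(resolution: int, crop: int) -> tuple[int, int, int, int]:
    """(left, top, right, bottom) of a centered square crop, in pixels."""
    offset = (resolution - crop) // 2
    return (offset, offset, offset + crop, offset + crop)


if __name__ == "__main__":
    # Full 256x256 image: 32x32 code grid.
    print(code_length(256))
    # Central 128x128 region used for image infilling: 16x16 code grid.
    print(code_length(128))
    # Pixel box of that central region inside the 256x256 image.
    print(center_crop_box(256, 128))
```

Under that assumption, the full image gives 1024 codes and the central 128x128 region gives 256, which matches both the observation in the question and the figure quoted from the paper.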