VQGAN model version - Githubissues

OFA-Sys / OFA

Official repository of OFA (ICML 2022). Paper: OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework

Apache License 2.0

2.39k stars, 248 forks

VQGAN model version #390

Closed. PhoebusSi closed this issue 1 year ago.

PhoebusSi commented 1 year ago

Which version of the VQGAN model does OFA use? Could you upload the checkpoint and config.yaml file, or provide a link to them?

PhoebusSi commented 1 year ago

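I am using vqgan/last.ckpt and vqgan/model.yaml from the checkpoint zip file you provided, image_gen_large_best.zip. With these, however, encoding a 256x256 image produces a token sequence of length 32x32=1024, not the 16x16=256 stated in the paper. What is causing the mismatch?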

PhoebusSi commented 1 year ago

Alternatively, what image resolution does this code sequence (length 1024) correspond to? Is it 256?

logicwong commented 1 year ago

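@PhoebusSi Encoding a 256x256 image directly does give a sequence of length 1024. Pretraining uses image infilling, i.e., recovering the codes for the middle part of the image. Encoding only that middle region (128x128 resolution) yields a sequence of length 256.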
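The arithmetic behind this answer can be checked with a short sketch. It assumes the VQGAN downsamples by a factor of 8 per side. This factor is inferred from the numbers in the thread (256/32 = 8 and 128/16 = 8), not taken from OFA's config. The sketch also assumes that the "middle part" is the centered 128x128 crop. The helper names `vq_token_count` and `center_box` are hypothetical and not part of the OFA codebase.

```python
def vq_token_count(height: int, width: int, downsample: int = 8) -> int:
    """Number of VQ codes for an image, assuming each code covers a
    downsample x downsample pixel patch (factor 8 is inferred, not confirmed)."""
    if height % downsample or width % downsample:
        raise ValueError("resolution must be divisible by the downsampling factor")
    return (height // downsample) * (width // downsample)


def center_box(size: int, crop: int) -> tuple:
    """(top, left, bottom, right) pixel box of a centered crop x crop
    region inside a size x size image (centering is an assumption)."""
    offset = (size - crop) // 2
    return (offset, offset, offset + crop, offset + crop)


full = vq_token_count(256, 256)    # whole 256x256 image -> 32x32 grid
middle = vq_token_count(128, 128)  # middle 128x128 region -> 16x16 grid
print(full, middle, center_box(256, 128))  # 1024 256 (64, 64, 192, 192)
```

Under these assumptions, both sequence lengths come from the same downsampling factor. The 1024 in the question and the 256 in the paper differ only because the paper counts codes for the 128x128 middle region, not for the full image.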