FoundationVision / OmniTokenizer

OmniTokenizer: one model and one weight for image-video joint tokenization.
https://www.wangjunke.info/OmniTokenizer/
MIT License

Similar to issue #2, would you like to compare this ckpt with Stability's VQVAE/VAE and Tencent's Open-MAGVIT2? #3

Closed — StarCycle closed this issue 3 months ago

StarCycle commented 3 months ago

Hi,

Have you compared your ckpt with the VAEs/VQVAEs shown here?

[screenshot attachment]

If a direct comparison is not reasonable because their ckpts were trained on more data, do they have a version trained only on ImageNet data but with the same config? Or could you evaluate their ckpts on ImageNet even though they were trained on more data? I just want to know which ckpt is most suitable for my current application.

Also, Tencent has released their Open-MAGVIT2.

Best, StarCycle

wdrink commented 3 months ago

Thanks for your great suggestion. We evaluated the reconstruction performance of the SD 1.4 VAE on ImageNet, and the rFID is 0.74. As you say, since they use much more training data, it is hard to make a fair comparison with them.
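For context on the metric being discussed: rFID is the Fréchet Inception Distance computed between real images and the tokenizer's reconstructions of those same images. The sketch below (an illustration, not the repo's evaluation code) shows the core FID formula applied to precomputed Inception feature statistics (mean and covariance); in a real evaluation, the statistics would come from Inception-v3 activations over the ImageNet validation set and its reconstructions.

```python
import numpy as np

def _sqrtm(m: np.ndarray) -> np.ndarray:
    # Matrix square root via eigendecomposition. Assumes m is diagonalizable
    # with non-negative real eigenvalues, which holds for the product of two
    # positive semi-definite covariance matrices (up to numerical noise).
    vals, vecs = np.linalg.eig(m)
    sqrt_vals = np.sqrt(np.maximum(vals.real, 0.0))
    return ((vecs * sqrt_vals) @ np.linalg.inv(vecs)).real

def frechet_distance(mu1, sigma1, mu2, sigma2) -> float:
    # FID = ||mu1 - mu2||^2 + Tr(S1 + S2 - 2 * sqrt(S1 @ S2))
    diff = mu1 - mu2
    covmean = _sqrtm(sigma1 @ sigma2)
    return float(diff @ diff + np.trace(sigma1 + sigma2 - 2.0 * covmean))

# Toy check: identical statistics give a distance of 0; for equal covariances
# the distance reduces to the squared mean difference.
d = 4
mu = np.zeros(d)
sigma = 2.0 * np.eye(d)
print(frechet_distance(mu, sigma, mu, sigma))          # → 0.0
print(frechet_distance(mu, sigma, np.ones(d), sigma))  # → 4.0
```

For rFID specifically, "distribution 2" is not a generative model's samples but the reconstructions of the reference images, so a lower value means the tokenizer loses less perceptual information.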