Considered doing VQ with LFQ?

FoundationVision / OmniTokenizer

OmniTokenizer: one model and one weight for image-video joint tokenization.

https://www.wangjunke.info/OmniTokenizer/

MIT License


Considered doing VQ with LFQ? #2

Closed: iamlockelightning closed this issue 1 month ago

iamlockelightning commented 1 month ago

Thanks for publishing the code! Great work, very inspiring! Since you use MAGVIT (v1) as a baseline in the experiments, have you considered doing VQ with LFQ (replacing VQGAN), as used in MAGVIT-v2?

wdrink commented 1 month ago

Thank you for your kind words about our work! LFQ, introduced by MAGVIT-v2, scales better to large codebook sizes. We will consider incorporating it into OmniTokenizer.
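
For readers new to LFQ (lookup-free quantization), here is a minimal NumPy sketch of the quantization step as I understand it from MAGVIT-v2: each latent dimension is binarized by its sign, and the token index is the integer formed by those bits, so a `d`-dimensional latent gives an implicit codebook of size `2**d` with no embedding lookup. The function name `lfq_quantize` and the toy input are hypothetical, not taken from the OmniTokenizer or MAGVIT-v2 code, and the training-side pieces (straight-through gradient, auxiliary losses) are left out.

```python
import numpy as np

def lfq_quantize(z):
    """Hypothetical helper: binarize latents of shape (..., d) by sign.

    Returns the quantized codes in {-1, +1} and one integer token
    index per latent vector, in the range [0, 2**d).
    """
    bits = z > 0                                  # True where the latent is positive
    codes = np.where(bits, 1.0, -1.0)             # values <= 0 map to -1, values > 0 map to +1
    weights = 2 ** np.arange(z.shape[-1], dtype=np.int64)
    indices = (bits * weights).sum(axis=-1)       # read the bits as a binary number
    return codes, indices

# Toy latents: 2 vectors with d = 3, so the implicit codebook has 2**3 = 8 entries.
z = np.array([[0.3, -1.2, 0.8],
              [-0.1, -0.5, 2.0]])
codes, indices = lfq_quantize(z)
print(codes)
print(indices)
```

Since the codebook is implicit, growing it from `2**10` to `2**18` entries only means widening the latent from 10 to 18 dimensions. That is probably what "scales better" refers to here. The `int64` index in this sketch limits `d` to at most 62.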