FoundationVision / OmniTokenizer

OmniTokenizer: one model and one weight for image-video joint tokenization.
https://www.wangjunke.info/OmniTokenizer/
MIT License
200 stars, 4 forks

Considered doing VQ with LFQ? #2

Closed: iamlockelightning closed this issue 1 month ago

iamlockelightning commented 1 month ago

Thanks for publishing the code! Great work, very inspiring!! Since you use MAGVIT (v1) as a baseline in your experiments, have you considered replacing the VQGAN-style vector quantization with LFQ (lookup-free quantization), which is used in MAGVIT-v2?

wdrink commented 1 month ago

Thank you for your kind words about our work! LFQ, introduced by MAGVIT-v2, scales better to large codebook sizes. We will consider incorporating it into OmniTokenizer.
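
For readers unfamiliar with LFQ, here is a minimal sketch of the forward quantization step as described in the MAGVIT-v2 paper: each latent dimension is binarized by its sign, and the token index is the resulting bit pattern, so a d-dimensional latent gives an implicit codebook of size 2^d with no embedding lookup. The function names `lfq_quantize` and `lfq_decode` are hypothetical and do not come from the OmniTokenizer or MAGVIT-v2 code. The sketch is pure Python on a single vector and leaves out training details such as the straight-through gradient and the entropy penalty.

```python
from typing import List, Tuple


def lfq_quantize(z: List[float]) -> Tuple[List[int], int]:
    """Quantize one latent vector (hypothetical helper, illustration only)."""
    # Each dimension is binarized independently: +1 if z_i > 0, else -1.
    codes = [1 if v > 0 else -1 for v in z]
    # The token index reads the sign pattern as a binary number,
    # with dimension 0 as the least significant bit.
    index = sum(1 << i for i, v in enumerate(z) if v > 0)
    return codes, index


def lfq_decode(index: int, dim: int) -> List[int]:
    """Recover the {-1, +1} code from a token index without a codebook table."""
    return [1 if (index >> i) & 1 else -1 for i in range(dim)]


if __name__ == "__main__":
    codes, index = lfq_quantize([0.3, -1.2, 0.0, 2.5])
    print(codes, index)          # [1, -1, -1, 1] 9
    print(lfq_decode(index, 4))  # [1, -1, -1, 1]
    # Implicit codebook size grows as 2**dim, e.g. 18 dims -> 262144 tokens.
    print(2 ** 18)
```

Because the code is computed directly from the signs, enlarging the vocabulary only means adding latent dimensions. There is no nearest-neighbor search over a learned codebook as in VQGAN-style quantization.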