FoundationVision / OmniTokenizer

[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.
https://www.wangjunke.info/OmniTokenizer/
MIT License

It takes a **HUGE** amount of memory #5

Closed: lucasjinreal closed this issue 5 months ago

lucasjinreal commented 5 months ago

After testing, OmniTokenizer has an unbelievable memory footprint; my 16GB card goes OOM immediately...

wdrink commented 5 months ago

OmniTokenizer is a video model, which always takes more memory than image models. Actually, OmniTokenizer does not have that many parameters compared to the SD VAE.
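To illustrate why parameter count and memory footprint can diverge, here is a rough back-of-the-envelope sketch. All shapes, the frame count, and the hidden channel width are hypothetical values picked for illustration, not OmniTokenizer's actual configuration. The point is only that weight memory is fixed, while input and feature-map memory grows with the number of frames.

```python
def tensor_bytes(shape, bytes_per_element=4):
    """Bytes needed to store one dense tensor of the given shape (fp32 by default)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_element


def weight_bytes(num_params, bytes_per_element=4):
    """Bytes needed to store the model weights alone."""
    return num_params * bytes_per_element


# Hypothetical inputs laid out as (batch, channels, frames, height, width).
image_shape = (1, 3, 1, 256, 256)    # a single image is treated as one frame
video_shape = (1, 3, 17, 256, 256)   # assumed 17-frame clip

image_bytes = tensor_bytes(image_shape)
video_bytes = tensor_bytes(video_shape)

# One full-resolution feature map with an assumed hidden width of 128 channels.
hidden_channels = 128
video_feature_bytes = tensor_bytes((1, hidden_channels, 17, 256, 256))

print(f"image input:        {image_bytes / 2**20:.2f} MiB")
print(f"video input:        {video_bytes / 2**20:.2f} MiB")
print(f"video feature map:  {video_feature_bytes / 2**20:.2f} MiB")
print(f"video / image:      {video_bytes / image_bytes}")  # 17.0
```

Each intermediate feature map scales the same way, so a model with a modest parameter count can still need far more memory on video input than on a single image.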

lucasjinreal commented 5 months ago

If I'm not reading it wrong, it's an image and video tokenizer. Taking more than 16GB of memory is not "not that much".