FoundationVision / OmniTokenizer

[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.
https://www.wangjunke.info/OmniTokenizer/
MIT License

Was the provided checkpoint trained with this code? #13

Closed shinshiner closed 4 months ago

shinshiner commented 4 months ago

Hello, I tried to use the provided checkpoint "imagenet_k600.ckpt" to run inference on some images and videos, but the model cannot load the checkpoint.

After checking, I found a naming mismatch: the model definition in the code uses "causal", while the corresponding name in the checkpoint is spelled "casual", as shown below.

Therefore, I'm not sure whether the open-sourced code matches the pretrained weights, or whether I missed something.

[Two screenshots attached]
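For anyone who wants to reproduce the check, here is a minimal sketch of how the mismatch can be spotted by comparing the parameter names the model expects with the names stored in the checkpoint. The key names below are hypothetical placeholders, not the real names from this repository; in practice the two lists would come from the model's state dict and from the loaded checkpoint.

```python
def diff_keys(model_keys, ckpt_keys):
    """Return (missing, unexpected) key lists, both sorted.

    missing:    names the model expects but the checkpoint lacks
    unexpected: names in the checkpoint that the model does not define
    """
    model_keys, ckpt_keys = set(model_keys), set(ckpt_keys)
    missing = sorted(model_keys - ckpt_keys)
    unexpected = sorted(ckpt_keys - model_keys)
    return missing, unexpected


# Hypothetical key names, for illustration only.
model_keys = ["encoder.causal_attn.weight", "encoder.norm.bias"]
ckpt_keys = ["encoder.casual_attn.weight", "encoder.norm.bias"]

missing, unexpected = diff_keys(model_keys, ckpt_keys)
print("missing from checkpoint:", missing)
print("unexpected in checkpoint:", unexpected)
```

A pair of names that differ only by "causal" versus "casual" in the two lists points to a spelling mismatch rather than a different architecture.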

shinshiner commented 4 months ago

The newest commit seems to have corrected the typos. It looks like you fixed them when open-sourcing the code, but the corrected names now conflict with the names stored in the trained checkpoint.
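A possible workaround, sketched under the assumption that the spelling is the only difference between the two sets of names: rename the checkpoint keys before loading them. The function below works on any plain mapping of names to values; the sample keys are hypothetical and the values stand in for tensors.

```python
def rename_keys(state_dict, old="casual", new="causal"):
    """Return a copy of state_dict with `old` replaced by `new` in every key."""
    renamed = {}
    for key, value in state_dict.items():
        new_key = key.replace(old, new)
        if new_key in renamed:
            # Two source keys would collapse into one name; refuse to guess.
            raise KeyError(f"renaming {key!r} collides with existing key {new_key!r}")
        renamed[new_key] = value
    return renamed


# Hypothetical checkpoint contents, for illustration only.
ckpt_state = {"encoder.casual_attn.weight": 1.0, "encoder.norm.bias": 2.0}
fixed_state = rename_keys(ckpt_state)
print(sorted(fixed_state))
```

With PyTorch, the input would typically be the dictionary returned by `torch.load`, or its `"state_dict"` entry if the checkpoint stores one (an assumption here, since the file layout is not shown in this thread), and the renamed result would be passed to the model's `load_state_dict`.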