FoundationVision / OmniTokenizer

[NeurIPS 2024] OmniTokenizer: one model and one weight for image-video joint tokenization.
https://www.wangjunke.info/OmniTokenizer/
MIT License

two tiny problems #14

Open dreamofuture opened 4 months ago

dreamofuture commented 4 months ago

Hello, I found two problems, described below:

1. The scale in Attention is set to 8 = sqrt(dim_head), not the usual 1/sqrt(dim_head). Is this a deliberate design choice or a bug?
2. NLayerDiscriminator3D uses the output of leaky_relu (optionally followed by sigmoid) as its logits, which is inconsistent with NLayerDiscriminator2D. Is this OK? Also, the self.n_layers+2 index in forward is incorrect when use_sigmoid is set.
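To make the first point concrete, here is a minimal pure-Python sketch of what changes when the attention scores are multiplied by sqrt(dim_head) instead of 1/sqrt(dim_head). It is not the repository's code: `softmax`, `attention_weights`, and the random query and keys are hypothetical, and dim_head = 64 is inferred from the 8 = sqrt(dim_head) above. With the inverted scale the scores are dim_head times larger, so the softmax moves toward a one-hot distribution.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_weights(q, keys, scale):
    """Softmax of scaled dot products between one query and every key."""
    scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
    return softmax(scores)


random.seed(0)
dim_head = 64  # assumption: implied by scale = 8 = sqrt(dim_head)

# Hypothetical unit-variance query and keys, for illustration only.
q = [random.gauss(0.0, 1.0) for _ in range(dim_head)]
keys = [[random.gauss(0.0, 1.0) for _ in range(dim_head)] for _ in range(16)]

standard = attention_weights(q, keys, dim_head ** -0.5)  # usual 1/sqrt(d)
inverted = attention_weights(q, keys, dim_head ** 0.5)   # sqrt(d), as reported

print("largest weight with 1/sqrt(d):", max(standard))
print("largest weight with sqrt(d):  ", max(inverted))
```

The largest weight can only grow as the scale increases, so the sqrt(d) variant always yields a sharper distribution than the 1/sqrt(d) variant on the same inputs. Whether that sharpness is intended in Attention is the question being asked here.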

Thanks.