genmoai / mochi

The best OSS video generation models
Apache License 2.0

VAE Fourier feature #67

Closed: GooThinker closed this issue 5 days ago

GooThinker commented 2 weeks ago

Good job! We found that the VAE uses start=6, stop=8 for its frequencies and computes w = torch.pow(2.0, freqs) * (2 * torch.pi). What is the special meaning of these frequencies? And when training the VAE, do we add these frequency-domain features to the video before feeding it into the model?
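
To make the constants concrete, here is a minimal sketch of the frequency weights, assuming freqs is the integer range from start to stop with step 1. The thread doesn't say whether stop=8 is inclusive, so the sketch exposes that as a flag; fourier_weights is a hypothetical helper, not a function from the mochi repo.

```python
import math

def fourier_weights(start: int = 6, stop: int = 8, include_stop: bool = True) -> list[float]:
    """Return w = 2**f * 2*pi for each integer f from start to stop (hypothetical helper)."""
    end = stop + 1 if include_stop else stop
    return [math.pow(2.0, f) * (2 * math.pi) for f in range(start, end)]

# Assumption: inclusive stop gives freqs 6, 7, 8; exclusive gives 6, 7.
for f, w in zip(range(6, 9), fourier_weights()):
    # sin(w * x) completes 2**f full cycles as x moves across a unit interval
    print(f"freq={f}: w = 2**{f} * 2*pi = {w:.2f}")
```

Each step in freqs doubles the frequency, so the features form octave-spaced sinusoids; at 2**8, a sinusoid completes 256 full cycles across a unit range of input values.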

ved-genmo commented 1 week ago

Check out test_encoder_decoder.py to see how to use the Fourier features. @ajayjain, what's the reasoning behind these constants?
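
The thread defers the details to test_encoder_decoder.py, which isn't reproduced here. As a minimal sketch only, this is one way to append sin/cos Fourier features to a video along the channel axis before it reaches an encoder. NumPy stands in for torch; the (B, C, T, H, W) layout, the channel ordering, and the inclusive stop are assumptions, and add_fourier_features is a hypothetical helper, not the repo's API. Check the repo's own code for the exact layout the pretrained encoder expects.

```python
import numpy as np

def add_fourier_features(video: np.ndarray, start: int = 6, stop: int = 8) -> np.ndarray:
    """Append sin/cos features to a (B, C, T, H, W) video (hypothetical helper).

    Mirrors w = torch.pow(2.0, freqs) * (2 * torch.pi) from the issue.
    Assumes stop is inclusive, so freqs = [6, 7, 8] by default.
    """
    assert video.ndim == 5, "expected (B, C, T, H, W)"
    x = video.astype(np.float32)                           # compute features in float32
    freqs = np.arange(start, stop + 1, dtype=np.float32)   # (F,)
    w = np.power(np.float32(2.0), freqs) * np.float32(2 * np.pi)
    num_freqs = freqs.shape[0]
    # Repeat each channel F times: c0, c0, c0, c1, c1, c1, ... -> (B, C*F, T, H, W)
    h = np.repeat(x, num_freqs, axis=1)
    # Tile frequencies to match: w0, w1, w2, w0, w1, w2, ... -> (1, C*F, 1, 1, 1)
    w = np.tile(w, x.shape[1]).reshape(1, -1, 1, 1, 1)
    h = w * h
    # Output has C * (1 + 2F) channels: original, then sines, then cosines
    return np.concatenate([x, np.sin(h), np.cos(h)], axis=1)

rng = np.random.default_rng(0)
video = rng.uniform(-1.0, 1.0, size=(1, 3, 2, 4, 4)).astype(np.float32)
print(add_fourier_features(video).shape)  # (1, 21, 2, 4, 4)
```

With 3 input channels and 3 frequencies, the encoder input grows from 3 to 21 channels, so an encoder built this way needs its first layer sized for the augmented input.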

ajayjain commented 5 days ago

The VAE was trained with those constants, so you should keep using those exact values. It's also important to compute the features in float32, not in a quantized (lower-precision) format.
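
A small NumPy experiment (an illustration, not the repo's code) shows why precision matters at these frequencies. It evaluates sin at 2**8 * 2*pi, the top frequency if stop=8 is inclusive, on hypothetical pixel values in [-1, 1], and compares float32 and float16 against a float64 reference. float16 is an assumed stand-in for "quantized"; the thread doesn't name a specific format.

```python
import numpy as np

# Top frequency under the inclusive-stop assumption: 2**8 * 2*pi, about 1608.5
w = 2.0 ** 8 * 2 * np.pi
x = np.linspace(-1.0, 1.0, 256)  # hypothetical normalized pixel values

phase64 = w * x  # float64 reference phases
phase32 = (np.float32(w) * x.astype(np.float32)).astype(np.float64)
phase16 = (np.float16(w) * x.astype(np.float16)).astype(np.float64)

err32 = np.max(np.abs(np.sin(phase32) - np.sin(phase64)))
err16 = np.max(np.abs(np.sin(phase16) - np.sin(phase64)))
print(f"max |sin error| float32: {err32:.1e}, float16: {err16:.2f}")
```

Near 1608, float16 can only represent whole numbers, so the weight and the phase are both rounded by up to half a radian and the sin/cos values become unreliable. float32 keeps the phase error on the order of 1e-4.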

The Fourier features help the encoder see fine-grained differences in pixel values. For example, a neural network might struggle to distinguish a pixel with value 254 from one with value 255. The high-frequency sinusoids in the Fourier features change substantially for such a small change in input, which makes the difference much more obvious.
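
A minimal numeric sketch of the 254 versus 255 example compares the distance between the two values before and after embedding. It assumes pixels are normalized to [-1, 1] and uses the frequencies 2**6 through 2**8 (stop treated as inclusive). fourier_embed and normalize are hypothetical helpers.

```python
import numpy as np

def fourier_embed(x: float, freqs=(6, 7, 8)) -> np.ndarray:
    """Embed a scalar as [x, sin(w_k x)..., cos(w_k x)...] with w_k = 2**k * 2*pi."""
    w = np.power(2.0, np.asarray(freqs, dtype=np.float64)) * (2 * np.pi)
    return np.concatenate([[x], np.sin(w * x), np.cos(w * x)])

def normalize(pixel: int) -> float:
    """Map an 8-bit value to [-1, 1] (an assumed normalization)."""
    return pixel / 127.5 - 1.0

a, b = normalize(254), normalize(255)
raw_gap = abs(b - a)
feature_gap = np.linalg.norm(fourier_embed(b) - fourier_embed(a))
print(f"raw gap: {raw_gap:.4f}, Fourier-feature gap: {feature_gap:.2f}")
```

Under this normalization, the raw gap is 2/255, about 0.008, while the embedded gap is about 2, mostly from the 2**6 component. At 2**7 and 2**8, the phase shift is close to a whole number of cycles, so those features barely move for this particular pair. Using several frequencies helps cover such cases.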