Closed GooThinker closed 5 days ago
Checkout test_encoder_decoder.py
to see how to use the fourier features. @ajayjain what's the reasoning behind these constants?
The VAE is trained with those constants, so you should keep using those specific values. It's also important to compute those features in float32, not quantized.
The fourier features help the encoder see fine-grained differences in pixel values. For example, a neural network might struggle to differentiate a pixel with color 254 versus 255, and the frequencies provided by the fourier features make the difference more obvious.
Good job! We found that the settings in vae are start=6, stop=8 to get w = torch.pow(2.0, freqs) (2 torch.pi) , what is the special meaning of this frequency? When we train vae, do we add frequency domain features to the video before feeding it into the model?