andrew-cr / discrete_flow_models

Code for the paper https://arxiv.org/abs/2402.04997
MIT License
41 stars 0 forks

how to choose the noise parameters in sampling? #1

Open Xiaohui9607 opened 3 months ago

Xiaohui9607 commented 3 months ago

Hi Campbell, I saw in the notebook demo and the sampling code that there is a hyperparameter noise. In sampling.py it is set to 0.0, while it is 1.0 in the uniform demo and 10.0 in the mask demo. Is there any principle for choosing this parameter? Thanks.

andrew-cr commented 3 months ago

The noise parameter controls how much 'mixing' happens during sampling, e.g. with the masking process, how much each dimension flips back and forth between masked and unmasked. In the masking case, if we integrate with time step dt and have D dimensions, then noise * dt * D is the average number of dimensions that get set back to mask in each integration step. We don't want this to be too large a proportion of all dimensions, otherwise the process could become degenerate, with everything getting set back to mask all the time. So as a rule of thumb, we wouldn't want more than, say, 10% of the dimensions to get switched back in each integration step:

noise * dt * D < 0.1 * D  =>  noise < 0.1 / dt

So for dt = 0.001 we would have noise < 100. From this we also see that the larger you make dt, the smaller you will want to set noise (noise = 0 is likely the easiest to simulate with the largest dt).
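To make the arithmetic concrete, here is a minimal sketch of the rule of thumb. The function names and the `max_fraction` argument are hypothetical and not part of the repository; the 10% threshold is only the rough guideline from this comment, not a tuned value.

```python
def expected_remasked_per_step(noise: float, dt: float, num_dims: int) -> float:
    """Average number of dimensions set back to mask in one integration step,
    using the noise * dt * D estimate from the comment above."""
    return noise * dt * num_dims


def max_noise(dt: float, max_fraction: float = 0.1) -> float:
    """Rough upper bound on noise so that at most `max_fraction` of the
    dimensions are re-masked per step: noise < max_fraction / dt."""
    if dt <= 0:
        raise ValueError("dt must be positive")
    return max_fraction / dt


if __name__ == "__main__":
    dt = 0.001
    num_dims = 100  # hypothetical sequence length, chosen for illustration
    print(max_noise(dt))
    print(expected_remasked_per_step(10.0, dt, num_dims))
```

With dt = 0.001 the bound comes out at about 100, so the demo values of 1.0 and 10.0 sit well below it at that step size.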

Other than this very rough upper bound, at the end of the day it should be set so that you observe the best sampling performance on the task you are interested in. In theory, any value of noise will achieve the desired marginals and result in a sample from the data distribution. In practice, however, our denoising distribution is only approximate, and we introduce discretization error during simulation as well. In this case, we are forced to empirically choose the noise that works best in our real-world, approximate setting.

One final thing to note is that it is also possible to have the noise hyperparameter depend on the time variable, e.g. you can have more noise during the final parts of simulation and less noise nearer the beginning. In my previous work (https://arxiv.org/pdf/2205.14987, top of page 39), we found it is most beneficial to introduce noise near the end of sampling, though I haven't systematically investigated this for the discrete flow models case.
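As an illustration of what a time-dependent noise could look like, here is a small sketch of a linear ramp. Everything in it is an assumption made for the example: the function `noise_at`, its parameters, the linear shape, and the convention that t runs from 0 at the start of sampling to 1 at the end. It is not the schedule used in either paper or in this repository.

```python
def noise_at(t: float, max_noise: float = 10.0, ramp_start: float = 0.5) -> float:
    """Hypothetical schedule: no noise before `ramp_start`, then a linear
    ramp that reaches `max_noise` at t = 1 (assumed to be the end of sampling)."""
    if not 0.0 <= ramp_start < 1.0:
        raise ValueError("ramp_start must lie in [0, 1)")
    if t <= ramp_start:
        return 0.0
    # Clamp so that t slightly above 1 (floating-point drift) stays at max_noise.
    frac = min((t - ramp_start) / (1.0 - ramp_start), 1.0)
    return max_noise * frac


if __name__ == "__main__":
    for t in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(t, noise_at(t))
```

A sampler would then call `noise_at(t)` at each integration step in place of a constant. Whether a ramp like this helps for discrete flow models would have to be checked empirically.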