crowsonkb / k-diffusion

Karras et al. (2022) diffusion models for PyTorch
MIT License

Reproducibility work on the sampling code #44

Closed: ardacihaner closed this 1 year ago

ardacihaner commented 1 year ago

The sampling code introduces randomness when it generates noise. To make noise generation deterministic, we propose generating the noise for each item in the batch from a predetermined seed. NovelAI's production code uses this approach to ensure reproducibility.
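As a rough illustration of the per-item seeding idea described above (not the code from this PR or from NovelAI), the sketch below gives each batch item its own `torch.Generator`. The helper name `seeded_batch_noise` and its parameters are hypothetical.

```python
import torch


def seeded_batch_noise(seeds, shape, device="cpu", dtype=torch.float32):
    """Return noise of shape (len(seeds), *shape), one seed per batch item.

    Hypothetical helper for illustration only; it is not part of k-diffusion.
    """
    samples = []
    for seed in seeds:
        # A fresh generator per item, so an item's noise depends only on
        # its own seed and not on its position in the batch.
        gen = torch.Generator(device=device).manual_seed(int(seed))
        samples.append(torch.randn(shape, generator=gen, device=device, dtype=dtype))
    return torch.stack(samples)


noise_a = seeded_batch_noise([1, 2, 3], (4, 8, 8))
noise_b = seeded_batch_noise([3, 1], (4, 8, 8))
```

With this layout, the item seeded with 1 receives the same noise whether it is first in a batch of three or second in a batch of two, which is the property needed to reproduce a single image out of a larger batch.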

crowsonkb commented 1 year ago

Hi, I have ongoing work on a feature (currently in a separate branch, noise-samplers) that will let you feed in whatever random tensors you want. It also enables more sophisticated noise sampling methods that produce more consistent outputs across different numbers of timesteps and noise schedules, using a Brownian tree to sample and combine noise increments deterministically from a single seed. Could you look at https://github.com/crowsonkb/k-diffusion/issues/25#issuecomment-1305104374 and see if it will suffice for your use case? The new Brownian tree noise sampler in particular supports separate seeds per batch item.
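To make the consistency property concrete, here is a deliberately simplified stand-in, not the Brownian tree sampler from the noise-samplers branch. It precomputes one seeded Brownian path per batch item on a fixed grid over [0, 1], so the increment over an interval always equals the sum of the increments over its sub-intervals, however the interval is split. The class name `GridBrownianNoise`, the unit time range, and the grid resolution are all assumptions made for this sketch.

```python
import torch


class GridBrownianNoise:
    """Seeded Brownian paths on a fixed grid (illustrative stand-in only)."""

    def __init__(self, seeds, shape, n_grid=1024):
        self.n_grid = n_grid
        paths = []
        for seed in seeds:
            gen = torch.Generator().manual_seed(int(seed))
            # Each grid cell has width 1 / n_grid, so its increment has
            # variance 1 / n_grid.
            steps = torch.randn((n_grid, *shape), generator=gen) / n_grid ** 0.5
            zero = torch.zeros((1, *shape))
            # Cumulative sum turns increments into path values W(0), ..., W(1).
            paths.append(torch.cat([zero, steps.cumsum(dim=0)]))
        # Resulting shape: (n_grid + 1, batch, *shape)
        self.paths = torch.stack(paths, dim=1)

    def _index(self, t):
        # Snap a time in [0, 1] to the nearest grid point.
        return min(max(round(t * self.n_grid), 0), self.n_grid)

    def increment(self, t0, t1):
        """Return W(t1) - W(t0) for every batch item."""
        return self.paths[self._index(t1)] - self.paths[self._index(t0)]


sampler = GridBrownianNoise(seeds=[7, 9], shape=(2, 3))
whole = sampler.increment(0.0, 1.0)
pieces = sampler.increment(0.0, 0.25) + sampler.increment(0.25, 1.0)
```

Here `whole` and `pieces` agree up to floating-point rounding, which is why a sampler built on a shared path can give more consistent results when the number of steps changes. A real Brownian tree avoids storing the full grid by generating path values on demand.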

Also, this PR uses Python's default RNG to draw seeds. I would want all randomness to come from the PyTorch RNG so that people don't have to seed multiple RNGs, since forgetting to seed every relevant RNG is a common bug.
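One way to keep all randomness under the PyTorch RNG, sketched here under the assumption that per-item integer seeds are still wanted, is to draw those seeds with `torch.randint` instead of Python's `random` module. The helper name `draw_item_seeds` is hypothetical.

```python
import torch


def draw_item_seeds(batch_size, generator=None):
    """Draw one integer seed per batch item from the PyTorch RNG.

    Hypothetical helper: with generator=None the global PyTorch RNG is used,
    so a single torch.manual_seed call controls the result.
    """
    seeds = torch.randint(0, 2**63 - 1, (batch_size,), generator=generator)
    return seeds.tolist()


torch.manual_seed(0)
first = draw_item_seeds(4)
torch.manual_seed(0)
second = draw_item_seeds(4)
```

Because the seeds come from the same RNG as everything else, `first` and `second` are identical after reseeding with `torch.manual_seed(0)`, and there is no separate Python RNG to remember.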

ardacihaner commented 1 year ago

Thanks for the answer. That feature looks neat and will probably work for our use case, so I'm closing this PR.