ICLR 2024 (accepted as oral):\
paper on openreview\
paper on arXiv (v1: 4 Oct 2023, v2: 15 Mar 2024) \
Zahra Kadkhodaie, Florentin Guth, Eero P. Simoncelli, Stéphane Mallat
The denoisers directory contains several denoisers trained to remove Gaussian noise from images with a mean squared error objective. All denoisers are universal and "blind": they can remove noise of any standard deviation, and this standard deviation does not need to be specified. There is a separate folder for each architecture (UNet, BF_CNN), with the architectures defined in code/network.py. Within each architecture directory, there are multiple folders containing variants of that denoiser trained on different datasets. (A minimal usage sketch is given below, after the requirements.)
The code directory contains the Python code for the project (e.g., the network architectures in code/network.py).
The notebooks folder contains demo code for generating results and figures shown in the paper.
You'll need Python 3.9.13, PyTorch 1.13.1, and the following packages to execute the code: \
os, time, sys, gzip, skimage 0.19.2, matplotlib 3.5.2, argparse 1.1, scipy 1.9.1, PIL 9.2.0, pywt 1.3.0
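With these installed, a trained denoiser can be loaded and applied along the following lines. This is a minimal sketch, not the exact interface of code/network.py: the checkpoint path, the file name model.pt, and the UNet constructor arguments are placeholders that depend on which architecture/dataset variant you pick from the denoisers directory.

```python
# Minimal usage sketch (paths and constructor arguments are placeholders, not the
# exact API of code/network.py): load a trained blind denoiser and apply it.
import torch
from network import UNet  # assumes code/ is on the Python path

denoiser = UNet()  # constructor arguments depend on the trained variant
state = torch.load('denoisers/UNet/<dataset_variant>/model.pt', map_location='cpu')
denoiser.load_state_dict(state)
denoiser.eval()

clean = torch.rand(1, 1, 80, 80)                 # stand-in grayscale image in [0, 1]
noisy = clean + 0.2 * torch.randn_like(clean)    # additive Gaussian noise, sigma = 0.2
with torch.no_grad():
    denoised = denoiser(noisy)                   # no sigma argument: the denoiser is blind
```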
Deep neural networks (DNNs) trained for image denoising are able to generate high-quality samples with score-based reverse diffusion algorithms. But how do they achieve this feat? There are two candidate strategies:
1. Memorization: the network effectively stores its training examples and reproduces them when sampling.
2. Generalization: the network learns the continuous image density underlying the data, independent of the particular training set.
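As background on why denoisers enable sampling at all: for a noisy observation $y = x + z$ with $z \sim \mathcal{N}(0, \sigma^2 \mathrm{Id})$, the MSE-optimal denoiser is the conditional mean, and it satisfies Miyasawa's (Tweedie's) identity

$$\hat{x}(y) = \mathbb{E}[x \mid y] = y + \sigma^2 \nabla_y \log p_\sigma(y),$$

where $p_\sigma$ is the density of the noisy observations. The denoising residual $\hat{x}(y) - y$ is therefore proportional to the score of the noisy image density, which is exactly what score-based reverse diffusion samplers iterate on.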
In this work, we first ask which of these strategies diffusion models adopt: are they memorizing or generalizing?
We confirm that, when trained on small data sets (relative to the capacity of the network), these networks memorize the training set, but we also demonstrate that these same models stop memorizing and transition to generalization when trained on sufficiently large sets. Specifically, we show that two denoisers trained on sufficiently large, non-overlapping sets converge to essentially the same denoising function. That is, the learned model becomes independent of the training set (i.e., model variance falls to zero). As a result, when used for image generation, these networks produce nearly identical samples.
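As an illustration of this kind of comparison (a sketch, not the evaluation code used in the paper; denoiser_a, denoiser_b, and the noise level are hypothetical), one can feed identical noisy inputs to two denoisers trained on disjoint subsets and measure how similar their outputs are:

```python
# Sketch: quantify agreement between two denoisers trained on non-overlapping data sets
# by applying them to the same noisy images and comparing outputs (hypothetical setup).
import torch
import torch.nn.functional as F

def denoiser_agreement(denoiser_a, denoiser_b, clean_images, sigma=0.2):
    """Mean cosine similarity between the two denoisers' outputs on identical noisy inputs."""
    noisy = clean_images + sigma * torch.randn_like(clean_images)
    with torch.no_grad():
        out_a = denoiser_a(noisy).flatten(1)   # (batch, pixels)
        out_b = denoiser_b(noisy).flatten(1)
    return F.cosine_similarity(out_a, out_b, dim=1).mean()
```

Agreement close to 1 across noise levels is the signature of vanishing model variance described above.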
These results provide stronger and more direct evidence of generalization than standard comparisons of average performance on train and test sets.
But how is this generalization possible despite the curse of dimensionality? In the absence of any inductive biases, learning a density over 8-bit grayscale images of resolution $80\times80$ would require a data set whose size scales with the number of possible images, $N = 256^{80\times80}$, which is far larger than the number of atoms in the universe.
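For scale, the count works out to

$$256^{80\times 80} = 2^{8 \cdot 6400} = 2^{51200} \approx 10^{15413},$$

whereas the number of atoms in the observable universe is commonly estimated at around $10^{80}$.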
Our experiments show that generalization is achieved with a much smaller and realizable training set (roughly $10^5$ images suffice), reflecting powerful inductive biases of these networks. What are the inductive biases that give rise to such strong generalization?
We analyze the learned denoising functions and show that the inductive biases give rise to a shrinkage operation in a basis adapted to the underlying image. Examination of these bases reveals oscillating harmonic structures along contours and in homogeneous regions. We demonstrate that trained denoisers are inductively biased towards these geometry-adaptive harmonic bases since they arise not only when the network is trained on photographic images, but also when it is trained on image classes supported on low-dimensional manifolds for which the harmonic basis is suboptimal. Finally, we show that when trained on regular image classes for which the optimal basis is known to be geometry-adaptive and harmonic, the denoising performance of the networks is near-optimal.
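For readers who want to probe this themselves, one natural way to examine the learned denoising function is through its Jacobian at a noisy image: its eigenvectors give the adaptive basis, and its eigenvalues act as shrinkage factors. The sketch below is an assumption-laden outline rather than the paper's analysis code; the denoiser interface and the symmetrization step are illustrative choices.

```python
# Sketch (not the authors' exact analysis code): inspect the local basis in which a
# trained blind denoiser operates, by eigendecomposing its Jacobian at a noisy image.
import torch

def adaptive_basis(denoiser, noisy, k=16):
    """Top-k eigenvalues/eigenvectors of the denoiser Jacobian at `noisy` (shape (c, h, w)).

    For the MSE-optimal denoiser the Jacobian is symmetric; a trained network is only
    approximately so, hence the explicit symmetrization below.
    """
    noisy = noisy.detach()
    c, h, w = noisy.shape
    flat = lambda y: denoiser(y.reshape(1, c, h, w)).reshape(-1)
    J = torch.autograd.functional.jacobian(flat, noisy.reshape(-1))  # (chw, chw)
    J = 0.5 * (J + J.T)                       # symmetrize
    evals, evecs = torch.linalg.eigh(J)       # ascending eigenvalues
    evals, evecs = evals.flip(0), evecs.flip(1)
    # Eigenvalues near 1 preserve a component, near 0 suppress it (shrinkage);
    # eigenvectors, reshaped to images, are the adaptive basis vectors.
    return evals[:k], evecs[:, :k].T.reshape(k, c, h, w)
```

Reshaping the leading eigenvectors back to image shape and displaying them is how the oscillating, geometry-adaptive structure described above can be visualized.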
@inproceedings{kadkhodaie2024generalization,
  title={Generalization in diffusion models arises from geometry-adaptive harmonic representation},
  author={Zahra Kadkhodaie and Florentin Guth and Eero P Simoncelli and St{\'e}phane Mallat},
  booktitle={The Twelfth International Conference on Learning Representations},
  year={2024},
  url={https://openreview.net/forum?id=ANvmVS2Yr0}
}