CalculatedContent / WeightWatcher

The WeightWatcher tool for predicting the accuracy of Deep Neural Networks
Apache License 2.0

One potential problem of fitting alpha #103

Closed nsfzyzz closed 2 years ago

nsfzyzz commented 3 years ago

One potential problem I see with the fitting of alpha is that we always use the same power-law probability density function (pdf), without considering the effect of xmax. Consider the following example.

[image: pdfs of a power law with xmin only vs. with both xmin and xmax]

The pdf of a power-law distribution with only xmin specified and that of a power-law distribution with both xmin and xmax specified are different. See the pdfs in the example above.
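
To make this concrete, here is a minimal sketch of the two continuous-case pdfs (the function names are just for illustration):

import numpy as np

def pdf_xmin_only(x, alpha, xmin):
    # power law normalized on [xmin, inf): p(x) = (alpha - 1)/xmin * (x/xmin)**(-alpha)
    return (alpha - 1.0) / xmin * (x / xmin) ** (-alpha)

def pdf_xmin_xmax(x, alpha, xmin, xmax):
    # power law truncated to [xmin, xmax]: p(x) = C * x**(-alpha),
    # where the normalizer C = (1 - alpha)/(xmax**(1 - alpha) - xmin**(1 - alpha)) depends on xmax
    C = (1.0 - alpha) / (xmax ** (1.0 - alpha) - xmin ** (1.0 - alpha))
    return C * x ** (-alpha)

For the same alpha, the two pdfs differ only in the normalizing constant, and that constant is exactly what enters the likelihood.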

If we specify xmax, the pdf depends on xmax, and thus the MLE needs to change (simply because the pdf has changed). If we keep using the pdf without xmax, we are using the wrong pdf, and the MLE gives a biased estimate.
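
To spell out why the estimate changes: the closed-form MLE from Clauset et al. assumes the pdf normalized on [xmin, inf), while with xmax in the normalizer there is no closed form and the likelihood has to be maximized numerically. A minimal sketch of the two estimators (the function names and the optimizer bounds are my own illustration, not code from the powerlaw package):

import numpy as np
from scipy.optimize import minimize_scalar

def mle_alpha_no_xmax(x, xmin):
    # closed-form (Hill / Clauset) MLE; only valid if the pdf is normalized on [xmin, inf)
    x = np.asarray(x)
    x = x[x >= xmin]
    return 1.0 + len(x) / np.sum(np.log(x / xmin))

def mle_alpha_with_xmax(x, xmin, xmax):
    # MLE for the pdf truncated to [xmin, xmax]; the log-normalizer depends on xmax,
    # so the log-likelihood is maximized numerically instead of using the closed form
    x = np.asarray(x)
    x = x[(x >= xmin) & (x <= xmax)]
    def neg_loglik(alpha):
        logC = np.log((1.0 - alpha) / (xmax ** (1.0 - alpha) - xmin ** (1.0 - alpha)))
        return -(len(x) * logC - alpha * np.sum(np.log(x)))
    return minimize_scalar(neg_loglik, bounds=(1.01, 20.0), method="bounded").x

On data that really is cut off at xmax, the first estimator is the biased one described above.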

I don't know if this is a bug or a feature. From the theory, the fitting in weightwatcher clearly ignores xmax in the pdf, because it directly uses the powerlaw package, which uses the pdf without xmax. However, maybe this is related to the particular use case of weightwatcher.

The one line of code in the powerlaw package that can be changed to address this problem is the following. https://github.com/jeffalstott/powerlaw/blob/6732699d790edbe27c2790bf22c3ef7355d2b07e/powerlaw.py#L1188

Note that the powerlaw package does have one normalizer that takes into account the xmax issue. However, that one is not used in the current weightwatcher trunk.

charlesmartin14 commented 3 years ago

Maybe.

The current weightwatcher code uses the default powerlaw fit with xmax = max eigenvalue, because this gives the most consistent results.

But the normalization used is the original normalization defined by Clauset et al. and does not account for the finite cutoff.

The reason we do this for weightwatcher is that, without setting xmax, the estimator will give small alphas for cases where there are large fingers pointing down / finite size effects, as in issue #102

Here's an example from VGG19


import numpy as np
import powerlaw
import weightwatcher as ww
import torchvision.models as models

# load a pretrained VGG19 and get the empirical spectral density (ESD) of layer 9
model = models.vgg19(pretrained=True)
watcher = ww.WeightWatcher(model=model)
esd = watcher.get_ESD(layer=9)

Using the default options (ww2x=False), we have 1 layer with a crazy large alpha ~15

fit = powerlaw.Fit(esd, xmax=np.max(esd))
fit.alpha  # 15.55

But if we don't use xmax, alpha ~ 2, and this alpha is similar to (maybe the same as) the one from the fix_fingers='clip_xmax' option

fit = powerlaw.Fit(esd)
fit.alpha  # 1.92

That seems way too small, and I suspect it is just fitting the random bulk region of the ESD, not the correlated structure.

Also, this behavior is inconsistent: for VGG13, layer 9, alpha = 9, and the estimates do not change.

What alpha do we get for VGG19, layer 9 if we set xmax but use the new normalization?
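
i.e., something along these lines (a sketch only, not run, so the actual number is still unknown; esd is the VGG19 layer-9 ESD from the snippet above, xmin is taken from the default fit, and the optimizer bounds are arbitrary):

import numpy as np
import powerlaw
from scipy.optimize import minimize_scalar

xmax = np.max(esd)
fit = powerlaw.Fit(esd, xmax=xmax)   # default fit: per the discussion above, its normalization ignores xmax
xmin = fit.xmin                      # keep the xmin the default fit selected

x = np.asarray(esd)
x = x[(x >= xmin) & (x <= xmax)]

def neg_loglik(alpha):
    # likelihood under p(x) = C * x**(-alpha), C = (1 - alpha)/(xmax**(1 - alpha) - xmin**(1 - alpha))
    logC = np.log((1.0 - alpha) / (xmax ** (1.0 - alpha) - xmin ** (1.0 - alpha)))
    return -(len(x) * logC - alpha * np.sum(np.log(x)))

alpha_truncated = minimize_scalar(neg_loglik, bounds=(1.01, 20.0), method="bounded").x
print(fit.alpha, alpha_truncated)    # default normalization vs. xmax-aware normalization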

charlesmartin14 commented 2 years ago

There appears to be no resolution here. Closing this issue for now.