KarenUllrich / Tutorial_BayesianCompressionForDL

A tutorial on "Bayesian Compression for Deep Learning" published at NIPS (2017).
MIT License

Samples of compression for LeNet #2

Open Lyken17 opened 6 years ago

Lyken17 commented 6 years ago

Hi author

Thanks for sharing the code. I am very interested in this work. When I test compression on LeNet, it raises a "dimension not match" error. Could you share an example of compressing a neural network with convolutional layers?

KarenUllrich commented 6 years ago

Hi Lyken17,

sorry for getting back to you so late. Notifications are activated now ;).

The first thing that pops into my mind is a PyTorch version issue. Could you provide me with the output of

conda list

or an equivalent?

A complete example is included, and you should be able to run it as is. What exactly are you missing in our tutorial?

Best,

Karen

Lyken17 commented 6 years ago

Hi Karen

The output of conda list is

(test) ➜  Tutorial_BayesianCompressionForDL git:(master) ✗ conda list
# packages in environment at /home/ligeng/anaconda3/envs/test:
#
# Name                    Version                   Build  Channel
ca-certificates           2018.03.07                    0
certifi                   2018.1.18                py36_0
cycler                    0.10.0                    <pip>
imageio                   2.3.0                     <pip>
kiwisolver                1.0.1                     <pip>
libedit                   3.1                  heed3624_0
libffi                    3.2.1                hd88cf55_4
libgcc-ng                 7.2.0                hdf63c60_3
libstdcxx-ng              7.2.0                hdf63c60_3
matplotlib                2.2.2                     <pip>
ncurses                   6.0                  h9df7e31_2
numpy                     1.14.2                    <pip>
openssl                   1.0.2o               h20670df_0
pandas                    0.22.0                    <pip>
Pillow                    5.1.0                     <pip>
pip                       9.0.3                    py36_0
pyparsing                 2.2.0                     <pip>
python                    3.6.5                hc3d631a_0
python-dateutil           2.7.2                     <pip>
pytz                      2018.4                    <pip>
PyYAML                    3.12                      <pip>
readline                  7.0                  ha6073c6_4
scipy                     1.0.1                     <pip>
seaborn                   0.8.1                     <pip>
setuptools                39.0.1                   py36_0
six                       1.11.0                    <pip>
sqlite                    3.22.0               h1bed415_0
tk                        8.6.7                hc745277_3
torch                     0.3.1                     <pip>
torchvision               0.2.0                     <pip>
wheel                     0.31.0                   py36_0
xz                        5.2.3                h55aa19d_2
zlib                      1.2.11               ha838bed_2

When I try to run the example with python example.py, I get the following error:

(test) ➜  Tutorial_BayesianCompressionForDL git:(master) ✗ python example.py
Traceback (most recent call last):
  File "example.py", line 193, in <module>
    main()
  File "example.py", line 37, in main
    transforms.ToTensor(),lambda x: 2 * (x - 0.5),
  File "/home/ligeng/anaconda3/envs/test/lib/python3.6/site-packages/torchvision/datasets/mnist.py", line 53, in __init__
    os.path.join(self.root, self.processed_folder, self.training_file))
  File "/home/ligeng/anaconda3/envs/test/lib/python3.6/site-packages/torch/serialization.py", line 267, in load
    return _load(f, map_location, pickle_module)
  File "/home/ligeng/anaconda3/envs/test/lib/python3.6/site-packages/torch/serialization.py", line 420, in _load
    result = unpickler.load()
AttributeError: Can't get attribute '_rebuild_tensor_v2' on <module 'torch._utils' from '/home/ligeng/anaconda3/envs/test/lib/python3.6/site-packages/torch/_utils.py'>

Lyken17 commented 6 years ago

Oops, this error is different from the one I saw two months ago. I guess there has been some API update in PyTorch.

Lyken17 commented 6 years ago

After solving some compatibility issues, I modified the network to LeNet and re-ran python example.py.

The network structure is

    class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            self.conv1 = BayesianLayers.Conv2dGroupNJ(1, 6, 5)
            self.conv2 = BayesianLayers.Conv2dGroupNJ(6, 16, 5)
            # activation
            self.relu = nn.ReLU()
            # layers
            self.fc1 = BayesianLayers.LinearGroupNJ(16*5*5, 120, clip_var=0.04, cuda=FLAGS.cuda)
            self.fc2 = BayesianLayers.LinearGroupNJ(120, 84, cuda=FLAGS.cuda)
            self.fc3 = BayesianLayers.LinearGroupNJ(84, 10, cuda=FLAGS.cuda)
            # layers including kl_divergence
            self.kl_list = [self.conv1, self.conv2, self.fc1, self.fc2, self.fc3]

        def forward(self, x):
            # x = x.view(-1, 28 * 28)
            # x = self.relu(self.fc1(x))
            # x = self.relu(self.fc2(x))
            out = F.relu(self.conv1(x))
            out = F.max_pool2d(out, 2)
            out = F.relu(self.conv2(out))
            out = F.max_pool2d(out, 2)
            out = out.view(out.size(0), -1)
            out = F.relu(self.fc1(out))
            out = F.relu(self.fc2(out))
            out = self.fc3(out)
            return out

The command-line output is:

(test) ➜  Tutorial_BayesianCompressionForDL git:(master) ✗ python example.py
Traceback (most recent call last):
  File "example.py", line 217, in <module>
    main()
  File "example.py", line 176, in main
    train(epoch)
  File "example.py", line 147, in train
    output = model(data)
  File "/home/ligeng/anaconda3/envs/test/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "example.py", line 90, in forward
    out = F.relu(self.fc1(out))
  File "/home/ligeng/anaconda3/envs/test/lib/python3.6/site-packages/torch/nn/modules/module.py", line 357, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ligeng/Public/Developing/Tutorial_BayesianCompressionForDL/BayesianLayers.py", line 126, in forward
    xz = x * z
RuntimeError: The size of tensor a (256) must match the size of tensor b (400) at non-singleton dimension 1

The modified example.py is uploaded to gist https://gist.github.com/Lyken17/8e0cae9a9aa6911190fd1b580ca75296

I can run the original example without problems, but I cannot figure out the proper way to make it work with convolutional layers. Could you show an example of pruning LeNet?

KarenUllrich commented 6 years ago

Hi Lyken17,

the problem you are experiencing has little to do with the Bayesian layers; it is a shape mismatch. For 28x28 MNIST inputs, the feature map coming out of 'conv2' (after the second max-pooling) is 16x4x4, not 16x5x5, so fc1 has to take 16*4*4 inputs. If you change that, it should run. Additionally, I recommend passing the cuda status to all layers, including the conv layers.
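For reference, the 16x4x4 figure follows from the usual output-size rule for convolution and pooling, floor((n + 2p - k) / s) + 1. The sketch below (plain Python, no PyTorch needed) traces an n x n input through the two 5x5 convolutions and two 2x2 max-pools of the LeNet variant above. The helper names out_size and flat_features are illustrative only and are not part of the tutorial's code.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Output length along one spatial axis of a conv or pooling layer (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


def flat_features(n, channels=16, padding=0):
    """Length of the flattened vector fed into fc1 for an n x n input."""
    n = out_size(n, 5, padding=padding)  # conv1, 5x5 kernel
    n = out_size(n, 2, stride=2)         # max_pool2d(out, 2): stride defaults to kernel size
    n = out_size(n, 5, padding=padding)  # conv2, 5x5 kernel
    n = out_size(n, 2, stride=2)         # max_pool2d(out, 2)
    return channels * n * n


print(flat_features(28))  # 256 = 16*4*4 for 28x28 MNIST
print(flat_features(32))  # 400 = 16*5*5, the classic LeNet-5 with 32x32 inputs
```

The 256 vs 400 in the traceback above are exactly these two numbers. The same helper also reproduces the 36*7*7 used later in this thread: flat_features(28, channels=36, padding=2).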

class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            # activation
            self.relu = nn.ReLU()
            # layers
            self.conv1 = BayesianLayers.Conv2dGroupNJ(1, 6, 5, cuda=FLAGS.cuda)
            self.conv2 = BayesianLayers.Conv2dGroupNJ(6, 16, 5, cuda=FLAGS.cuda)
            self.fc1 = BayesianLayers.LinearGroupNJ(16*4*4, 120, clip_var=0.04, cuda=FLAGS.cuda)
            self.fc2 = BayesianLayers.LinearGroupNJ(120, 84, cuda=FLAGS.cuda)
            self.fc3 = BayesianLayers.LinearGroupNJ(84, 10, cuda=FLAGS.cuda)
            # layers including kl_divergence
            self.kl_list = [self.conv1, self.conv2, self.fc1, self.fc2, self.fc3]

        def forward(self, x):
            out = F.relu(self.conv1(x))
            out = F.max_pool2d(out, 2)
            out = F.relu(self.conv2(out))
            out = F.max_pool2d(out, 2)
            out = out.view(out.size(0), -1)
            out = F.relu(self.fc1(out))
            out = F.relu(self.fc2(out))
            out = self.fc3(out)
            return out

Runs for me!

I will also add a requirements file so that we do not run into trouble with PyTorch's API changes.

Cheers, Karen

gullalc commented 6 years ago

@KarenUllrich The network trains fine with convolutional layers, but the compression.py functions do not work for convolutional weights/filters. I have made some changes to compute_posterior_params so that it computes post_weight_mu and post_weight_var correctly for convolutional layers.

I still get an error in extract_pruned_params because the mask and post_weight_mu for conv layer 1 have different sizes. To be specific, in the example above, post_weight_mu has size (6,1,5,5), whereas the mask has size (16,6). It looks like get_masks() needs to be changed as well to produce correct masks for convolutional filters.

Is that right?

aswin-raghavan commented 5 years ago

Hi,

I am having the same issue. The conv network trains, but I am unable to get compression rates: I get the same error as above. Here is a snippet to reproduce it (part of example.py):

class Net(nn.Module):
        def __init__(self):
            super(Net, self).__init__()
            # activation
            self.relu = nn.ReLU()
            # layers
            self.conv1 = BayesianLayers.Conv2dGroupNJ(1, 16, 5, cuda=FLAGS.cuda, padding=2) 
            self.conv2 = BayesianLayers.Conv2dGroupNJ(16, 36, 5, cuda=FLAGS.cuda, padding=2) 
            self.fc1 = BayesianLayers.LinearGroupNJ(36 * 7 * 7, 128, clip_var=0.04, cuda=FLAGS.cuda) 
            self.fc2 = BayesianLayers.LinearGroupNJ(128, 10, cuda=FLAGS.cuda)
            #pool
            self.pool = nn.MaxPool2d((2,2))
            # layers including kl_divergence
            self.kl_list = [self.conv1, self.conv2, self.fc1, self.fc2]

        def forward(self, x):
            x = x.view(-1, 1, 28, 28)
            x = self.conv1(x)
            x = self.pool(x)
            x = self.relu(x)
            x = self.conv2(x)
            x = self.pool(x)
            x = self.relu(x)
            x = x.view(-1, 36*7*7)
            x = self.relu(self.fc1(x))
            x = self.fc2(x)
            return x

When I run python convMLP.py --batchsize 64 --epochs 1, I get:

Epoch: 1        Train loss: 15.456320
Test loss: 0.0380, Accuracy: 9883/10000 (98.83%)

Traceback (most recent call last):
  File "convMLP.py", line 204, in <module>
    main()
  File "convMLP.py", line 181, in main
    compute_compression_rate(layers, model.get_masks(thresholds))
  File "compression.py", line 119, in compute_compression_rate
    weight_mus, weight_vars = extract_pruned_params(layers, masks)
  File "compression.py", line 83, in extract_pruned_params
    post_weight_mu, post_weight_var = layer.compute_posterior_params()
  File "BayesianLayers.py", line 251, in compute_posterior_params
    self.post_weight_var = self.z_mu.pow(2) * weight_var + z_var * self.weight_mu.pow(2) + z_var * weight_var
RuntimeError: The size of tensor a (16) must match the size of tensor b (5) at non-singleton dimension 3

In your paper you show compression rates for VGG and other convolutional architectures; that is what I am trying to reproduce. Help!

Aswin

gullalc commented 5 years ago

You will need to make some changes to BayesianLayers.py and to the get_masks() function to prune the conv layers. With the current code, you can only prune linear layers.

def compute_posterior_params(self):
    # z has one entry per output channel, while conv weights are (out, in, kH, kW),
    # so z is reshaped to (out, 1, 1, 1) to broadcast along the first axis.
    weight_var, z_var = self.weight_logvar.exp(), self.z_logvar.exp()
    # Var(z*w) = E[z]^2 Var(w) + Var(z) E[w]^2 + Var(z) Var(w) for independent z, w
    part1 = self.z_mu.pow(2)[:, None, None, None] * weight_var
    part2 = z_var[:, None, None, None] * self.weight_mu.pow(2)
    part3 = z_var[:, None, None, None] * weight_var
    self.post_weight_var = part1 + part2 + part3
    self.post_weight_mu = self.z_mu[:, None, None, None] * self.weight_mu
    return self.post_weight_mu, self.post_weight_var

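To explain this in a bit more detail: z_mu and weight_var for LeNet-5's first conv layer have sizes (20) and (20,1,5,5) respectively, so multiplying them directly raises an error. Broadcasting aligns trailing dimensions, which pairs the 20 channels with the kernel width of 5 instead of with the output-channel axis.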
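The failure can be reproduced outside the model. The NumPy sketch below uses made-up shapes (16 output channels, matching the "16 vs 5" traceback in the convMLP example above) and random values; NumPy stands in for PyTorch here since both follow the same broadcasting rules. It shows that the plain product fails, while indexing with [:, None, None, None], as in the compute_posterior_params change above, lines z up with the output-channel axis.

```python
import numpy as np

out_channels, in_channels, k = 16, 1, 5
rng = np.random.default_rng(0)
z_mu = rng.normal(size=out_channels)                      # one value per output channel
z_var = rng.random(out_channels)
weight_mu = rng.normal(size=(out_channels, in_channels, k, k))
weight_var = rng.random((out_channels, in_channels, k, k))

try:
    z_mu ** 2 * weight_var  # (16,) is aligned with the last axis (size 5), so this fails
except ValueError as err:
    print("broadcast error:", err)

# Add singleton axes so z broadcasts along axis 0 (output channels).
z_mu4 = z_mu[:, None, None, None]
z_var4 = z_var[:, None, None, None]
post_weight_mu = z_mu4 * weight_mu
post_weight_var = z_mu4 ** 2 * weight_var + z_var4 * weight_mu ** 2 + z_var4 * weight_var
print(post_weight_mu.shape, post_weight_var.shape)  # (16, 1, 5, 5) (16, 1, 5, 5)
```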

You will also need to change the get_masks() function to create masks for conv weights.
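A minimal sketch of what such a conv mask can look like, assuming (as in the compute_posterior_params change above) that a conv layer's dropout rates are per output channel. A keep-mask over input channels (the previous conv layer's surviving outputs) is combined with one over this layer's output channels, and two singleton axes are added so the resulting (out, in, 1, 1) mask broadcasts over the kernel. The channel masks are invented for illustration; in the real code they would come from comparing get_log_dropout_rates() with a threshold. Note that a (16,6) channel pairing belongs to conv2 of the 6/16 LeNet, which is why it cannot be applied to conv1's (6,1,5,5) weights.

```python
import numpy as np

in_keep = np.array([True, False, True, True, False, True])  # hypothetical: conv1's 6 outputs
out_keep = np.arange(16) % 4 != 0                            # hypothetical: conv2's 16 outputs

# Outer product: weight_mask[o, i] is True only if both channels survive.
weight_mask = np.expand_dims(in_keep, axis=0) * np.expand_dims(out_keep, axis=1)  # (16, 6)
weight_mask = weight_mask[:, :, None, None]                                        # (16, 6, 1, 1)

post_weight_mu = np.ones((16, 6, 5, 5))  # stand-in for conv2's posterior weight means
pruned = post_weight_mu * weight_mask    # broadcasts over the 5x5 kernel
print(weight_mask.shape, pruned.shape)   # (16, 6, 1, 1) (16, 6, 5, 5)
print(np.count_nonzero(pruned))          # 1200 = 12 kept outputs * 4 kept inputs * 25
```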

aswin-raghavan commented 5 years ago

Thank you @gullalc for your answer. Do you know what the changed get_masks() would look like? EDIT: It would be great if you opened a PR with those conv changes; hopefully the authors will merge them. EDIT2: Thank you for adding an explanation. It would be great if @KarenUllrich could comment.

gullalc commented 5 years ago

Sure. This is the get_masks() function I am using. It basically accounts for the different weight shapes of conv and linear layers, as done in compute_posterior_params. Secondly, it flattens the mask of the last conv layer so that it can be multiplied with the mask of the first linear layer. The code is self-explanatory. I think this should work with both CNNs and fully connected networks, although it can be simplified a bit more.

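        def get_masks(self, thresholds):
            weight_masks = []
            mask = None
            next_mask = None
            weight_mask = None
            for i, (layer, threshold) in enumerate(zip(self.kl_list, thresholds)):
                # compute dropout mask
                if len(layer.weight_mu.shape) > 2:
                    # conv layer: weights are (out_channels, in_channels, kH, kW)
                    if mask is None:
                        # first layer: keep every input channel
                        mask = np.ones(layer.in_channels, dtype=bool)
                    else:
                        # input channels = surviving output channels of the previous conv layer
                        mask = np.copy(next_mask)

                    # this layer's group dropout rates are per output channel
                    log_alpha = layer.get_log_dropout_rates().cpu().data.numpy()
                    next_mask = log_alpha < threshold

                    # (out, in) channel mask, broadcast over the kernel dimensions
                    weight_mask = np.expand_dims(mask, axis=0) * np.expand_dims(next_mask, axis=1)
                    weight_mask = weight_mask[:, :, None, None]
                else:
                    if mask is None:
                        log_alpha = layer.get_log_dropout_rates().cpu().data.numpy()
                        mask = log_alpha < threshold
                    elif weight_mask.ndim > 2:
                        # previous layer was conv: expand its channel mask to the flattened
                        # features (integer division, since repeat counts must be integers)
                        temp = next_mask.repeat(layer.in_features // next_mask.shape[0])
                        log_alpha = layer.get_log_dropout_rates().cpu().data.numpy()
                        mask = log_alpha < threshold
                        #mask = mask | temp  ##Upper bound for number of weights at first fully connected layer
                        mask = mask & temp   ##Lower bound for number of weights at fully connected layer
                    else:
                        mask = np.copy(next_mask)

                    # use self.kl_list instead of relying on a global `layers`, and check
                    # the index explicitly instead of a bare except that hides other errors
                    if i + 1 < len(self.kl_list):
                        log_alpha = self.kl_list[i + 1].get_log_dropout_rates().cpu().data.numpy()
                        next_mask = log_alpha < thresholds[i + 1]
                    else:
                        # must be the last mask (assumes 10 output classes)
                        next_mask = np.ones(10)

                    weight_mask = np.expand_dims(mask, axis=0) * np.expand_dims(next_mask, axis=1)

                # np.float64 instead of np.float, which newer NumPy versions removed
                weight_masks.append(weight_mask.astype(np.float64))

            return weight_masks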
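The conv-to-linear step relies on how out.view(out.size(0), -1) orders features: for a contiguous NCHW tensor, all H*W positions of channel 0 come first, then those of channel 1, and so on. A per-channel mask therefore has to be repeated element-wise (np.repeat, as in the code above), not tiled. The check below uses an arbitrary, made-up channel mask for a 16x4x4 conv2 output and compares both options against the channel index of every flattened feature.

```python
import numpy as np

C, H, W = 16, 4, 4
channel_keep = np.arange(C) % 3 != 0  # hypothetical surviving channels of conv2

# Channel index of each feature after flattening a (C, H, W) map in C order,
# which is what view(out.size(0), -1) does per sample on a contiguous NCHW tensor.
feature_channel = np.broadcast_to(np.arange(C)[:, None, None], (C, H, W)).reshape(-1)
expected = channel_keep[feature_channel]

in_features = C * H * W
repeated = channel_keep.repeat(in_features // C)  # k0 x16, then k1 x16, ...
tiled = np.tile(channel_keep, H * W)              # k0..k15, repeated 16 times

print(np.array_equal(repeated, expected))  # True
print(np.array_equal(tiled, expected))     # False
```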