albermax / innvestigate

A toolbox to iNNvestigate neural networks' predictions!

Gradient results different from Guided Backprop/DeConvNet for a simple model #143

Closed bnaman50 closed 5 years ago

bnaman50 commented 5 years ago

Hey Alber,

I was comparing the analysis results of the gradient method against Guided Backprop/DeConvNet for simple models. After reading the papers, my impression is that these methods are identical except for how they handle the non-linearities. So for a simple one-layer network with no non-linearity, one would expect exactly the same results, but they differ. To be precise, the Guided Backprop/DeConvNet results are a scaled version of the gradient analysis, which seems weird.
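For context, my understanding of the three backward rules at a ReLU is sketched below (a toy numpy illustration of the rules from the papers, not iNNvestigate internals); since the rules only differ at ReLUs, a model without any ReLU should give identical results for all three methods.

import numpy as np

# Toy illustration of the three backward rules at a ReLU.
# x: forward input to the ReLU, g: signal arriving from the layer above.
x = np.array([ 1.0, -2.0,  3.0, -0.5])
g = np.array([-1.0,  0.5,  2.0,  1.5])

grad_rule   = np.where(x > 0, g, 0.0)              # plain gradient: pass g where the ReLU input was positive
deconv_rule = np.where(g > 0, g, 0.0)              # DeConvNet: pass g where g itself is positive
guided_rule = np.where((x > 0) & (g > 0), g, 0.0)  # Guided Backprop: both conditions

print(grad_rule)    # [-1., 0., 2., 0.]
print(deconv_rule)  # [ 0., 0.5, 2., 1.5]
print(guided_rule)  # [ 0., 0., 2., 0.]

Since the model below has no ReLU at all (and the softmax is stripped before analysis), all three rules should reduce to the plain chain rule, which is why I expected identical outputs.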

Here is the code to reproduce the results.

from keras.models import Sequential
from keras.layers import Dense, Activation
from keras import backend as K

import innvestigate
import innvestigate.utils as iutils

import numpy as np

n = 11
num_classes = 3

# Model: a single Dense layer followed by softmax (no ReLU)
########################################
K.clear_session()
model = Sequential()
model.add(Dense(num_classes, input_dim = n))
model.add(Activation('softmax'))
#model.summary()

# Input
############
x = np.random.randint(low=0, high=256, size=(1, n)) 

# Model Prediction 
#####################
pred = model.predict(x)
print('Prediction results are: ', pred)
neuron = np.argmax(pred)
print('Predicted class is', neuron)

#Heatmaps
############################
methods = ['gradient', 'guided_backprop', 'deconvnet']
model_wo_softmax = iutils.keras.graph.model_wo_softmax(model)

analysis_list = []
print('Computing analysis on neuron: ', neuron)
for method in methods:
    analyzer = innvestigate.create_analyzer(method, model_wo_softmax, neuron_selection_mode="index")
    analysis = analyzer.analyze(x, neuron_selection=neuron)
    analysis_list.append(analysis)

print('\nGradient analysis is: ')
print(analysis_list[0])

print('\nGuided Backprop analysis is: ')
print(analysis_list[1])

print('\nDeConvNet analysis is: ')
print(analysis_list[2])

[w, b] = model.get_weights()
print('\nWeights of the model are:')
print(w)
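
# For this linear model (y = x.dot(W) + b), the derivative d y[neuron] / d x
# is exactly the weight column W[:, neuron], so the plain gradient analysis
# should reproduce that column.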
print('\nGradient results match column {} of the weight matrix, as per the prediction of the model'.format(neuron))

print('\nBUT SOMEHOW THE GUIDED BACKPROP ANALYSIS IS A SCALED VERSION OF THE GRADIENT ANALYSIS, WITH THE SCALING FACTOR BELOW\n')
scaling = analysis_list[1][0] / analysis_list[0][0]
print(scaling)
print('\nBUT WHY IS THIS HAPPENING')

Here is the terminal log.

Using TensorFlow backend.
2019-03-26 01:51:32.293096: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2
2019-03-26 01:51:32.305560: I tensorflow/core/common_runtime/process_util.cc:69] Creating new thread pool with default inter op setting: 2. Tune using inter_op_parallelism_threads for best performance.
Prediction results are:  [[0. 0. 1.]]
Predicted class is 2
Computing analysis on neuron:  2

Gradient analysis is: 
[[ 0.38865197  0.61281455  0.16695768 -0.36148044  0.44741893  0.04357082
   0.4637028   0.26968932 -0.51349425  0.5013695  -0.5968782 ]]

Guided Backprop analysis is: 
[[ 61.387623   96.79413    26.370985  -57.09588    70.669876    6.8820157
   73.24191    42.59746   -81.106476   79.19137   -94.27698  ]]

DeConvNet analysis is: 
[[ 61.387623   96.79413    26.370985  -57.09588    70.669876    6.8820157
   73.24191    42.59746   -81.106476   79.19137   -94.27698  ]]

Weights of the model are:
[[ 0.18153429  0.17280245  0.38865197]
 [-0.64964014 -0.48228353  0.61281455]
 [ 0.2599224  -0.3978359   0.16695768]
 [-0.20464295  0.6025032  -0.36148044]
 [ 0.33941442  0.6026162   0.44741893]
 [ 0.08924228  0.5203458   0.04357082]
 [ 0.12038481 -0.42020288  0.4637028 ]
 [ 0.2917682  -0.55854344  0.26968932]
 [-0.50714564 -0.5298555  -0.51349425]
 [ 0.16411918 -0.6383469   0.5013695 ]
 [ 0.19596654 -0.3590484  -0.5968782 ]]

Gradient results match column 2 of the weight matrix, as per the prediction of the model

BUT SOMEHOW THE GUIDED BACKPROP ANALYSIS IS A SCALED VERSION OF THE GRADIENT ANALYSIS, WITH THE SCALING FACTOR BELOW

[157.95012 157.95012 157.95012 157.95012 157.95012 157.95012 157.95012
 157.95012 157.95012 157.95012 157.95012]

BUT WHY IS THIS HAPPENING

It would be great if you could explain why this is happening.

Thanks, Naman

albermax commented 5 years ago

Hi Naman,

Yes, indeed, for a one-layer network without a ReLU they are the same up to scaling. The plain gradient backprop is initialized with a 1 at the selected output neuron, while DeConvNet and Guided Backprop are initialized with the output value of the function. Does this answer your question?
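
A quick way to check this against your script (a minimal sketch reusing the variables from your code above; if the analyzers are indeed seeded with the output value as described, the two printed numbers should match, roughly 157.95 in your log):

# Continuing the script from the first comment: the elementwise ratio between the
# Guided Backprop / DeConvNet analysis and the plain gradient should equal the
# pre-softmax output of the selected neuron, because those analyzers start the
# backward pass from the output value instead of from 1.
presoftmax = model_wo_softmax.predict(x)
print(presoftmax[0, neuron])                      # pre-softmax output of the selected neuron
print(analysis_list[1][0] / analysis_list[0][0])  # ratio, the same value in every entry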

Cheers, Max