Closed mervess closed 3 years ago
Hi, mervess. Can you show me the error you get when applying SpKeras to convert DepthwiseConv2D? I do not think converting other convolutional layers would be a big problem, since SpKeras calculates the maximum activation by searching for layers with an activation attribute.
This was the error I was getting:
Building new model...
Traceback (most recent call last):
File "simple_example.py", line 37, in <module>
snn_model = cnn_to_snn(signed_bit=0)(cnn_model,x_train)
File "spkeras/spkeras/models.py", line 43, in __call__
timesteps=self.timesteps)
File "spkeras/spkeras/models.py", line 168, in convert
snn_model.layers[-1].set_weights(weights)
File "lib/python3.7/site-packages/tensorflow/python/keras/engine/base_layer.py", line 1783, in set_weights
'shape %s' % (ref_shape, weight.shape))
ValueError: Layer weight shape (3, 3, 32, 1) not compatible with provided weight shape (3, 3, 32, 32)
The layer in the for loop at that point was the DepthwiseConv2D. So, to my reasoning, the problem was mainly mismatching weight shapes between layers caused by the depthwise operation. Note that because I made some changes in the code, the line numbers might not be exactly the same for you.
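To illustrate where the (3, 3, 32, 32) shape comes from: NumPy/TensorFlow broadcasting aligns trailing axes, so multiplying BN parameters of shape (32,) into a depthwise kernel of shape (3, 3, 32, 1) silently blows up the last axis. A minimal standalone sketch (shapes taken from the traceback above):

```python
import numpy as np

# Depthwise kernel: (kh, kw, in_channels, depth_multiplier) = (3, 3, 32, 1)
dw_kernel = np.ones((3, 3, 32, 1))
gamma = np.ones(32)  # BatchNormalization parameter, shape (32,)

# Broadcasting aligns trailing axes: (32,) meets the size-1 multiplier axis,
# so the result grows a spurious last dimension of 32.
bad = gamma * dw_kernel
print(bad.shape)   # (3, 3, 32, 32) -- the shape reported in the ValueError

# Expanding gamma to (32, 1) lines it up with the channel axis instead.
good = np.expand_dims(gamma, -1) * dw_kernel
print(good.shape)  # (3, 3, 32, 1)
```

This is exactly the mismatch the edit below works around.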
I reckon I have solved it via minor changes in the code.
I tracked the issue to the batchnormalization function, where the shapes get affected, and updated its code to maintain the weight shape between layers.
Below is the new version, and it seems to work for me.
axis = 1 if K.image_data_format() == 'channels_first' else -1

def batchnormalization(weights, layer, amp_factor=100, dw=False):
    gamma, beta, mean, variance = layer.get_weights()
    if dw:  # for depthwise conv layers
        gamma = np.expand_dims(gamma, axis)
        beta = np.expand_dims(beta, axis)
        mean = np.expand_dims(mean, axis)
        variance = np.expand_dims(variance, axis)
    weights[0] = amp_factor*gamma/np.sqrt(variance+epsilon)*weights[0]
    weights[1] = amp_factor*(gamma/np.sqrt(variance+epsilon)
                             *(weights[1]-mean)+beta)
    return weights
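A quick sanity check of the patch on dummy arrays (standalone sketch: FakeBN, the epsilon value, and the dummy bias shape are stand-ins, not SpKeras code; the model discussed here has no bias, so the bias entry is only there to satisfy the function signature):

```python
import numpy as np

epsilon = 1e-3  # Keras BatchNormalization default

class FakeBN:
    """Stand-in for a BatchNormalization layer over 32 channels."""
    def get_weights(self):
        c = 32
        # gamma, beta, moving mean, moving variance
        return [np.ones(c), np.zeros(c), np.zeros(c), np.ones(c)]

def batchnormalization(weights, layer, amp_factor=100, dw=False):
    gamma, beta, mean, variance = layer.get_weights()
    if dw:  # depthwise kernels are (kh, kw, channels, 1): add a trailing axis
        gamma, beta, mean, variance = (np.expand_dims(p, -1)
                                       for p in (gamma, beta, mean, variance))
    weights[0] = amp_factor * gamma / np.sqrt(variance + epsilon) * weights[0]
    weights[1] = amp_factor * (gamma / np.sqrt(variance + epsilon)
                               * (weights[1] - mean) + beta)
    return weights

w = [np.ones((3, 3, 32, 1)), np.zeros((32, 1))]  # depthwise kernel + dummy bias
out = batchnormalization(w, FakeBN(), dw=True)
print(out[0].shape)  # (3, 3, 32, 1) -- channel layout preserved
```

With dw=True the kernel keeps its (3, 3, 32, 1) shape, so set_weights no longer complains.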
How does it seem to you?
I see. There is a mismatch between the activation channel and the weight channel. The activation channel determines the parameter shape in the BatchNormalization layer, which causes the shape mismatch.
In my opinion, the single-channel weight in DepthwiseConv2D should be broadcast to fit the channel size in BatchNormalization. That means removing the BatchNormalization by storing full-channel weights, rather than a single channel, in the convolutional layer. It is possible to modify SpKeras to do this, but it is a bit expensive. You can try replacing DepthwiseConv2D with a Conv2D of customized channel size.
I hope I understood the question correctly.
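For what it's worth, the Conv2D-replacement idea can be sketched as follows: a depthwise kernel of shape (3, 3, 32, 1) is equivalent to a full (3, 3, 32, 32) Conv2D kernel that is zero everywhere except on the channel "diagonal", where output channel j only sees input channel j. The function name below is illustrative, not SpKeras API:

```python
import numpy as np

def depthwise_to_conv2d_kernel(dw_kernel):
    """Expand a (kh, kw, c, 1) depthwise kernel into an equivalent
    (kh, kw, c, c) Conv2D kernel (zero off the channel diagonal)."""
    kh, kw, c, m = dw_kernel.shape
    assert m == 1, "sketch assumes depth_multiplier == 1"
    full = np.zeros((kh, kw, c, c), dtype=dw_kernel.dtype)
    for j in range(c):
        full[:, :, j, j] = dw_kernel[:, :, j, 0]
    return full

k = np.random.rand(3, 3, 32, 1)
print(depthwise_to_conv2d_kernel(k).shape)  # (3, 3, 32, 32)
```

This is the "store more weights" trade-off mentioned above: the kernel grows by a factor of the channel count, but its shape then matches the BN parameters directly.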
These are the layers concerned and their weight shapes, in order:
<tensorflow.python.keras.layers.advanced_activations.ReLU object at 0x7f21ed924dd0> - [] (no weights here)
<tensorflow.python.keras.layers.convolutional.DepthwiseConv2D object at 0x7f21ed88f450> - TensorShape([3, 3, 32, 1])
<tensorflow.python.keras.layers.normalization_v2.BatchNormalization object at 0x7f21ed88fe10> - TensorShape([32])
The conversion got stuck at the DepthwiseConv2D when the error occurred.
(3, 3, 32, 32)
was the resulting shape from the batchnormalization function before my update.
The ReLU above is a layer in itself, and the overall model does not use bias. I also included the activations that come as layers by modifying the findlambda function, as below.
if ... or layer_type == 'ReLU':
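For context, that check keys on the layer's class name. A minimal hypothetical sketch of that kind of layer-type test, with stand-in classes instead of the real Keras ones (findlambda's actual structure is not reproduced here):

```python
def has_activation(layer):
    """Hypothetical check: a layer contributes an activation either via an
    `activation` attribute or as a standalone activation layer like ReLU."""
    layer_type = type(layer).__name__
    return hasattr(layer, 'activation') or layer_type == 'ReLU'

class ReLU:          # stand-in for the standalone Keras ReLU layer
    pass

class Conv2D:        # stand-in for a layer with an activation attribute
    activation = 'relu'

class Flatten:       # stand-in for a layer with no activation at all
    pass

print(has_activation(ReLU()))     # True (matched by class name)
print(has_activation(Conv2D()))   # True (matched by attribute)
print(has_activation(Flatten()))  # False
```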
I work with many types of CNNs; some do not use bias, some have no BatchNormalization, etc. I intend to keep DepthwiseConv2D in that regard.
So, my short-fix above didn't do the job, did I get it right?
> So, my short-fix above didn't do the job, did I get it right?
In my opinion, you did not get the correct weights.
> some do not use bias, no BatchNormalization
Removing bias and BatchNormalization (BN) is the easiest way to solve the problem. SpKeras can recognize layers without bias. Besides, you will not suffer from the mismatch between the weights in DepthwiseConv2D and the parameters in BN.
> if ... or layer_type == 'ReLU':
If I include stand-alone activation layers this way in findlambda, would that be the right approach?
I've tried the no-bias, no-BN version of the model and it works. However, the accuracy remains the same as my pre-edit (wrongly weighted) + BN version of the same model. What, then, is the difference between the two versions?
> If I include stand-alone activation layers this way in findlambda, would that be the right approach?
I think there is no need to do this; SpKeras does find the correct maximum value. The problem is the mismatch between the parameter shapes in DepthwiseConv2D and BN.
> What, then, is the difference between the two versions?
I am not sure of the exact reason for the same accuracy, since I did not check the details (e.g. the SNN weights after conversion) for the pre-edit version. It might be figured out by checking the weights before and after passing through the batchnormalization function, e.g. the shapes and the parameters from BN, and whether they are similar even across different channels.
Alright, I see. I've been updating the code to fix some bugs (the original SpKeras was crashing outright on some models) and will be publishing the changes soon. I'll send you a pull request to let you know, in case you are interested in making it more robust. Thank you for the answers, Dengyu.
That would be very helpful. Good luck with your publication.
I am also working on SpKeras 2.0, which supports more architectures. It is coming soon. :)
Can you please let me know whether SpKeras works on all or only some of the conv layer types in the link, @Dengyu-Wu? I'm having a bit of a problem with a DepthwiseConv2D layer, hence the question.