jpviguerasguillen opened 5 years ago
I managed to solve it by changing the code: I used matmul instead of batch_dot. For that, inputs_tiled (without the batch dimension) needed to have the same rank as W. Note that I use the tensorflow library directly:
inputs_expand = tf.expand_dims(inputs, 1)
inputs_tiled = tf.tile(inputs_expand, [1, self.num_capsule, 1, 1])
inputs_tiled = tf.expand_dims(inputs_tiled, 4)
inputs_hat = tf.map_fn(lambda x: tf.matmul(self.W, x), elems=inputs_tiled)
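If you want to sanity-check the shapes outside the layer, here is a minimal standalone sketch; the sizes (num_capsule=10, dim_capsule=16, input_num_capsule=1152, input_dim_capsule=8) are just the MNIST DigitCaps values used as an example, and W is a random stand-in for the layer's weight:

import tensorflow as tf

num_capsule, dim_capsule = 10, 16                 # example DigitCaps sizes
input_num_capsule, input_dim_capsule = 1152, 8    # example PrimaryCaps sizes

inputs = tf.random.normal([4, input_num_capsule, input_dim_capsule])   # batch of 4
W = tf.random.normal([num_capsule, input_num_capsule, dim_capsule, input_dim_capsule])

inputs_expand = tf.expand_dims(inputs, 1)
inputs_tiled = tf.tile(inputs_expand, [1, num_capsule, 1, 1])
inputs_tiled = tf.expand_dims(inputs_tiled, 4)
inputs_hat = tf.map_fn(lambda x: tf.matmul(W, x), elems=inputs_tiled)

print(inputs_hat.shape)  # (4, 10, 1152, 16, 1)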
Did you also have to update this line:
b += K.batch_dot(outputs, inputs_hat, [2, 3])
to this: b += tf.matmul(self.W, x)?
I did this mindlessly because I was getting the same error at that line.
Is this the right way to correct it?
Yes, I also changed that. The code changed substantially with respect to the original one. Here is the call function of CapsuleLayer (note that I use the tensorflow API directly):
import tensorflow as tf # Using tensorflow 2.0.0
from tensorflow.keras import layers, initializers
from tensorflow.keras import backend as K
# ...
def call(self, inputs, training=None):
    # Expand the input in axis=1, tile that axis to num_capsule, and
    # expand another axis at the end to prepare the multiplication with W.
    # inputs.shape=[None, input_num_capsule, input_dim_capsule]
    # inputs_expand.shape=[None, 1, input_num_capsule, input_dim_capsule]
    # inputs_tiled.shape=[None, num_capsule, input_num_capsule,
    #                     input_dim_capsule, 1]
    inputs_expand = tf.expand_dims(inputs, 1)
    inputs_tiled = tf.tile(inputs_expand, [1, self.num_capsule, 1, 1])
    inputs_tiled = tf.expand_dims(inputs_tiled, 4)

    # Compute `W * inputs` by scanning inputs_tiled on dimension 0 (map_fn).
    # - Use matmul (without transposing any element). Note the order!
    # Thus:
    #  x.shape=[num_capsule, input_num_capsule, input_dim_capsule, 1]
    #  W.shape=[num_capsule, input_num_capsule, dim_capsule, input_dim_capsule]
    # Regard the first two dimensions as `batch` dimensions,
    # then matmul: [dim_capsule, input_dim_capsule] x [input_dim_capsule, 1]
    # -> [dim_capsule, 1].
    # inputs_hat.shape=[None, num_capsule, input_num_capsule, dim_capsule, 1]
    inputs_hat = tf.map_fn(lambda x: tf.matmul(self.W, x), elems=inputs_tiled)

    # Begin: Routing algorithm ----------------------------------------------#
    # The prior for the coupling coefficients, initialized as zeros.
    # b.shape = [None, self.num_capsule, self.input_num_capsule, 1, 1].
    b = tf.zeros(shape=[tf.shape(inputs_hat)[0], self.num_capsule,
                        self.input_num_capsule, 1, 1])

    assert self.routings > 0, 'The routings should be > 0.'
    for i in range(self.routings):
        # Apply softmax to the axis with `num_capsule`.
        # c.shape=[batch_size, num_capsule, input_num_capsule, 1, 1]
        c = layers.Softmax(axis=1)(b)

        # Compute the weighted sum of all the predicted output vectors.
        # c.shape = [batch_size, num_capsule, input_num_capsule, 1, 1]
        # inputs_hat.shape=[None, num_capsule, input_num_capsule, dim_capsule, 1]
        # `multiply` will broadcast axis=3 in c to dim_capsule:
        #  outputs.shape=[None, num_capsule, input_num_capsule, dim_capsule, 1]
        # Then sum along input_num_capsule:
        #  outputs.shape=[None, num_capsule, 1, dim_capsule, 1]
        # Then apply squash along dim_capsule.
        outputs = tf.multiply(c, inputs_hat)
        outputs = tf.reduce_sum(outputs, axis=2, keepdims=True)
        outputs = squash(outputs, axis=-2)  # [None, 10, 1, 16, 1]

        if i < self.routings - 1:
            # Update the prior b.
            # outputs.shape = [None, num_capsule, 1, dim_capsule, 1]
            # inputs_hat.shape=[None, num_capsule, input_num_capsule, dim_capsule, 1]
            # Multiply the outputs with the predicted vectors (inputs_hat)
            # and add the agreement to the prior b.
            outputs_tiled = tf.tile(outputs, [1, 1, self.input_num_capsule, 1, 1])
            agreement = tf.matmul(inputs_hat, outputs_tiled, transpose_a=True)
            b = tf.add(b, agreement)
    # End: Routing algorithm ------------------------------------------------#

    # Squeeze the outputs to remove the useless axes:
    # From --> outputs.shape=[None, num_capsule, 1, dim_capsule, 1]
    # To   --> outputs.shape=[None, num_capsule, dim_capsule]
    outputs = tf.squeeze(outputs, [2, 4])
    return outputs
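For completeness, the call above assumes that build() created W with the shape given in the comments. A minimal sketch of such a build(), assuming the constructor stored num_capsule, dim_capsule, routings and kernel_initializer as in the original CapsuleLayer:

def build(self, input_shape):
    # input_shape = [None, input_num_capsule, input_dim_capsule]
    self.input_num_capsule = input_shape[1]
    self.input_dim_capsule = input_shape[2]
    # One transformation matrix per (output capsule, input capsule) pair.
    self.W = self.add_weight(
        shape=[self.num_capsule, self.input_num_capsule,
               self.dim_capsule, self.input_dim_capsule],
        initializer=self.kernel_initializer,
        name='W')
    self.built = True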
Thanks @jpviguerasguillen, that is a great solution that got this code working on TF 2.0!
Thank you very much! This works! The solution also works on TF 2.3.0.
What is num_capsule?
That is the number of capsules in the current layer.
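As an illustration, for the MNIST model in the paper the DigitCaps layer has 10 capsules (one per digit class) of 16 dimensions each; assuming the usual constructor arguments of this repo's CapsuleLayer, it would be created as something like:

# Hypothetical usage; 'primarycaps' is the output of the primary capsule layer.
digitcaps = CapsuleLayer(num_capsule=10, dim_capsule=16, routings=3,
                         name='digitcaps')(primarycaps)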
UPDATE: While the changes I indicated above work well, I later realized that this implementation of CapsNets has a "big issue": it is not implemented as the original authors designed it.
Sabour et al.'s paper ('Dynamic Routing between Capsules') says (page 4): "In total PrimaryCapsules has [32x6x6] capsule outputs (each output is an 8D vector) and each capsule in the [6x6] grid is sharing their weights with each other." However, this seemed to contradict the caption of their Figure 1, which says: "W_ij is a weight matrix between each u_i, i being (1, 32x6x6) in PrimaryCapsules and v_j, j being (1, 10)."
However, I believe they intended it as the first quote says. That is, PrimaryCaps initially has size 256x6x6, which is then interpreted as 32 capsules of 8 elements on a 6x6 grid, where all the (let's call them) 'subcapsules' in the 6x6 grid are simply the evaluation of the capsule at the different spatial points. This simply means what they said in the first quote: the weight W_ij is shared among the capsules in the 6x6 grid.
The main issue here is the terminology: they use the same term, capsules, for two different concepts. The "real capsule" would be the entity to find (say, a horizontal line or a circle; in their case there are 32 entities), whereas the "instantiation/output of the capsule" is the resulting "vector" at the different positions telling whether such an entity exists or not. And, for that, we need to apply the same weights W to all vectors in the 6x6 grid that come from the same "capsule".
What does the code above do? It does NOT share weights, so the entities found at the different spatial points could be completely different. The implementation above is not wrong per se; it simply does not encode the idea of looking for the same entity at different spatial points.
There is another missing element, which I noticed in Sabour's GitHub: they add a bias term. In their case, the bias term in DigitCaps would be of size 16x10.
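To make the weight-sharing point concrete, here is a rough, hypothetical sketch (NOT the code of this repo; the names and shapes are only illustrative): W is defined per capsule type rather than per spatial position, so all 6x6 grid positions of a type are transformed by the same matrix, and a per-output-capsule bias of the size mentioned above is also created:

import tensorflow as tf

# Hypothetical sketch of the weight-sharing idea (not this repo's code).
num_capsule, dim_capsule = 10, 16            # DigitCaps: 10 capsules of 16D
num_types, grid, input_dim = 32, 6 * 6, 8    # PrimaryCaps: 32 capsule types on a 6x6 grid, 8D each

# One transformation matrix per (output capsule, capsule TYPE); all 36 grid
# positions of a given type reuse the same matrix.
W_shared = tf.random.normal([num_capsule, num_types, dim_capsule, input_dim])
u = tf.random.normal([4, num_types, grid, input_dim])       # a batch of 4 inputs

# u_hat[b, j, t, g, :] = W_shared[j, t] @ u[b, t, g, :]
u_hat = tf.einsum('jtod,btgd->bjtgo', W_shared, u)           # [4, 10, 32, 36, 16]

# The bias mentioned above: one dim_capsule-sized bias per output capsule
# (16x10 in total), added to the routed sum before the squash in their code.
bias = tf.zeros([num_capsule, dim_capsule])                  # [10, 16]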
At the line b = tf.zeros(shape=[tf.shape(inputs_hat)[0], self.num_capsule, self.input_num_capsule, 1, 1]) something goes wrong: NotImplementedError: Cannot convert a symbolic Tensor (digitcaps/strided_slice:0) to a numpy array.
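I am not sure of the exact cause (this error often comes from a TensorFlow/NumPy version mismatch), but one possible workaround is to build b from inputs_hat itself so that no symbolic shape has to be converted, for example:

# Untested workaround: slice inputs_hat instead of building the shape with tf.shape.
# inputs_hat.shape = [None, num_capsule, input_num_capsule, dim_capsule, 1]
b = tf.zeros_like(inputs_hat[:, :, :, :1, :])  # [None, num_capsule, input_num_capsule, 1, 1]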
Regarding the supposed contradiction in the UPDATE above: I think there is NO contradiction, only a loose use of the word "weights" in the sentence "each capsule in the [6x6] grid is sharing their weights with each other." The word "weights" here means the convolution kernel weights, not the weight matrix W_ij in equation (2) (page 2). They never use "weights" to refer to W_ij; instead they say "weight matrix" throughout the paper. The authors also use "weights" to refer to the convolution kernel weights in paragraph 3 on page 2.
The full paragraph on pages 3 and 4 is as follows: "The second layer (PrimaryCapsules) is a convolutional capsule layer with 32 channels of convolutional 8D capsules (i.e. each primary capsule contains 8 convolutional units with a 9x9 kernel and a stride of 2). Each primary capsule output sees the outputs of all 256x81 Conv1 units whose receptive fields overlap with the location of the center of the capsule. In total PrimaryCapsules has [32, 6, 6] capsule outputs (each output is an 8D vector) and each capsule in the [6, 6] grid is sharing their weights with each other. One can see PrimaryCapsules as a Convolution layer with Eq. 1 as its block non-linearity. The final Layer (DigitCaps) has one 16D capsule per digit class and each of these capsules receives input from all the capsules in the layer below."
In this paragraph, the authors only talk about convolution.
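That reading matches how PrimaryCaps is typically coded: a plain Conv2D whose 256 output channels are reshaped into 32 capsule types of 8D on the 6x6 grid, with the squash of Eq. 1 as the non-linearity. A minimal sketch with the MNIST shapes (the layer names and this squash definition are only illustrative):

import tensorflow as tf
from tensorflow.keras import layers

def squash(vectors, axis=-1):
    # Eq. 1: shrink short vectors towards 0 and long vectors towards unit length.
    s_norm = tf.reduce_sum(tf.square(vectors), axis, keepdims=True)
    return (s_norm / (1.0 + s_norm)) * vectors / tf.sqrt(s_norm + 1e-7)

inputs = layers.Input(shape=(28, 28, 1))
conv1 = layers.Conv2D(256, kernel_size=9, strides=1, padding='valid',
                      activation='relu', name='conv1')(inputs)          # [None, 20, 20, 256]
# 32 capsule types x 8D = 256 channels; 9x9 kernel with stride 2 -> 6x6 grid.
x = layers.Conv2D(32 * 8, kernel_size=9, strides=2, padding='valid',
                  name='primarycap_conv2d')(conv1)                      # [None, 6, 6, 256]
x = layers.Reshape([-1, 8], name='primarycap_reshape')(x)               # [None, 1152, 8]
primarycaps = layers.Lambda(squash, name='primarycap_squash')(x)        # Eq. 1 as block non-linearity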
@jpviguerasguillen YOU SAVED MY LIFE
Hey guys, I tried the above but got the following error:
TypeError: ('Keyword argument not understood:', 'share_weights')
I have updated the code for tensorflow 2.x with integrated Keras. It is supposed to all work the same. I am running it in Google Colab. I have the following problem:
In CapsuleLayer, once the input x is expanded and tiled, it is multiplied with the weight matrix W. I have added some print(x.shape) statements and I get:
However, the expected shape is inputs_hat.shape = [None, 10, 1152, 16].
Subsequently, I get this error for the next batch_dot (but this is expected, as inputs_hat is already wrong):
I have tried to expand x with a last dimension, i.e. to (None, 10, 1152, 8, 1), but surprisingly this gives:
I don't understand why 1152 is replicated in this matmul! This matrix multiplication should be easy!
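For what it's worth, the thing that usually trips this up is that tf.matmul only multiplies over the last two axes and treats every axis before them as batch axes that must line up, which is why the fix earlier in this thread first brings x to the same rank as W. A quick shape check with the same MNIST sizes:

import tensorflow as tf

W = tf.random.normal([10, 1152, 16, 8])   # [num_capsule, input_num_capsule, dim_capsule, input_dim_capsule]
x = tf.random.normal([10, 1152, 8])       # one expanded-and-tiled sample, still rank 3

# Add a trailing axis so the last two axes form the matrices [16, 8] x [8, 1]:
x = tf.expand_dims(x, -1)                 # [10, 1152, 8, 1]
print(tf.matmul(W, x).shape)              # (10, 1152, 16, 1)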