Closed vijaykumar01 closed 5 years ago
@vijaykumar01 The paper wasn't entirely clear to me on this point. Yes, Procedure 1 initializes the logits to 0, but in paragraph 2 they write:
(...) initial logits b_ij are the log prior probabilities that capsule i should be coupled to capsule j.
The way I interpreted it, the network can learn a good prior (that is, the initialization) for which capsules should be coupled with which. I thought that was more general and data-driven.
Later they released their TensorFlow implementation, and it seems that in the end the initial logits b_ij are indeed just set to 0 in their case. However, since making these values learnable parameters didn't break the training and (in my opinion) could potentially help, I kept them that way. It's a good remark though.
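To make the two options concrete, here is a minimal pure-Python sketch of the routing loop. It is not the code from this repository or from the authors' release; the names `route`, `u_hat`, and `b_prior` are made up for illustration. Passing `b_prior=None` resets the logits to 0 on every call, as in Procedure 1, while passing a matrix stands in for a learned prior that every call starts from.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    total = sum(es)
    return [e / total for e in es]

def squash(s):
    # Squashing non-linearity: keeps the direction, maps the norm into [0, 1).
    n2 = sum(x * x for x in s)
    if n2 == 0.0:
        return [0.0 for _ in s]
    scale = n2 / (1.0 + n2) / math.sqrt(n2)
    return [scale * x for x in s]

def route(u_hat, b_prior=None, iters=3):
    # u_hat[i][j] is the prediction vector of lower capsule i for upper capsule j.
    # b_prior (hypothetical argument): starting logits b_ij; None means all zeros.
    n_in, n_out = len(u_hat), len(u_hat[0])
    if b_prior is None:
        b = [[0.0] * n_out for _ in range(n_in)]
    else:
        b = [row[:] for row in b_prior]  # copy, so the prior itself is never mutated
    for _ in range(iters):
        # Coupling coefficients: softmax of b over the upper capsules j.
        c = [softmax(row) for row in b]
        v = []
        for j in range(n_out):
            dim = len(u_hat[0][j])
            s_j = [sum(c[i][j] * u_hat[i][j][d] for i in range(n_in))
                   for d in range(dim)]
            v.append(squash(s_j))
        # Agreement update: b_ij += u_hat_ij . v_j
        for i in range(n_in):
            for j in range(n_out):
                b[i][j] += sum(a * w for a, w in zip(u_hat[i][j], v[j]))
    return v, c
```

With `b_prior=None` the couplings in the first iteration are uniform (0.5 each for two upper capsules). With a non-zero prior they start out skewed, and since the prior is copied on entry, each routing call still starts from the same values rather than from the result of the previous call.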
Hi, for every routing call, the logits b_ij need to be set to 0 as per the paper. In the code, they are initialized only once at the beginning. Or am I missing something here?