Hi,
Thanks for the questions! We used Sara Sabour's original implementation (Official Code) as a reference for ours.
I think you misunderstood the concept behind the b variable. If b were independent of the image, then all images would be routed to the same capsules in each layer. We therefore compute the routing coefficients separately for each element in the batch. This is consistent with the official code; see line 186 in layers/layers.py, where the 'b' variable is called votes.
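To make the point concrete, here is a minimal sketch of routing with per-example logits. This is my own illustration in PyTorch with assumed tensor names and shapes, not the repository's actual code:

```python
import torch
import torch.nn.functional as F

def squash(s, dim=-1):
    # Squash non-linearity from Sabour et al. (2017): short vectors shrink
    # toward zero, long vectors approach unit length.
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + 1e-8)

def dynamic_routing(u_hat, num_iterations=3):
    # u_hat: prediction vectors, shape [batch, in_caps, out_caps, out_dim].
    # The routing logits b carry a batch dimension, so every image in the
    # minibatch gets its own coupling coefficients c.
    batch, in_caps, out_caps, _ = u_hat.shape
    b = torch.zeros(batch, in_caps, out_caps, device=u_hat.device)
    for _ in range(num_iterations):
        c = F.softmax(b, dim=-1)                  # couplings, per image
        s = (c.unsqueeze(-1) * u_hat).sum(dim=1)  # [batch, out_caps, out_dim]
        v = squash(s)
        # Agreement update: dot product of predictions with outputs.
        b = b + (u_hat * v.unsqueeze(1)).sum(dim=-1)
    return v

# Example: 1152 primary capsules routed to 10 output capsules of dim 16.
v = dynamic_routing(torch.randn(32, 1152, 10, 16))
print(v.shape)  # torch.Size([32, 10, 16])
```

If b lacked the batch dimension, the softmax would yield identical couplings c for every image in the batch, which is exactly the problem described above.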
The bias term added before the squash function is not mentioned in the paper, but in the official code a bias is indeed added before the squash (line 113 in layers/layers.py).
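As a sketch of where such a bias would sit (again with assumed names and shapes, not code copied from either repository):

```python
import torch

def squash(s, dim=-1):
    # Same squash non-linearity as in the sketch above.
    sq_norm = (s ** 2).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * s / torch.sqrt(sq_norm + 1e-8)

batch, out_caps, out_dim = 32, 10, 16        # illustrative sizes
s = torch.randn(batch, out_caps, out_dim)    # weighted sum from routing
bias = torch.nn.Parameter(torch.zeros(1, out_caps, out_dim))

# The learnable bias is added to the pre-activation s and only then
# squashed; the paper omits this term, but the official code includes it.
v = squash(s + bias)
```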
@ethanleet can answer regarding the results on the smallNORB dataset.
@hukkelas Thanks for answering the questions. The baseline model (the one we adopted from Sabour et al.) achieved 91.5% accuracy on smallNORB.
That does seem significantly lower than the reported results on smallNORB (the paper reports 2.7% test error, i.e. roughly 97.3% accuracy). But I guess a lot of things are unclear with CapsNet. Thanks for answering! Do you have any idea why the accuracy is lower?
We replicated the authors' method as faithfully as possible, but there are still many factors potentially contributing to the discrepancy. One is initialization, which is not explicitly specified in their paper. Hyperparameters like the learning rate also matter, as does how long the model is trained. After all, we didn't spend much time experimenting with smallNORB; most of our focus was on making sure MNIST works and on trying to improve CIFAR10 (the results of which are not published here yet).
Fair enough. Thanks for making your code available. I guess I'll go to the official implementation and run it on smallNORB.
Hi. I've found some differences between what you've implemented and the original model in the paper. Apart from the reconstruction network being different (which I think is a cool experiment), I've noticed that the 'b' (in dynamic routing) is implemented differently: the paper seems to suggest a single 'b' that is independent of the image, whereas you keep a separate 'b' for each element of the minibatch. Also, the bias term added to 'c' isn't present in the original paper. Was this part of an experiment? I'm still going through the code and haven't tested it beyond MNIST, but since yours is one of the few repositories that test on smallNORB, can I ask what results you've obtained on it? Thanks!