cedrickchee / capsule-net-pytorch

[NO MAINTENANCE INTENDED] A PyTorch implementation of CapsNet architecture in the NIPS 2017 paper "Dynamic Routing Between Capsules".

Test error results reported are actually loss figures and are not comparable with paper reported accuracies. #11

Open geefer opened 6 years ago

geefer commented 6 years ago

Thanks for a very nicely presented implementation of Capsule Networks. I especially appreciate the tensorboard plots.

Unfortunately I believe you have mixed up "test error" with "test loss" when reporting your best results and comparing with the results from the paper.

The paper shows a table of test classification accuracy (Table 1) and reports a best error of 0.25%. This will have been calculated as:

(number of incorrectly classified test images) / (total number of test images) * 100%

Thus, since there are 10,000 test images, this equates to 25 misclassified images for a 0.25% error.

This is equivalent to an accuracy of 99.75%.
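In code, the relationship is simply the following (numbers taken from Table 1 of the paper for illustration):

```python
# Illustrative numbers from Table 1 of the paper.
total_test_images = 10_000
misclassified = 25

test_error = 100.0 * misclassified / total_test_images  # 0.25 (%)
test_accuracy = 100.0 - test_error                       # 99.75 (%)
```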

Unfortunately you list test accuracy and test error figures that do not sum to 100%, because you are listing the test loss figure, which is not a useful measure of the classification accuracy of the network.

Although I have not seen an independent implementation on the net that claims to achieve this 99.75% figure, I have seen several that achieve greater than 99.6% (my own implementation has achieved 99.68% in 50 epochs). Since your best test accuracy is 99.32%, it is possible that you have some error in your implementation, as this is quite a way from the 99.75% achieved by the authors of the paper.
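To make the distinction concrete, here is a minimal evaluation-loop sketch of how the two quantities are computed separately (generic PyTorch classifier style; `model`, `loss_fn`, and `test_loader` are placeholder names, not this repository's API):

```python
import torch

def evaluate(model, loss_fn, test_loader, device="cpu"):
    # Returns (average test loss, test error in percent).
    model.eval()
    total_loss, mistakes, total = 0.0, 0, 0
    with torch.no_grad():
        for images, labels in test_loader:
            images, labels = images.to(device), labels.to(device)
            outputs = model(images)  # per-class scores
            total_loss += loss_fn(outputs, labels).item() * labels.size(0)
            mistakes += (outputs.argmax(dim=1) != labels).sum().item()
            total += labels.size(0)
    avg_loss = total_loss / total          # depends on the chosen loss function
    error_pct = 100.0 * mistakes / total   # absolute measure of misclassification
    return avg_loss, error_pct
```

Only the second return value is what the paper's Table 1 reports.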

cedrickchee commented 6 years ago

Hi,

Apologies for the delayed response. I was really tied up with my studies. Nevertheless, a late reply is better than no reply. :slightly_smiling_face:

> very nicely presented implementation of Capsule Networks. I especially appreciate the tensorboard plots.

Thank you.

> Unfortunately I believe you have mixed up "test error" with "test loss" when reporting your best results and comparing with the results from the paper.
>
> The paper shows a table of test classification accuracy (Table 1) and reports a best error of 0.25%.

I am :confused: See my analysis of the paper below:

[Screenshot: results table from the "Dynamic Routing Between Capsules" paper]

Just to be very clear, my understanding is that "test error" is usually used interchangeably with "test loss" (or validation/test error). In other words, test error is the same as test loss.

Now, I am not sure whether it is me or the paper that has mixed up test error and test accuracy.

Please correct me if I am wrong. BTW, my background is not in research, and the goal of this project is educational. I am not trying to be rigorous when replicating the results, so the 99.XX% accuracy is not a big deal for this project.

geefer commented 6 years ago

Hi,

Thanks for explaining your thinking. However, I do not think the test error reported in the paper and the test loss are the same thing at all (though I do understand why, when looking at loss values in the context of SGD, some people speak of the loss as measuring the error; in that case it is a somewhat imprecise use of the term).

I believe that the test error reported by the paper is a measure of the fraction of incorrectly classified images (and thus can be sensibly represented as a percentage). That is why, in my way of thinking, accuracy and error are essentially two ways of looking at the same thing (accuracy being the fraction of correctly classified images).

The test loss that you are referring to (and which is shown on your tensorboard plots) is the result of applying the loss function to the output of the network, and is what is minimised by backpropagation. The loss function is somewhat arbitrary: different loss functions could be chosen (and they may or may not work well). For instance, if the reconstruction loss downscaling factor (set at 0.0005 in the paper) were changed, the absolute loss value would change. Note that the loss is a number and not a fraction, so it would not make sense to list it in the paper's performance table as a percentage.
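For concreteness, the paper's total loss is the margin loss plus 0.0005 times the reconstruction loss. A rough sketch of that combination follows; the function and argument names are my own, not this repository's API, and the tensor shapes (batch x 10 classes) are assumptions:

```python
import torch
import torch.nn.functional as F

def capsnet_loss(class_caps_lengths, targets_onehot, reconstructions, images,
                 m_plus=0.9, m_minus=0.1, lam=0.5, recon_scale=0.0005):
    # Margin loss from the paper: a per-class hinge on the output capsule lengths.
    present = targets_onehot * F.relu(m_plus - class_caps_lengths) ** 2
    absent = lam * (1.0 - targets_onehot) * F.relu(class_caps_lengths - m_minus) ** 2
    margin_loss = (present + absent).sum(dim=1).mean()

    # Reconstruction loss (sum of squared differences), scaled down by 0.0005
    # so it does not dominate the margin loss during training.
    recon_loss = F.mse_loss(reconstructions, images.view(images.size(0), -1),
                            reduction="sum") / images.size(0)
    return margin_loss + recon_scale * recon_loss
```

Changing `recon_scale` changes the absolute loss value but not the classification error, which is exactly why the loss is not a comparable performance figure.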

It does not make sense for a paper to report the performance of a classification network as a loss value, as that gives no absolute information about how well the network performs. Nor are loss values comparable with results from other people's networks on the same dataset, as their loss functions could be utterly different. For instance, in the authors' paper "Matrix Capsules with EM Routing" they use a completely different loss function but still present test error results, which can be directly compared with the results in the "Dynamic Routing Between Capsules" paper and with networks built by other researchers, because the error measurement is an absolute measure of how well the network performs on the classification task.

I hope this clarifies my thinking for you. I am no expert in this field, but I believe my explanation is sensible.

mightydeveloper commented 5 years ago

I agree with @geefer. The error rate should simply be (1 - accuracy), and (test error) != (test loss). The following sentence in README.md was very confusing for me:

> The current test error is 0.21% and the best test error is 0.20%. The current test accuracy is 99.31% and the best test accuracy is 99.32%.

I suggest reporting that the best error rate for this implementation is 0.68%, while the paper's is 0.25%.
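(For the record, that 0.68% is just the complement of the best reported test accuracy:)

```python
best_test_accuracy = 99.32                # from the README
best_test_error = 100.0 - best_test_accuracy
print(f"{best_test_error:.2f}%")          # -> 0.68%
```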