facebookresearch / open_lth

A repository in preparation for open-sourcing lottery ticket hypothesis code.
MIT License

Possible 1-image difference in test accuracy between logger file and manual calculation #8

Open bbartoldson opened 3 years ago

bbartoldson commented 3 years ago

Hi Jonathan,

Thank you for sharing this awesome code!

I loaded 8 models from checkpoint.pth files created by lottery experiments, tested them on CIFAR10, and compared those test accuracies to the accuracies in the logger files. For 3/8 models, the test accuracies differed by 0.0001 as a fraction (e.g., 78.53% vs. 78.54%), which is 0.01 percentage points, or exactly one image out of the 10,000 CIFAR10 test images. The difference went in both directions: the logger accuracy was sometimes higher and sometimes lower than my manually calculated accuracy, so neither source was systematically higher.
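To make the size of the gap concrete, here is a minimal sketch of the arithmetic, assuming the standard 10,000-image CIFAR10 test set. The helper names `implied_correct` and `images_apart` are hypothetical and are not part of open_lth.

```python
def implied_correct(accuracy: float, num_examples: int = 10_000) -> int:
    """Number of correctly classified examples implied by an accuracy fraction."""
    # round() absorbs tiny floating-point error in the product.
    return round(accuracy * num_examples)


def images_apart(acc_a: float, acc_b: float, num_examples: int = 10_000) -> int:
    """How many test images separate two reported accuracies."""
    return abs(implied_correct(acc_a, num_examples)
               - implied_correct(acc_b, num_examples))


# The example from this issue: 78.53% vs. 78.54%.
print(images_apart(0.7853, 0.7854))  # prints 1
```

So the two sources disagree about the classification of a single test image, not about how the accuracy is rounded for display.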

If you know a potential reason for this, please let me know, and I will investigate. For example, I was thinking something could be happening due to a float() or str() conversion in the MetricLogger, but I don't think that is it. Also, maybe I'm not instantiating the PrunedModel properly when I do so manually using the checkpoint.pth files (if that's the case, though, then I'm not sure why the accuracies are exactly equal for 5/8 models and very close for 3/8). In case it's helpful, these 8 models come from 8 levels of pruning during a lottery experiment (so they have the same architecture).
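As a quick sanity check of the float()/str() idea, here is a minimal sketch in plain Python. It does not use the actual MetricLogger code, and it assumes the accuracy is stored as an ordinary Python float and the test set has 10,000 images; `roundtrip_changes` is a hypothetical helper.

```python
def roundtrip_changes(num_examples: int = 10_000) -> list:
    """Return every correct-count whose accuracy changes after str() -> float()."""
    changed = []
    for correct in range(num_examples + 1):
        acc = correct / num_examples
        # Python 3 formats a float with the shortest string that parses back
        # to the same value, so this comparison is expected to hold.
        if float(str(acc)) != acc:
            changed.append(correct)
    return changed


print(roundtrip_changes())  # prints []
```

Under these assumptions, no possible accuracy value over 10,000 images is altered by a plain str()/float() round trip, which is consistent with my suspicion that the conversion is not the cause. It says nothing about other formatting (e.g., truncating to a fixed number of digits), which I have not checked here.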

