M-Nauta / ProtoTree

ProtoTrees: Neural Prototype Trees for Interpretable Fine-grained Image Recognition, published at CVPR 2021
MIT License

Accuracy of non-iNat Networks #6

Closed EoinKenny closed 2 years ago

EoinKenny commented 2 years ago

Hi,

I was able to reproduce the accuracy of ResNet50 pre-trained on iNat (around 82%) using the suggestions I found in the closed issues, but when I substitute another network (say VGG-16) I get an accuracy of around 11%. ResNet50 pre-trained on ImageNet gets around 62%, and ResNet18 around 30%.

I'm just wondering if that's normal? Or are there other hyperparameters that need to be changed to boost accuracy with other networks? Thank you if you have time to offer suggestions.

For example, I use this command:

python main_tree.py --epochs 150 --log_dir ./runs/protoree_cub --dataset CUB-200-2011 --lr 0.001 --lr_block 0.001 --lr_net 1e-5 --num_features 256 --depth 9 --net vgg16 --freeze_epochs 30 --milestones 60,80,100,120,140

M-Nauta commented 2 years ago

These results can be explained by the fact that freezing and unfreezing was only implemented for ResNet50 and not for other architectures (my bad). That's why the other backbones gave such low accuracy: the non-pretrained parts of the model (the 1x1 conv layer and the prototypes) need to settle in a bit before the backbone is fine-tuned. I have updated the code so that every network first trains the add-on layer and prototypes for a few epochs before unfreezing and training the whole model.

Secondly, since the backbone is pretrained on a task that is further from the target task (ImageNet rather than iNat, for bird recognition), it makes sense to increase the learning rates of the backbone. I didn't do any hyperparameter tuning, but something like

python3 main_tree.py --epochs 150 --log_dir ./runs/protoree_cub --dataset CUB-200-2011 --lr 0.001 --lr_block 0.002 --lr_net 1e-4 --num_features 256 --depth 9 --net resnet50 --freeze_epochs 2 --milestones 60,80,100,120,140

should work much better than reported before (I got 71% for ResNet50 and 54% for VGG16). The results will probably not match the iNat initialization, since CUB-200-2011 is a very small dataset and therefore sensitive to a good weight initialization, which wasn't the core focus for now.
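For readers who want to see what this schedule looks like in code, here is a minimal PyTorch sketch of the idea described above: separate learning rates per parameter group, and a backbone that stays frozen for the first `freeze_epochs` epochs. The attribute names (`model.net`, `model.add_on`, `model.prototype_layer`) and helper functions are illustrative assumptions, not ProtoTree's actual code.

```python
import torch

def make_optimizer(model, lr, lr_block, lr_net):
    # Separate parameter groups: the pretrained backbone gets its own
    # (typically smaller) learning rate than the freshly initialized layers.
    # Attribute names here are hypothetical, not the repository's real ones.
    return torch.optim.Adam([
        {"params": model.net.parameters(), "lr": lr_net},           # --lr_net
        {"params": model.add_on.parameters(), "lr": lr_block},      # --lr_block
        {"params": model.prototype_layer.parameters(), "lr": lr},   # --lr
    ])

def set_backbone_frozen(model, frozen):
    # Freezing means no gradients flow into the pretrained backbone.
    for p in model.net.parameters():
        p.requires_grad = not frozen

def train(model, loader, criterion, epochs, freeze_epochs, lr, lr_block, lr_net):
    optimizer = make_optimizer(model, lr, lr_block, lr_net)
    for epoch in range(epochs):
        # Keep the backbone frozen for the first `freeze_epochs` epochs so the
        # randomly initialized add-on and prototype layers can settle first.
        set_backbone_frozen(model, frozen=(epoch < freeze_epochs))
        for x, y in loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
```

Note that constructing the optimizer once with all parameter groups is fine: while the backbone is frozen, its parameters receive no gradients (their `.grad` stays `None`), so the optimizer simply skips them until unfreezing.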

EoinKenny commented 2 years ago

Thanks a lot M-Nauta!