TACJu / TransFG

This is the official PyTorch implementation of the paper "TransFG: A Transformer Architecture for Fine-grained Recognition" (Ju He, Jie-Neng Chen, Shuai Liu, Adam Kortylewski, Cheng Yang, Yutong Bai, Changhu Wang, Alan Yuille).

About CUB-200-2011's accuracy #2

Open · hyqyoung opened this issue 3 years ago

hyqyoung commented 3 years ago

Thanks for your work and for sharing your code! However, when I reproduced your code on four Tesla V100 GPUs, entirely following the instructions with the non-overlap split, I only got 90.8% accuracy. Could you analyze what might cause this?

haoweiz23 commented 3 years ago

I reproduced the TransFG code on four Tesla V100 GPUs, entirely following the instructions with the overlap split, and only got 91.2% (paper: 91.7%). Besides, when I added AutoAugment in the training stage as the paper mentions, I got an even lower accuracy of 90.8%. Are there any details I missed?
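For anyone unfamiliar with the overlap/non-overlap terminology in this thread: the split option controls the stride of the patch-embedding convolution. Below is a minimal sketch of the idea, assuming the 448-pixel input and slide step of 12 that I believe are the repo's defaults; treat the exact numbers as illustrative, not authoritative.

```python
import torch
import torch.nn as nn

# Sketch of non-overlapping vs overlapping patch embedding.
# Non-overlap: stride == patch size (plain ViT tokenisation).
# Overlap: stride < patch size, so neighbouring patches share pixels
# and fine-grained local detail is less likely to be cut in half.
def patch_embed(img_size=448, patch_size=16, slide_step=12,
                overlap=True, dim=768):
    stride = slide_step if overlap else patch_size
    proj = nn.Conv2d(3, dim, kernel_size=patch_size, stride=stride)
    n_patches = ((img_size - patch_size) // stride + 1) ** 2
    return proj, n_patches

proj, n = patch_embed(overlap=True)
x = torch.randn(1, 3, 448, 448)
tokens = proj(x).flatten(2).transpose(1, 2)  # (1, n_patches, dim)
print(n, tokens.shape)  # 1369 overlapping tokens vs 784 without overlap
```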

hyqyoung commented 3 years ago

> I reproduced the TransFG code on four Tesla V100 GPUs, entirely following the instructions with the overlap split, and only got 91.2% (paper: 91.7%). Besides, when I added AutoAugment in the training stage as the paper mentions, I got an even lower accuracy of 90.8%. Are there any details I missed?

I reproduced the code and got 90.8% accuracy with the non-overlap split, so your 91.2% result is reasonable, given that the paper reports overlap performs better than non-overlap. As for AutoAugment, according to the CUB data utils code, there is no AutoAugment. By the way, have you gotten any good results on Stanford Cars or Stanford Dogs?

haoweiz23 commented 3 years ago

> Have you gotten any good results on Stanford Cars or Stanford Dogs?

I have not tried other datasets yet. There is indeed no AutoAugment in the official code; however, AutoAugment is mentioned in the training details of the TransFG paper.

By the way, have you ever reproduced ViT on CUB using this training code?

hyqyoung commented 3 years ago

I have not reproduced ViT on CUB using the TransFG code, but I think its accuracy would be a bit lower than 90.8% and 91.2%.

DennisLeoUTS commented 3 years ago

Similarly, I can't reproduce 91.7% on CUB-200-2011. I only got 91.0% and 90.8% with/without overlapping. 91.7% down to 91.0% is a big degradation on this dataset. I've checked the paper and the code very carefully and could not figure out the reason.

jokerpwn commented 3 years ago

> Have you gotten any good results on Stanford Cars or Stanford Dogs?
>
> I have not tried other datasets yet. There is indeed no AutoAugment in the official code; however, AutoAugment is mentioned in the training details of the TransFG paper.
>
> By the way, have you ever reproduced ViT on CUB using this training code?

I reproduced ViT on CUB using the reference code https://github.com/jeonsworld/ViT-pytorch and got 90.7% accuracy.

TACJu commented 3 years ago

Hi guys, sorry for the late reply. I'm very busy with ongoing competitions and other projects, so I don't have much time to maintain this repo at the moment. I checked the commit log and found the problem: when I cleaned up my code, the scale for the contrastive loss was wrong, and the model needs to load the pre-trained layer norm weights instead of learning them from scratch. I've fixed the bugs and verified that I can now get around 91.6%-91.7% accuracy on CUB. Also, since I did not carefully search for the best hyper-parameters and just tried the four recommended settings from the original ViT paper, maybe you can have a try and share here if you find something better. Thanks!
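To make the two fixes concrete: the contrastive-loss "scale" is the normalisation applied to the summed pairwise terms, and the LayerNorm fix means the encoder's final norm weights are loaded from the ViT checkpoint rather than re-initialised. Here is a minimal sketch of a contrastive loss of the shape being discussed; the 0.4 margin and the 1/B² scale reflect my reading of the fixed code and should be treated as assumptions, not authoritative values.

```python
import torch
import torch.nn.functional as F

def con_loss(features, labels, margin=0.4):
    """Pull same-class CLS features together; push different-class
    features apart once their cosine similarity exceeds the margin.
    Margin and scale are assumptions based on the fixed repo code."""
    B = features.size(0)
    f = F.normalize(features, dim=1)              # unit-norm features
    cos = f @ f.t()                               # (B, B) cosine similarities
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()
    neg = 1.0 - pos
    loss = ((1.0 - cos) * pos).sum() \
         + ((cos - margin).clamp(min=0) * neg).sum()
    return loss / (B * B)                         # the "scale" that was wrong
```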

TACJu commented 3 years ago

> Have you gotten any good results on Stanford Cars or Stanford Dogs?
>
> I have not tried other datasets yet. There is indeed no AutoAugment in the official code; however, AutoAugment is mentioned in the training details of the TransFG paper. By the way, have you ever reproduced ViT on CUB using this training code?
>
> I reproduced ViT on CUB using the reference code https://github.com/jeonsworld/ViT-pytorch and got 90.7% accuracy.

Hi, @jokerpwn. Could you please share the settings you used to get 90.7% for ViT? I've tried the four recommended settings in the original paper and the best I can get is 90.3%. If yours holds up, I'll update the paper. Thanks!

jokerpwn commented 3 years ago

> Have you gotten any good results on Stanford Cars or Stanford Dogs?
>
> I have not tried other datasets yet. There is indeed no AutoAugment in the official code; however, AutoAugment is mentioned in the training details of the TransFG paper. By the way, have you ever reproduced ViT on CUB using this training code?
>
> I reproduced ViT on CUB using the reference code https://github.com/jeonsworld/ViT-pytorch and got 90.7% accuracy.
>
> Hi, @jokerpwn. Could you please share the settings you used to get 90.7% for ViT? I've tried the four recommended settings in the original paper and the best I can get is 90.3%. If yours holds up, I'll update the paper. Thanks!

I guess it's because I used only one GPU; the other settings are no different.

narrowsnap commented 3 years ago

I reproduced ViT on CUB using the reference https://github.com/rwightman/pytorch-image-models and got an accuracy of 91.06%, with vit_base_patch16_384(pretrained=True, num_classes=200, drop_rate=0.1), optim.SGD(model.parameters(), lr=0.0001, momentum=0.9, weight_decay=1e-5), and lr_scheduler.ReduceLROnPlateau(optimizer, 'max'), trained for 100 epochs.
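A minimal sketch of that baseline, assuming timm is installed; train_one_epoch and evaluate are hypothetical helpers standing in for a standard CUB training/evaluation loop:

```python
import timm
import torch.optim as optim

# ViT-B/16 at 384 resolution, fine-tuned on CUB's 200 classes
# with the settings reported above.
model = timm.create_model('vit_base_patch16_384', pretrained=True,
                          num_classes=200, drop_rate=0.1)
optimizer = optim.SGD(model.parameters(), lr=1e-4,
                      momentum=0.9, weight_decay=1e-5)
scheduler = optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'max')

for epoch in range(100):
    train_one_epoch(model, optimizer)   # hypothetical helper
    val_acc = evaluate(model)           # hypothetical helper
    scheduler.step(val_acc)             # 'max' mode: step on validation accuracy
```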

Christine620 commented 3 years ago

> Have you gotten any good results on Stanford Cars or Stanford Dogs?
>
> I have not tried other datasets yet. There is indeed no AutoAugment in the official code; however, AutoAugment is mentioned in the training details of the TransFG paper. By the way, have you ever reproduced ViT on CUB using this training code?
>
> I reproduced ViT on CUB using the reference code https://github.com/jeonsworld/ViT-pytorch and got 90.7% accuracy.
>
> Hi, @jokerpwn. Could you please share the settings you used to get 90.7% for ViT? I've tried the four recommended settings in the original paper and the best I can get is 90.3%. If yours holds up, I'll update the paper. Thanks!
>
> I guess it's because I used only one GPU; the other settings are no different.

Hi, @jokerpwn, what were your settings to get 90.7% for ViT? What are the four recommended settings on CUB? Are there any details I missed?

haoweiz23 commented 2 years ago

@TACJu After you fixed the pre-trained norm layer and the contrastive loss, I still cannot reproduce TransFG's 91.7%. I only get 91.2% / 91.0% with and without overlap on CUB. Could you please check the code again or provide the training log? If anyone has reproduced the correct results, please leave a message below.

20713 commented 2 years ago

> Hi guys, sorry for the late reply. I'm very busy with ongoing competitions and other projects, so I don't have much time to maintain this repo at the moment. I checked the commit log and found the problem: when I cleaned up my code, the scale for the contrastive loss was wrong, and the model needs to load the pre-trained layer norm weights instead of learning them from scratch. I've fixed the bugs and verified that I can now get around 91.6%-91.7% accuracy on CUB. Also, since I did not carefully search for the best hyper-parameters and just tried the four recommended settings from the original ViT paper, maybe you can have a try and share here if you find something better. Thanks!

Hi, did you really train on the CUB-200-2011 dataset for 10,000 epochs?