WenbinLee / DN4

Pytorch code of "Revisiting Local Descriptor based Image-to-Class Measure for Few-shot Learning", CVPR 2019.

The results are different from the paper on miniImagenet dataset #11

Closed usamieiji closed 5 years ago

usamieiji commented 5 years ago

Sorry to bother you, but I have trained the 5-way 1-shot model (Conv-64F) from scratch and found that the average test accuracy is 46.63±0.80%, which is lower than the result reported in the paper. I didn't modify the code. The dataset and the split csv files I used are provided by RelationNet (https://github.com/floodsung/LearningToCompare_FSL). Is there something wrong in the dataset processing or any other aspect? Thanks for your attention.

WenbinLee commented 5 years ago

Thanks for your question. This is a little strange; I will repeat this experiment using the dataset provided by RelationNet and reply to you as soon as possible. BTW, what's your result on the 5-way 5-shot setting?

usamieiji commented 5 years ago

Thanks for your reply and for helping me check the results. Training takes about a day on an RTX 2080 Ti, so I haven't trained the 5-way 5-shot model yet, but I will train it soon and check the results.

usamieiji commented 5 years ago

I have tested the 5-way 5-shot setting and found that it is still lower than the result reported in the paper. Then I tried to compare the difference in data processing. In RelationNet, images are pre-processed with: im = im.resize((84, 84), resample=Image.LANCZOS). In DN4, images are processed with: transforms.Resize((opt.imageSize, opt.imageSize)), where opt.imageSize is 84. These two resize methods can differ by up to 255 at the same pixel. Maybe that is the reason my results are so bad.
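For reference, the two resize paths look roughly like this (just a sketch; the file name is hypothetical, and torchvision's Resize uses bilinear interpolation by default):

```python
from PIL import Image
from torchvision import transforms

im = Image.open("n0153282900000005.jpg")  # hypothetical miniImageNet image

# RelationNet: images are resized once, offline, with Lanczos resampling
im_relationnet = im.resize((84, 84), resample=Image.LANCZOS)

# DN4: images are resized on the fly; the default interpolation is bilinear
im_dn4 = transforms.Resize((84, 84))(im)
```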

WenbinLee commented 5 years ago

Yes, that may be a reason, but I am not sure. Actually, I have repeated the 5-way 1-shot experiment using the dataset and splits adopted by RelationNet, and the results are all right. I don't know what is causing the difference, because I can reproduce the results.

I have uploaded the splits of miniImageNet to the dataset folder. I am wondering if you can directly use our code and datasets_csv.py to read the .csv files along with the downloaded 'images' of miniImageNet? Also, it would be better if you could send your code to my email (liwenbin.nju@gmail.com), and I can help check it.

Thank you very much.

usamieiji commented 5 years ago

Sorry to bother you so much. I am now using the dataset and csv files you provided to train the model, and I am still running different experiments to figure out the influence of the different processing methods.

Thanks for your help. I will try to figure it out myself instead of taking up your time with code review 😂

WenbinLee commented 5 years ago

It's all right. Yeah, it is indeed important to find out the true reason and the influence of the pre-processing methods. Good luck.

You are welcome.

usamieiji commented 5 years ago

After several experiments, I found that the different data processing methods really do cause different results.

Using the RelationNet data processing (images processed before training with https://github.com/floodsung/LearningToCompare_FSL/blob/master/datas/miniImagenet/proc_images.py), DN4 reaches 47.83±0.80% average test accuracy (5-way 1-shot). Using the original data processing (resized to 84 with bilinear interpolation during training), DN4 reaches 51.17±0.79% average test accuracy (5-way 1-shot).

Although I figured out why the bad results showed up, I am still puzzled about why two different data processing methods can cause a ~4% performance gap. Deep learning is so "deep".

Anyway, thank you for your help.

WenbinLee commented 5 years ago

I am glad you found the reason and thanks for sharing the information with me.

It's very interesting (weird). However, like you, I don't know the real reason for this phenomenon either. Lol, maybe you are right: deep learning is too "deep". When I have time, I will try to check this part. There is still a long way to go to make deep learning explainable...

Thank you.

d33dler commented 1 year ago


@WenbinLee I'm currently experiencing the same issue. However, in my case I didn't even change the pre-processing of the inputs; I'm using the same methods you use, and the only difference is the torch version (which shouldn't matter, since the default interpolation mode is BILINEAR, just as usamieiji reports). Also, I integrated your code into my framework, rewriting some parts, but functionally everything is exactly the same; I verified its correctness and it should be fine. For example, at the end of epoch 3 I see Prec@1 39.732 on the test set, whereas in your logs it is already at ~46.0, so a ~5% difference, just as in @usamieiji's case.

Another point: in the saved opt results of the K3 W5 S1 miniImageNet run I see "(indefense) Convaindefence" as (what I presume is) the criterion? I believe this is still CrossEntropyLoss, but I'm not sure.

d33dler commented 1 year ago

https://github.com/WenbinLee/DN4/issues/9 In this comment you reported that network initialization varies between torch versions. Could you elaborate? I'm currently looking through changelogs to understand what changed, since I really don't want to downgrade to 0.4: it would require downgrading CUDA, and all of that is unmotivated hassle. @WenbinLee

WenbinLee commented 1 year ago

Hi Radu,

This issue is indeed strange. In general, the torch version is not the key point. I recommend using our latest repository, https://github.com/RL-VIG/LibFewShot.git, where we also reimplement DN4 and the PyTorch version is newer than 1.5.0.

I hope this helps.

Best, Wenbin



d33dler commented 1 year ago

Hey @WenbinLee ,

Thanks, but unless you used some other techniques in the lib, I won't need to go through it, since my area of study is now smart augmentation pipelines/profiles for improving baseline results (thesis). I lowered the learning rate to 0.001 and changed the decay, and that fixed the issue (convergence is a bit slower now, but it works). I don't know what the reason is, since even when using your script directly (changing only the .view() calls to .reshape() for compatibility with the latest PyTorch version) the same phenomenon occurred: a ~5% accuracy lag.
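For anyone hitting the same wall, the workaround amounts to something like this (a minimal sketch; the stand-in model, second beta, step size and decay factor are illustrative, only the 0.001 learning rate comes from my runs):

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import StepLR

model = nn.Conv2d(3, 64, 3, padding=1)  # stand-in for the DN4 Conv64F network
optimizer = optim.Adam(model.parameters(), lr=0.001, betas=(0.5, 0.9))
scheduler = StepLR(optimizer, step_size=10, gamma=0.5)  # illustrative decay, not the exact schedule I used

for epoch in range(30):
    # ... run one episodic training epoch here ...
    scheduler.step()
```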

WenbinLee commented 1 year ago

Hi,

We do not use other specific techniques in LibFewShot. You mean that with a lower learning rate of 0.001 it works? If so, I guess you can also try a higher learning rate, such as 0.01, to speed up convergence.

I will double check our code to find out the issues.



d33dler commented 1 year ago

@WenbinLee wouldn't a higher learning rate such as 0.01 just cause an overshoot and make it easier to overfit early in training?

d33dler commented 1 year ago

@WenbinLee I identified the issue: it was the results you reported in the paper for miniImageNet that were wrong (I believe you put in results from another project of yours, CovaMNet), since in LibFewShot I found the correct results, which align perfectly with the results I reproduced: https://github.com/RL-VIG/LibFewShot/tree/main/reproduce. Please be careful next time, since your mistakes cost other researchers their time.

WenbinLee commented 1 year ago

Hi,

I am wondering if you could please show me your reproduced results?

Could you please explain in detail the assessment "Since I found in LibFewShot the correct results that align perfectly with the results I reproduced"? In fact, I cannot find any essential difference between the reproduced results of DN4 in our LibFewShot and the results reported in our original DN4 paper.

I am sure I do NOT use wrong results from another project.

Thanks

d33dler commented 1 year ago

My bad, I thought your DN4 was the Baseline, since I couldn't find the DN4 path in the repository. But DN4 is missing altogether from LibFewShot, so yes, I still think there was some error in producing the miniImageNet results (Stanford Dogs is fine). Simply try using your own code from this repository with a newer torch version and show us the logs. I could show mine, but I don't have them currently.

WenbinLee commented 1 year ago

Hi,

Recently, I tried to directly run the code from our repository using PyTorch 1.7 and found that if we use an Adam optimizer, we need a much smaller learning rate, such as 0.001. That is, under PyTorch 1.7, using a learning rate of 0.001, I can obtain an accuracy of 51.18% on the 5-way 1-shot setting on miniImageNet. I guess this is the issue you met.

To address this issue once and for all, I have reimplemented DN4 using the latest PyTorch version. Please kindly use our latest code.

Thanks.

The following are the test results from running the original DN4 repository under PyTorch 1.7:

Namespace(basemodel='Conv64F', beta1=0.5, clamp_lower=-0.01, clamp_upper=0.01, cuda=True, data_name='miniImageNet', dataset_dir='/data1/Liwenbin/Datasets/miniImageNet--ravi', episodeSize=1, episode_test_num=600, episode_train_num=10000, episode_val_num=1000, epochs=30, imageSize=84, lr=0.005, mode='test', nc=3, neighbor_k=3, ngpu=1, outf='./results/DN4_Lr0.001_miniImageNet_Conv64F_5Way_1Shot_K3', print_freq=100, query_num=15, resume='./results/DN4_Lr0.001_miniImageNet_Conv64F_5Way_1Shot_K3/model_best.pth.tar', shot_num=1, testepisodeSize=1, way_num=5, workers=8)

=> loaded checkpoint './results/DN4_Lr0.001_miniImageNet_Conv64F_5Way_1Shot_K3/model_best.pth.tar' (epoch 20)

FourLayer_64F(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (2): LeakyReLU(negative_slope=0.2, inplace=True)
    (3): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (4): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (5): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (6): LeakyReLU(negative_slope=0.2, inplace=True)
    (7): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
    (8): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (9): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (10): LeakyReLU(negative_slope=0.2, inplace=True)
    (11): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1), bias=False)
    (12): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
    (13): LeakyReLU(negative_slope=0.2, inplace=True)
  )
  (imgtoclass): ImgtoClass_Metric()
)

===================================== Round 0 =====================================
Testset: 600-------------0
Test-(20): [100/600]  Time 0.140 (0.138)  Loss 0.719 (1.196)  Prec@1 77.333 (51.525)
Test-(20): [200/600]  Time 0.069 (0.130)  Loss 1.270 (1.202)  Prec@1 42.667 (51.556)
Test-(20): [300/600]  Time 0.145 (0.126)  Loss 1.482 (1.206)  Prec@1 38.667 (51.601)
Test-(20): [400/600]  Time 0.117 (0.125)  Loss 1.221 (1.208)  Prec@1 57.333 (51.561)
Test-(20): [500/600]  Time 0.117 (0.125)  Loss 1.013 (1.211)  Prec@1 60.000 (51.404)

d33dler commented 1 year ago

Good work @WenbinLee! I will run training on my end using Adam and LR=0.001 and see what it gives.

WenbinLee commented 1 year ago

I have updated the implementation of DN4, so I recommend using the new code. The results will be much better.

d33dler commented 1 year ago

I understand, but I'm already building on top of your work in my own project. Moreover, I can't use torch 1.7, since I'm using A100 GPUs, which have a higher compute capability than 1.7 supports. If the issue is indeed what you describe (the Adam optimizer & LR), then just changing that should fix it? Unless you changed more things without letting us know :)

WenbinLee commented 1 year ago

I think the new code is also suitable for the other PyTorch versions you use. You can try it and see whether it works. The 2019 version of DN4 is somewhat old and slow, so I have optimized the training speed and data augmentation. The core of DN4 is unchanged. No worries.

d33dler commented 1 year ago

I will integrate parts of your code (since I'm mainly interested in vanilla DN4). I ran DN4 on miniImageNet W5 S1 with Adam and cosine scheduling, but with cosine annealing set to T_max=10 and eta_min=0.0001 it got stuck in a local minimum around epoch 11 with Accuracy@1 ≈ 46% (the same score as in my initial experiments). It converges on the training set much faster, even with a 0.001 initial LR (around epoch 8-9), due to using Adam. I'll try training with the CosineAnnealing parameters set the same as in your 2023 update and see what happens.

d33dler commented 1 year ago

But I see you changed multiple things: you now have bias enabled in the Conv2d layers, you introduce CenterCrop in validation, and you resize to 92 (for opt.imageSize==84). What is the reason for these updates? It makes it harder for us to build on top of your work...
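If I read the new code correctly, the evaluation pipeline now looks roughly like this (a sketch on my side; the normalization statistics are an assumption, only the Resize-to-92 / CenterCrop-to-84 part comes from the repo):

```python
from torchvision import transforms

image_size = 84
val_transform = transforms.Compose([
    transforms.Resize(92),                # resize before cropping
    transforms.CenterCrop(image_size),    # keep the central 84x84 patch
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],  # assumed ImageNet statistics
                         std=[0.229, 0.224, 0.225]),
])
```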

WenbinLee commented 1 year ago

This is because some recent FSL methods use these settings, which have become common in the field of few-shot learning. If you want to make a fair comparison with the latest FSL works, you can use these training settings (tricks). Also, if you don't need to compare with the latest FSL methods, you can use the old version of the DN4 code.

d33dler commented 1 year ago

Alright, I introduced all the changes featured in your work, including the image-to-class KNN with a small adjustment to handle augmented views. Nice rewrite of the KNN; I got a speedup of ~2x.
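For context, the image-to-class measure boils down to a vectorized top-k cosine similarity, something like this (a minimal sketch of the idea from the paper, not the repository's exact ImgtoClass_Metric code):

```python
import torch
import torch.nn.functional as F

def image_to_class_score(query_desc: torch.Tensor, class_desc: torch.Tensor, k: int = 3) -> torch.Tensor:
    """DN4-style image-to-class measure.

    query_desc: (m, c) local descriptors of one query image (m = H*W)
    class_desc: (n, c) pooled local descriptors of one support class
    Returns the sum, over all query descriptors, of their top-k cosine similarities.
    """
    q = F.normalize(query_desc, dim=1)       # L2-normalize each descriptor
    s = F.normalize(class_desc, dim=1)
    sim = q @ s.t()                          # (m, n) cosine similarity matrix
    return sim.topk(k, dim=1).values.sum()   # k nearest neighbours per descriptor, then sum

# A query is assigned to the class with the highest score, e.g.:
# scores = [image_to_class_score(q_desc, c_desc, k=3) for c_desc in support_descs]
```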

d33dler commented 1 year ago

When using Adam, the learning rate should be 0.0001 or lower to avoid getting stuck in a local minimum at recall@1 ≈ 46-47%.

WenbinLee commented 1 year ago

In fact, you can use SGD and a cosine learning rate schedule by referring to the 2023 version of the DN4 code. In that case, you can use a much larger learning rate, such as 0.01 or 0.05, to avoid the local minima.
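A minimal sketch of that setup (the stand-in model, momentum, weight decay, and epoch count are illustrative; only the SGD + cosine schedule and the 0.01-0.05 learning rate range come from the recommendation above):

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import CosineAnnealingLR

model = nn.Conv2d(3, 64, 3, padding=1)  # stand-in for the DN4 Conv64F network
num_epochs = 30
optimizer = optim.SGD(model.parameters(), lr=0.05, momentum=0.9, weight_decay=5e-4)
scheduler = CosineAnnealingLR(optimizer, T_max=num_epochs)

for epoch in range(num_epochs):
    # ... run one episodic training epoch here ...
    scheduler.step()
```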