LINs-lab / ttab

[ICML23] On Pitfalls of Test-Time Adaptation
https://arxiv.org/abs/2306.03536
Apache License 2.0

The experimental results of NOTE are inconsistent with the results in the paper #10

AAAI-2025 closed 11 months ago

AAAI-2025 commented 1 year ago

Dear authors, I have a few questions. We ran the code with the following settings:

"--model_adaptation_method": "note"
"--model_selection_method": "last_iterate"
"--model_selection_method": "cifar10"
"--model_name": "resnet26"
"--episodic": "false"
"--data_names": (
    "cifar10_c_deterministic-snow-5;"
    "cifar10_c_deterministic-brightness-5;"
    "cifar10_c_deterministic-fog-5;"
    "cifar10_c_deterministic-frost-5;"
    "cifar10_c_deterministic-contrast-5;"
    "cifar10_c_deterministic-motion_blur-5;"
    "cifar10_c_deterministic-glass_blur-5;"
    "cifar10_c_deterministic-zoom_blur-5;"
    "cifar10_c_deterministic-gaussian_noise-5;"
    "cifar10_c_deterministic-shot_noise-5;"
    "cifar10_c_deterministic-jpeg_compression-5;"
    "cifar10_c_deterministic-impulse_noise-5;"
    "cifar10_c_deterministic-pixelate-5;"
    "cifar10_c_deterministic-elastic_transform-5;"
    "cifar10_c_deterministic-defocus_blur-5",
)
"--batch_size": 100
"--lr": 1e-4
"--n_train_steps": 1
"--inter_domain": "HomogeneousNoMixture"

With these settings, the error rate of NOTE is 41%, which is quite different from the "NOTE-online" result in Table 2 of the paper (24.0 ± 0.1). Is this issue caused by the difference between our experimental environment and the setting of the original paper? Or there are other reasons?
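
The 15 scenario names listed above all follow one pattern, so they can be generated rather than typed by hand, which helps avoid typos when reproducing a run. A minimal sketch, assuming the naming pattern seen in this comment; `build_data_names` is a hypothetical helper and not part of ttab:

```python
# Hypothetical helper (not part of ttab): rebuilds the scenario names
# listed in the comment above from the corruption types and severity.
CORRUPTIONS = [
    "snow", "brightness", "fog", "frost", "contrast",
    "motion_blur", "glass_blur", "zoom_blur",
    "gaussian_noise", "shot_noise", "jpeg_compression",
    "impulse_noise", "pixelate", "elastic_transform", "defocus_blur",
]


def build_data_names(base="cifar10_c_deterministic", severity=5):
    """Return one '<base>-<corruption>-<severity>' name per corruption."""
    return [f"{base}-{name}-{severity}" for name in CORRUPTIONS]


data_names = build_data_names()
# Joined with ';' this gives the same sequence of scenarios as in the comment.
data_names_arg = ";".join(data_names)
```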

MarcellusZhao commented 1 year ago

Hello, thanks for your interest in reproducing our work. To answer your questions:

Is this issue caused by the difference between our experimental environment

I don't think a small difference in the Python environment would lead to such a large performance gap.

the setting of the original paper?

Yes. To compare different state-of-the-art TTA methods as fairly as possible, for the online results benchmarked in the paper we ran each method with different combinations of lr and n_train_steps, and picked the best combination according to the metrics we used. For more details, please see scripts like this one that we uploaded to the exps folder.
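
The selection procedure described above amounts to a small grid search. The sketch below is only an illustration, not the code in the exps folder: `select_best_config`, `toy_run`, and the grid values are hypothetical, and `toy_run` returns synthetic numbers rather than measured error rates.

```python
from itertools import product


def select_best_config(run_fn, lrs, n_train_steps_options):
    """Try every (lr, n_train_steps) pair and keep the lowest error rate.

    run_fn is any callable that runs one adaptation experiment and
    returns its error rate (lower is better).
    """
    best_config, best_error = None, float("inf")
    for lr, n_steps in product(lrs, n_train_steps_options):
        error = run_fn(lr=lr, n_train_steps=n_steps)
        if error < best_error:
            best_config = {"lr": lr, "n_train_steps": n_steps}
            best_error = error
    return best_config, best_error


def toy_run(lr, n_train_steps):
    """Synthetic stand-in for a real experiment; the values are meaningless."""
    return abs(lr - 1e-3) * 1000 + n_train_steps


# Assumed grid for illustration only; not the values used in the paper.
best_config, best_error = select_best_config(
    toy_run, lrs=[1e-4, 1e-3, 1e-2], n_train_steps_options=[1, 2, 3]
)
```

In a real reproduction, `toy_run` would be replaced by a function that launches one run with the given hyper-parameters and returns the reported error rate.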

Or there are other reasons?

Yes. Please set hyper-parameters such as batch_size carefully and check them against the scripts we provide in the exps folder.

Hope it helps!

AAAI-2025 commented 1 year ago

Hello, thanks for your timely reply. I will try your suggestions soon and let you know whether they work. Thanks a million.

MarcellusZhao commented 1 year ago

Hi, @jiangqinting. Do you have any further feedback on this problem? If you need more help, feel free to contact us.