facebookresearch / DomainBed

DomainBed is a suite to test domain generalization algorithms
MIT License

sweep accuracy far lower than Full results for commit 7df6f06 #37

Closed · YiDongOuYang closed this 3 years ago

YiDongOuYang commented 3 years ago

I followed the command line mentioned in README.md and randomly chose an algorithm to get some results. However, my results are more than 20% below what is reported. Could you please help me figure out the problem?

I have already rechecked my PACS dataset, which is the same as "https://drive.google.com/uc?id=0B6x7gtvErXgfbF9CSk53UkRxVzg" in download.py. The full command line and output can be found below.

python -m domainbed.scripts.sweep launch \
    --data_dir=/home/guoweiyu/yidong/data/PACS \
    --output_dir=/home/guoweiyu/yidong/dg/sweep \
    --command_launcher multi_gpu \
    --algorithms DANN \
    --datasets PACS \
    --n_hparams 1 \
    --n_trials 1

Environment:
    Python: 3.7.3
    PyTorch: 1.3.1
    Torchvision: 0.4.2
    CUDA: 10.1.243
    CUDNN: 7603
    NumPy: 1.16.2
    PIL: 5.4.1
Args:
    algorithm: DANN
    checkpoint_freq: None
    data_dir: /home/guoweiyu/yidong/data/PACS/
    dataset: PACS
    holdout_fraction: 0.2
    hparams: None
    hparams_seed: 0
    output_dir: train_output
    save_model_every_checkpoint: False
    seed: 0
    skip_model_save: False
    steps: None
    test_envs: [0]
    trial_seed: 0
HParams:
    batch_size: 32
    beta1: 0.5
    class_balanced: False
    d_steps_per_g_step: 1
    data_augmentation: True
    grad_penalty: 0.0
    lambda: 1.0
    lr: 5e-05
    lr_d: 5e-05
    lr_g: 5e-05
    mlp_depth: 3
    mlp_dropout: 0.0
    mlp_width: 256
    resnet18: False
    resnet_dropout: 0.0
    weight_decay: 0.0
    weight_decay_d: 0.0
    weight_decay_g: 0.0

(base) guoweiyu@bj08:~/yidong/dg/DomainBed-master$ python -m domainbed.scripts.collect_results \
    --input_dir=/home/guoweiyu/yidong/dg/sweep

Total records: 170

-------- Dataset: PACS, model selection method: training-domain validation set
Algorithm    A             C             P             S             Avg
DANN         69.7 +/- 0.0  68.1 +/- 0.0  96.9 +/- 0.0  64.8 +/- 0.0  74.9

-------- Averages, model selection method: training-domain validation set
Algorithm    PACS          Avg
DANN         74.9 +/- 0.0  74.9

-------- Dataset: PACS, model selection method: leave-one-domain-out cross-validation
Algorithm    A             C             P             S             Avg
DANN         40.3 +/- 0.0  68.1 +/- 0.0  94.1 +/- 0.0  64.8 +/- 0.0  66.8

-------- Averages, model selection method: leave-one-domain-out cross-validation
Algorithm    PACS          Avg
DANN         66.8 +/- 0.0  66.8

-------- Dataset: PACS, model selection method: test-domain validation set (oracle)
Algorithm    A             C             P             S             Avg
DANN         21.1 +/- 0.0  18.1 +/- 0.0  29.0 +/- 0.0  18.6 +/- 0.0  21.7

-------- Averages, model selection method: test-domain validation set (oracle)
Algorithm    PACS          Avg
DANN         21.7 +/- 0.0  21.7

lopezpaz commented 3 years ago

Can you share which commit of the repo you are using?

lopezpaz commented 3 years ago

Also, you should let --n_hparams and --n_trials take their default values to replicate our results. Please re-open if, when doing so, you still get different numbers.
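
For reference, the same sweep with those two flags omitted (so sweep.py falls back to its defaults) would look like this, reusing the paths from the report above:

    python -m domainbed.scripts.sweep launch \
        --data_dir=/home/guoweiyu/yidong/data/PACS \
        --output_dir=/home/guoweiyu/yidong/dg/sweep \
        --command_launcher multi_gpu \
        --algorithms DANN \
        --datasets PACS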

YiDongOuYang commented 3 years ago

Thanks for your quick reply! I am using the newest version, commit b953488d4dcfcc76427f07958b133b87d24a48e5, and I will rerun with the default values of --n_hparams and --n_trials.

Another question: why should the sweep run the test-environment sets [0,1], [0,2], [0,3], [1,2], [1,3], [2,3]? If I understand correctly, [0,1] means the target domain is a mixture of domain 0 and domain 1.

lopezpaz commented 3 years ago

Having two test domains is necessary for the leave-one-domain-out model selection criterion.
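
For intuition, here is a minimal sketch (not the repo's exact code; the helper name is illustrative) of how such test-environment sets can be enumerated: every single environment, plus every pair. With test_envs [i, j], training excludes both domains, so one held-out domain can select a model that is then evaluated on the other.

    # Illustrative sketch of DomainBed-style test-env enumeration:
    # singletons for in-domain/oracle selection, pairs for leave-one-domain-out.
    import itertools

    def all_test_env_sets(n_envs):
        singles = [[i] for i in range(n_envs)]
        pairs = [list(p) for p in itertools.combinations(range(n_envs), 2)]
        return singles + pairs

    print(all_test_env_sets(4))  # PACS has 4 environments (A, C, P, S)
    # -> [[0], [1], [2], [3], [0, 1], [0, 2], [0, 3], [1, 2], [1, 3], [2, 3]]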

YiDongOuYang commented 3 years ago

Thank you! Is there any way to reduce the number of experiments when I only want the accuracy under the training-domain validation selection criterion? Launching sweep.py automatically produces results for all three criteria.

lopezpaz commented 3 years ago

You could replace these lines (the test-environment enumeration in sweep.py) with

all_test_envs = [[d] for d in range(datasets.num_environments(dataset))]

to avoid launching jobs with two test environments, which are only needed for leave-one-domain-out validation.
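
In context, the edit would sit in sweep.py's job-generation code, roughly as below (a sketch assuming the structure around the linked lines; verify the names against your checkout):

    # Sketch of the suggested edit in sweep.py (structure assumed; check your
    # checkout). Before: test_envs covered every singleton and every pair,
    # the pairs existing only to support leave-one-domain-out selection.
    # After: singletons only, which is all training-domain validation needs.
    all_test_envs = [[d] for d in range(datasets.num_environments(dataset))]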

lopezpaz commented 3 years ago

Launching sweeps and selecting models are two completely separate processes. Selecting models is done based on a finished sweep. So first you need to decide what your sweep will contain (algorithms, datasets, environments...). Once you launch and finish your sweep, you will be able to select models from that finished sweep according to different strategies (in-domain, leave-one-out, oracle).

lopezpaz commented 3 years ago

Again, sweeps do not have anything to do with model selection strategies. They just run random combinations of hyper-parameters and log all the results to files. Then you can run different model selection strategies on those files.

If you want to run only some subset of jobs (e.g., one particular test env), you will have to hack sweep.py.
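
A minimal sketch of such a hack, assuming sweep.py builds its jobs as a list of per-job argument dicts carrying a 'test_envs' entry (verify against your checkout):

    # Hypothetical filter placed right after the job list is built in sweep.py:
    # keep only the jobs whose sole test environment is env 0.
    args_list = [a for a in args_list if a['test_envs'] == [0]]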

DevinCheung commented 2 years ago

@YiDongOuYang Hi, have you found out the reason for the lower sweep accuracy? I encountered the same problem, and I used the default "n_hparams" and "n_trials".