deeplearning-wisc / hypo

Question about OOD ACC #1

Open Esther-PAN opened 4 months ago

Esther-PAN commented 4 months ago

Hello, this work has given me a lot of inspiration! However, I have some uncertainties about the evaluation criteria for out-of-distribution generalization. As I understand out-of-distribution detection, the OOD samples used to assess detection performance typically come from an unknown distribution, and their label categories differ from those of the in-distribution (ID) data. In the context of out-of-distribution generalization, however, when CIFAR-10 is the ID training set and CIFAR-10-C is the OOD test set, does the reported OOD ACC mean that the model can accurately classify CIFAR-10-C? In other words, does the OOD test set consist of images that belong to the same categories as the ID data but come from different domains (styles), and is the model still expected to classify them accurately?

alvinmingwisc commented 4 months ago

Yes, your understanding is correct. OOD detection and OOD generalization are two distinct tasks. In the context of OOD generalization, the label set remains the same as the in-distribution (ID) training set. Here, "OOD" typically refers to shifts in the distribution that might arise from changes in domain, style, or other factors, rather than changes in the label categories. The model is expected to accurately classify inputs from such distributional shifts.
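To make the distinction concrete, here is a minimal sketch of how ID accuracy and OOD accuracy differ only in the test inputs, not in the label set. It is not taken from the HYPO codebase: the two-class synthetic data, the nearest-centroid classifier, and the additive-noise "corruption" are all hypothetical stand-ins for CIFAR-10, the trained network, and CIFAR-10-C.

```python
import numpy as np

def accuracy(predict, x, y):
    """Fraction of inputs whose predicted label matches the true label."""
    return float(np.mean(predict(x) == y))

rng = np.random.default_rng(0)

# Hypothetical stand-in for the ID training set: two classes, 2-D features.
means = np.array([[-2.0, 0.0], [2.0, 0.0]])
y_train = rng.integers(0, 2, size=500)
x_train = means[y_train] + rng.normal(size=(500, 2))

# Nearest-centroid classifier fit on ID training data only.
centroids = np.stack([x_train[y_train == c].mean(axis=0) for c in (0, 1)])

def predict(x):
    # Distance from every input to every class centroid, then pick the closest.
    dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)

# ID test set: drawn from the same distribution as the training data.
y_id = rng.integers(0, 2, size=500)
x_id = means[y_id] + rng.normal(size=(500, 2))

# OOD test set (stand-in for a corrupted copy): same labels, shifted inputs.
y_ood = y_id.copy()
x_ood = x_id + rng.normal(scale=2.0, size=x_id.shape)

id_acc = accuracy(predict, x_id, y_id)
ood_acc = accuracy(predict, x_ood, y_ood)
print(f"ID acc:  {id_acc:.3f}")
print(f"OOD acc: {ood_acc:.3f}")
```

Both numbers are ordinary classification accuracies over the same classes; only the input distribution of the test set changes. An OOD detection metric, by contrast, would score how well the model separates ID inputs from inputs of unseen classes.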

Esther-PAN commented 4 months ago

Thank you for your prompt response! I am still a bit confused about the difference between OOD generalization (OODG) and domain generalization (DG). Do the results reported in the article on other benchmarks, such as PACS, follow the DG setup for training and testing?

alvinmingwisc commented 4 months ago

Yes, for domain generalization benchmarks, we follow the setup of the seminal work: https://github.com/facebookresearch/DomainBed.

The term "out-of-distribution generalization" has a broader scope than (multi-source) domain generalization. In the DG setting, one typically has a finite set of training domains (e.g., PACS has 4 distinct domains).

Esther-PAN commented 4 months ago

Thank you for your explanation. So, if I understand correctly, the comparison of out-of-distribution generalization ability (ood-acc) is primarily observed in the CIFAR-10/CIFAR-10-C experiment. However, when comparing with other popular benchmarks like PACS and other domain generalization methods, the displayed results are based on in-distribution accuracy (ID-acc). Is my understanding correct?

alvinmingwisc commented 3 months ago

When comparing with other domain generalization methods on popular benchmarks like PACS, the reported results are also OOD accuracy (the accuracy on the OOD domain). We used the common evaluation protocol in the domain generalization community (e.g., https://github.com/facebookresearch/DomainBed).
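As a rough illustration of what "accuracy on the OOD domain" means for a multi-domain benchmark, the sketch below enumerates leave-one-domain-out splits, assuming that is the protocol in use. The helper `leave_one_domain_out` and the domain name strings are hypothetical and are not part of the DomainBed API; model training and model selection are omitted.

```python
def leave_one_domain_out(domains):
    """Yield (train_domains, test_domain) pairs, holding out each domain once."""
    for held_out in domains:
        train = [d for d in domains if d != held_out]
        yield train, held_out

# The four PACS domains (names written out here for readability).
pacs_domains = ["art_painting", "cartoon", "photo", "sketch"]

splits = list(leave_one_domain_out(pacs_domains))
for train, test in splits:
    # The held-out domain is never seen during training, so accuracy on it
    # is an OOD accuracy even though the label set is unchanged.
    print(f"train on {train}, report accuracy on {test}")
```

Under this assumption, each reported number is the accuracy on a domain the model never saw during training, typically averaged over the held-out choices.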