amiratag / ACE

Towards Automatic Concept-based Explanations
MIT License

SOURCE_DIR clarification #6

Closed justcho5 closed 5 years ago

justcho5 commented 5 years ago

SOURCE_DIR: Directory where the discovery images (refer to the paper) are saved. It should contain (at least) num_random_exp + 2 folders:
1- "target_class", which contains images of the class to be explained.
2- "random_discovery", which contains randomly selected images of the same dataset (at least $max_imgs images).
3- "random500_0", ..., "random500_${num_random_exp}", where each one contains 500 randomly selected images from the dataset.
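The layout quoted above can be sketched as a small script. This is a hypothetical helper (`build_source_dir` is not part of the ACE repo), assuming you already have a list of all image paths and a list of target-class image paths. The number of `random500_*` folders is left as a parameter (`num_random_folders`) so you can match whatever your `num_random_exp` setting expects.

```python
import os
import random
import shutil


def build_source_dir(all_images, target_images, source_dir, target_class,
                     num_random_folders, max_imgs, random_size=500, seed=0):
    """Hypothetical helper: lay out SOURCE_DIR for ACE concept discovery.

    all_images:    paths to every image in the dataset (any label).
    target_images: paths to images of the class being explained.
    rng.sample raises ValueError if all_images has fewer images than requested.
    """
    rng = random.Random(seed)

    def copy_into(folder, paths):
        dest = os.path.join(source_dir, folder)
        os.makedirs(dest, exist_ok=True)
        for i, src in enumerate(paths):
            # Prefix with an index so files sharing a basename do not collide.
            shutil.copy(src, os.path.join(dest, f"{i:05d}_{os.path.basename(src)}"))

    # 1- folder named exactly like the class to be explained
    copy_into(target_class, target_images)
    # 2- random_discovery: max_imgs images sampled without looking at labels
    copy_into("random_discovery", rng.sample(all_images, max_imgs))
    # 3- random500_0, random500_1, ...: each an independent random sample
    for k in range(num_random_folders):
        copy_into(f"random500_{k}", rng.sample(all_images, random_size))
```

Because every random folder is drawn from the full `all_images` pool, the samples ignore class labels, which is what the maintainer's reply below describes.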

So I have a dataset with images belonging to either class A or class B, and I want to explain class A. The target_class directory should contain class A images. random_discovery should contain random images from the dataset, which can be of either class A or B, and the random500_x directories should contain images from the dataset, which can also be of either class. All the images in these folders come from the same dataset. Is that correct?

tabularML commented 5 years ago

That is correct. The only thing is that the name of the folder "target_class" should be the same as the name of the class passed to the --target_class argument. (The random500_x folders and the random_discovery folder should contain images randomly sampled from the dataset, independent of the image labels.)
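Since a mismatch between the folder name and the --target_class value is an easy mistake, a quick check before running can help. The function below is a hypothetical sketch (`check_source_dir` is not an ACE function); it only counts files per folder and assumes the folder names described in this thread. As above, `num_random_folders` is a parameter you set to match your `num_random_exp`.

```python
import os


def check_source_dir(source_dir, target_class, num_random_folders, max_imgs,
                     random_size=500):
    """Hypothetical sanity check; returns a list of problems (empty if OK)."""
    problems = []

    def count(folder):
        path = os.path.join(source_dir, folder)
        if not os.path.isdir(path):
            problems.append(f"missing folder: {folder}")
            return None
        return len([f for f in os.listdir(path)
                    if os.path.isfile(os.path.join(path, f))])

    # The target folder must be named exactly like the --target_class value.
    n = count(target_class)
    if n == 0:
        problems.append(f"{target_class} is empty")
    n = count("random_discovery")
    if n is not None and n < max_imgs:
        problems.append(f"random_discovery has {n} images, needs >= {max_imgs}")
    for k in range(num_random_folders):
        n = count(f"random500_{k}")
        if n is not None and n < random_size:
            problems.append(f"random500_{k} has {n} images, needs >= {random_size}")
    return problems
```

For example, `check_source_dir("SOURCE_DIR", "classA", 20, max_imgs=40)` (illustrative values) would report `missing folder: classA` if the folder on disk were named `class_A` instead.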

justcho5 commented 5 years ago

If the random folders contain images that are randomly sampled from the dataset, would unbalanced classes cause any problems? My model learns two classes, and the training data for one class is significantly larger than for the other. Should I randomly sample from a subset of the dataset in which the two classes are balanced?
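The thread does not settle whether balancing is needed. With label-independent sampling, each random folder roughly mirrors the dataset's class proportions (e.g. about 90/10 for a 90/10 dataset). If you want to try the balanced alternative the question describes, the mechanics could look like the hypothetical sketch below (`sample_balanced` is not part of ACE). It assumes you have `(path, label)` pairs and draws an equal number of images per class.

```python
import random
from collections import defaultdict


def sample_balanced(labeled_paths, n, seed=0):
    """Hypothetical helper: draw n images with an equal share per class.

    labeled_paths: iterable of (path, label) pairs.
    Any remainder of n // number_of_classes is dropped, so fewer than n
    paths may be returned.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for path, label in labeled_paths:
        by_label[label].append(path)
    per_class = n // len(by_label)
    picked = []
    for label in sorted(by_label):
        # Raises ValueError if a class has fewer than per_class images.
        picked.extend(rng.sample(by_label[label], per_class))
    rng.shuffle(picked)  # mix classes so folder order carries no label signal
    return picked
```

The same call can be reused to fill random_discovery and each random500_x folder, if you go the balanced route.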