Closed Jwicaksana closed 5 years ago
Hi @Jwicaksana
Yes, I added the IMAGES_PER_LABEL variable on line 24 to limit the number of images, for two reasons:
I wanted to limit the number of samples while fixing bugs, because I couldn't retrain the network on the whole dataset every time, especially since I was using only the CPU.
Since the Fer2013 dataset is unbalanced (some expressions have more samples than others), I wanted to check the bias of the classifier. The experiment consisted of training the classifier after giving all expressions the same number of samples, then comparing it with the biased classifier, which uses all the data.
I think I just pushed the code back then and forgot to reset this variable. So please just change line 24
and set the limit to a very big number to skip the condition, something like:
IMAGES_PER_LABEL = 10000000
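For context, the cap works roughly like this: each accepted image increments a per-label counter, and images whose label has already hit IMAGES_PER_LABEL are skipped. Here is a minimal sketch of that filtering logic; the function and variable names (`filter_samples`, `get_new_label`, the label subset) are assumptions for illustration, not the repo's exact code.

```python
from collections import defaultdict

IMAGES_PER_LABEL = 10000000  # set very high to effectively disable the cap
SELECTED_LABELS = [0, 3, 4, 5, 6]  # assumed subset of fer2013 emotion labels

def get_new_label(label):
    # remap an original fer2013 label to a contiguous index (assumption)
    return SELECTED_LABELS.index(label)

def filter_samples(labels, images):
    """Keep only selected labels, capping each label at IMAGES_PER_LABEL."""
    nb_images_per_label = defaultdict(int)
    kept = []
    for label, image in zip(labels, images):
        if label in SELECTED_LABELS and nb_images_per_label[get_new_label(label)] < IMAGES_PER_LABEL:
            kept.append((label, image))
            nb_images_per_label[get_new_label(label)] += 1
    return kept
```

With the limit set that high, the counter never reaches the cap, so every image with a selected label passes through.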
Got it, thanks!
First of all, I verified that fer2013.csv contains 28709 training samples, 3436 PublicTest samples, and 3589 PrivateTest samples.
However, after running convert_fer2013_to_images_and_landmarks, I only get 3436 training samples, 56 PublicTest samples, and 8 PrivateTest samples, which I think happens because of this condition: `if labels[i] in SELECTED_LABELS and nb_images_per_label[get_new_label(labels[i])] < IMAGES_PER_LABEL:`
I guessed that you wanted to clip each expression so the data would be balanced, but because you share a single limit across the training, public, and private sets, it results in an imbalanced data distribution.
Was that intentional? The training becomes really weird with only ~3k training samples, 56 validation samples, and 8 test samples.
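One way to avoid the cross-split interaction described above is to keep a separate per-label counter for each Usage set, so the cap applies per split rather than globally. A hedged sketch, assuming fer2013's standard `emotion` and `Usage` CSV columns (the function name and cap value here are illustrative, not the repo's code):

```python
from collections import defaultdict

IMAGES_PER_LABEL = 4000  # example cap per expression *per split* (assumption)
SELECTED_LABELS = [0, 3, 4, 5, 6]  # assumed subset of fer2013 emotion labels

def split_rows(rows):
    """Group fer2013 rows by the Usage column, applying the cap per split."""
    splits = {"Training": [], "PublicTest": [], "PrivateTest": []}
    # one independent label counter per split, so Training samples
    # can no longer exhaust the quota for PublicTest/PrivateTest
    counters = {name: defaultdict(int) for name in splits}
    for row in rows:
        label, usage = int(row["emotion"]), row["Usage"]
        if label in SELECTED_LABELS and counters[usage][label] < IMAGES_PER_LABEL:
            splits[usage].append(row)
            counters[usage][label] += 1
    return splits
```

With this structure, filling the cap for a label in the training set has no effect on how many PublicTest or PrivateTest samples of that label are kept.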
Thanks