mshearer0 opened 4 years ago
The same thing happened to me. I'm not sure why; hopefully I can find out the reason behind it.
I came to GitHub to find the answer to a related problem: why are these digits representative? The notebook uses:
representative_digit_idx = np.argmin(X_digits_dist, axis=0)
What does argmin do here that makes these instances representative?
argmin just finds the index of the minimum value along axis=0. Each column of X_digits_dist holds the distances from every training instance to one cluster's centroid, so the minimum in a column is the instance closest to that centroid. Being nearest to the centroid is what makes that instance representative of its cluster.
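A minimal sketch of that step, assuming the notebook's setup (the scikit-learn digits dataset, k = 50 clusters, and random_state=42 for both the split and KMeans):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

X_digits, y_digits = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X_digits, y_digits, random_state=42)

k = 50
kmeans = KMeans(n_clusters=k, random_state=42)
# fit_transform returns an (n_train, k) matrix: the distance from each
# training instance to each of the k centroids.
X_digits_dist = kmeans.fit_transform(X_train)

# For each cluster (each column), argmin picks the row index of the
# instance closest to that centroid -- the cluster's representative digit.
representative_digit_idx = np.argmin(X_digits_dist, axis=0)
X_representative_digits = X_train[representative_digit_idx]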
I get a different set of representative digits from those in the notebook. Labelling with
y_representative_digits = np.array([ 0, 1, 3, 2, 7, 6, 4, 6, 9, 5, 1, 2, 9, 5, 2, 7, 8, 1, 8, 6, 3, 1, 5, 4, 5, 4, 0, 3, 2, 6, 1, 7, 7, 9, 1, 8, 6, 5, 4, 8, 5, 3, 3, 6, 7, 9, 7, 8, 4, 9])
produces a log_reg score of 92.4%. Alternatively, the labels can be looked up directly with:
y_representative_digits = y_train[representative_digit_idx]
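For completeness, a sketch of how those 50 labels feed the classifier, continuing from the snippet above (the hyperparameters here are illustrative, not necessarily the notebook's exact settings):

from sklearn.linear_model import LogisticRegression

# Label the representatives, either by hand (the array above) or by
# looking them up in y_train; both give one label per cluster.
y_representative_digits = y_train[representative_digit_idx]

# Train on just the 50 labelled representatives and score on the test set.
log_reg = LogisticRegression(max_iter=10_000, random_state=42)
log_reg.fit(X_representative_digits, y_representative_digits)
print(log_reg.score(X_test, y_test))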
Hi,
Did you set random_state=42 when training and when splitting the dataset? Hopefully that solves your problem.
Thanks
Ash
Thanks Ash.
Yes, random_state=42 is set in both the split and the KMeans definition. Perhaps the difference comes from running a different scikit-learn version, since KMeans results can vary across versions even with the same random_state, which would change the cluster order and therefore the representative indices.