Open supreetkt opened 5 years ago
Yes, there could be some problems. As I mentioned in the issue I have created, this implementation of k-modes is incorrect, which leads to empty clusters. If you are interested, you can check my refactored version of k-modes.
I'm getting an IndexError on line 317: random_element = random.choice(clusters[biggest_cluster].members). I have a large dataframe (10000+ rows and 15+ columns), and I tried this first with k=2. I debugged the program and found that the error occurs because cluster_sizes ends up with 0 in two of its elements, but I can't work out why.
If I limit my dataframe to, say, 100 rows, this error goes away, but then I get another error after 3 iterations of the algorithm: 'More clusters than data points?'
Any ideas on how to solve this?
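For anyone hitting the same thing: random.choice raises IndexError when it is handed an empty sequence, so the traceback is consistent with the "biggest" cluster itself having no members. Below is a minimal sketch of one way to guard the empty-cluster step. It is not the code from this repository; the function name refill_empty_clusters and the list-of-lists layout are assumptions made for illustration only.

```python
import random


def refill_empty_clusters(members_by_cluster, rng=random):
    """Give every empty cluster one member taken from the largest cluster.

    members_by_cluster is a hypothetical layout: one list of row indices
    per cluster. The lists are modified in place.
    """
    k = len(members_by_cluster)
    total = sum(len(members) for members in members_by_cluster)
    if total < k:
        # Not enough points to populate every cluster; fail with a clear
        # message instead of letting random.choice hit an empty list.
        raise ValueError("fewer data points than clusters")

    for members in members_by_cluster:
        if members:
            continue
        # With total >= k and at least one empty cluster, the largest
        # cluster holds at least two points, so it cannot become empty.
        donor = max(members_by_cluster, key=len)
        picked = rng.choice(donor)
        donor.remove(picked)
        members.append(picked)
    return members_by_cluster
```

Calling refill_empty_clusters([[0, 1, 2, 3], [], []]) leaves every cluster with at least one row index, while refill_empty_clusters([[], []]) raises ValueError rather than IndexError. Whether a check like this fits the original code depends on how it tracks cluster membership, which I have not verified.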