ThinkBigAnalytics / pyspark-distributed-kmodes

MIT License

IndexError in pyspark_kmodes #6

Open supreetkt opened 5 years ago

supreetkt commented 5 years ago

I'm receiving an IndexError on line #317: `random_element = random.choice(clusters[biggest_cluster].members)`. I have a large DataFrame (10,000+ rows and 15+ columns). I first tried this with k=2. I debugged the program, and the error occurs because `cluster_sizes` contains 0 in two of its elements, but I'm not able to understand why.

If I limit my DataFrame to, say, 100 rows, this error goes away, but then after 3 iterations of the algorithm I get another error: 'More clusters than data points?'

Any ideas on how to solve this?
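For what it's worth, the crash happens when `random.choice()` is called on an empty `members` list, i.e. when the largest cluster itself has size 0. A minimal sketch of a defensive guard is below; note that `Cluster` and `reseed_empty_cluster` are hypothetical stand-ins for the library's internals, not the actual pyspark_kmodes code:

```python
import random

class Cluster:
    """Hypothetical stand-in for the library's cluster object."""
    def __init__(self, members):
        self.members = list(members)

def reseed_empty_cluster(clusters):
    """Pick a random member from the largest cluster to seed an empty one.

    The reported IndexError is raised when random.choice() receives an
    empty list; checking the size of the biggest cluster first surfaces
    the real problem (all clusters empty) instead of a bare IndexError.
    """
    cluster_sizes = [len(c.members) for c in clusters]
    biggest_cluster = cluster_sizes.index(max(cluster_sizes))
    if cluster_sizes[biggest_cluster] == 0:
        raise ValueError("all clusters are empty; cannot reseed")
    return random.choice(clusters[biggest_cluster].members)

clusters = [Cluster([1, 2, 3, 4]), Cluster([])]
moved = reseed_empty_cluster(clusters)
print(moved in clusters[0].members)  # True
```

This only turns the crash into a clearer error; the underlying question of why two clusters end up empty remains.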

asapegin commented 5 years ago

Yes, there can be problems. As I mentioned in the issue I created, this implementation of k-modes is incorrect, which leads to empty clusters. If you are interested, you can check my refactored version of k-modes.