Peratham / semanticvectors

Automatically exported from code.google.com/p/semanticvectors

Empty Cluster Problem #70

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. How to solve empty cluster problem?

What is the expected output? What do you see instead?
I expect 3 clusters from the following command:
java pitt.search.semanticvectors.ClusterVectorStore -numclusters 3 docvectors.bin > clusters.txt
But clusters.txt contains only 1 cluster:
John    1.0

What version of the product are you using? On what operating system?
I am using "semanticvectors-4.0" on Ubuntu 12.04.

Please provide any additional information below.
I have 21 documents, and I don't want any cluster to be empty. What should 
I do?

Original issue reported on code.google.com by rohitdee...@gmail.com on 3 Oct 2013 at 3:39

GoogleCodeExporter commented 9 years ago
Sometimes this happens due to the random initialization used in the k-means 
algorithm.

Things you can try include:
- Rerun a few times to see whether a different random initialization gives a different result.
- Try a different number of dimensions.
- Try a different algorithm, e.g., use BuildPositionalIndex or LSA instead of 
BuildIndex.
- Change the cluster initialization around line 95 of ClusterResults.java to be 
round-robin instead of random. (I'd try this myself right now but I'm about to 
get on a plane!)

Original comment by dwidd...@gmail.com on 3 Oct 2013 at 8:12
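
For readers following up on the round-robin suggestion in the comment above, here is a minimal sketch of the difference between the two initialization schemes. It is not the code from ClusterResults.java: the class name ClusterInitSketch and the methods randomInit, roundRobinInit and clusterSizes are hypothetical names made up for illustration, and the sketch only covers the initial assignment step, not the k-means iterations themselves.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch; NOT the code in ClusterResults.java. */
public class ClusterInitSketch {

  /** Random initialization: each object gets an independently drawn cluster. */
  static int[] randomInit(int numObjects, int numClusters, Random random) {
    int[] assignments = new int[numObjects];
    for (int i = 0; i < numObjects; i++) {
      // Nothing here prevents a cluster from receiving zero objects.
      assignments[i] = random.nextInt(numClusters);
    }
    return assignments;
  }

  /** Round-robin initialization: object i goes to cluster (i mod numClusters). */
  static int[] roundRobinInit(int numObjects, int numClusters) {
    int[] assignments = new int[numObjects];
    for (int i = 0; i < numObjects; i++) {
      // Every cluster gets at least one object whenever numObjects >= numClusters.
      assignments[i] = i % numClusters;
    }
    return assignments;
  }

  /** Counts how many objects were assigned to each cluster. */
  static int[] clusterSizes(int[] assignments, int numClusters) {
    int[] sizes = new int[numClusters];
    for (int cluster : assignments) {
      sizes[cluster]++;
    }
    return sizes;
  }

  public static void main(String[] args) {
    int numObjects = 21;  // number of documents in the report above
    int numClusters = 3;  // value passed as -numclusters

    // Sizes vary from run to run; with few objects a cluster can start empty.
    int[] randomSizes =
        clusterSizes(randomInit(numObjects, numClusters, new Random()), numClusters);
    System.out.println("random:      " + Arrays.toString(randomSizes));

    // Sizes are always as even as possible: [7, 7, 7] for 21 objects and 3 clusters.
    int[] roundRobinSizes =
        clusterSizes(roundRobinInit(numObjects, numClusters), numClusters);
    System.out.println("round-robin: " + Arrays.toString(roundRobinSizes));
  }
}
```

Round-robin only guarantees non-empty clusters at the start. A cluster can still lose all of its members during later k-means reassignment steps, so rerunning or changing the vector dimensions may still be needed.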