Peratham / semanticvectors

Automatically exported from code.google.com/p/semanticvectors

How to do k-means clustering of documents? #77

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. I have 136 documents in docvectors.bin, and I ran this command:
java pitt.search.semanticvectors.ClusterResults -numclusters 4 docvectors.bin
2. The file cluster_centroids.bin is created, but its size is only 1.6 kB.

What is the expected output? What do you see instead?
I expect the output to be 4 cluster vectors, but every cluster comes back
empty:
Opening query vector store from file: termvectors
Setting dimension of target config to: 100
Searching term vectors, searchtype SUM
Didn't find vector for 'docvectors.bin'
No vector for 'docvectors.bin'
Mar 04, 2014 5:51:38 PM pitt.search.semanticvectors.ClusterResults kMeansCluster
INFO: Initializing clusters ...
Mar 04, 2014 5:51:38 PM pitt.search.semanticvectors.ClusterResults kMeansCluster
INFO: Iterating k-means assignment ...
Mar 04, 2014 5:51:38 PM pitt.search.semanticvectors.ClusterResults kMeansCluster
INFO: Got to stable clusters ...
Cluster 0

Cluster 1

Cluster 2

Cluster 3

About to write 4 vectors of dimension 100 to Lucene format file: cluster_centroids.bin ... finished writing vectors.
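The 1.6 kB file size is probably not a symptom on its own. Assuming each of the 100 coordinates is stored as a 4-byte float (an assumption about the on-disk format, not checked against the semanticvectors source), four centroids need about 4 × 100 × 4 = 1600 bytes, before the small overhead of the vector names and the file header. A quick check:

```java
/**
 * Back-of-the-envelope size of cluster_centroids.bin.
 * Assumption: each coordinate is written as a 4-byte float; vector names and
 * the file header are ignored.
 */
public class CentroidFileSize {

    static final int BYTES_PER_FLOAT = Float.BYTES; // 4 bytes per coordinate

    /** Raw payload in bytes for numVectors vectors of the given dimension. */
    static long payloadBytes(int numVectors, int dimension) {
        return (long) numVectors * dimension * BYTES_PER_FLOAT;
    }

    public static void main(String[] args) {
        // Values from the log line "About to write 4 vectors of dimension 100".
        long bytes = payloadBytes(4, 100);
        System.out.println(bytes + " bytes"); // 1600 bytes, about 1.6 kB
    }
}
```

So the file has roughly the size expected for four 100-dimensional vectors; the real problem is that all four clusters are empty.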

What version of the product are you using? On what operating system?
I am using semanticvectors-4.0 on Ubuntu 12.04.

Please provide any additional information below.
How can I get the cluster vectors successfully?
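For reference, the "cluster vectors" asked about here are the k centroids that k-means produces. The sketch below is a plain Euclidean k-means (Lloyd's algorithm) over a map from document name to vector. It is standalone Java, does not use the semanticvectors API, seeds the centroids with the first k documents so the result is reproducible, and is not necessarily the variant semanticvectors implements.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Minimal k-means (Lloyd's algorithm) over named document vectors; not semanticvectors code. */
public class DocKMeans {

    /** Squared Euclidean distance between two vectors of equal length. */
    static double dist2(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            s += d * d;
        }
        return s;
    }

    /**
     * Clusters the vectors into k groups and returns each document's cluster index
     * in the map's iteration order. Final centroids are stored in centroids[0..k-1].
     */
    static int[] cluster(Map<String, double[]> docs, int k, double[][] centroids, int maxIter) {
        List<double[]> vecs = new ArrayList<>(docs.values());
        if (vecs.size() < k) {
            throw new IllegalArgumentException("need at least k vectors, got " + vecs.size());
        }
        int dim = vecs.get(0).length;
        // Deterministic initialization: the first k documents seed the centroids.
        for (int c = 0; c < k; c++) centroids[c] = vecs.get(c).clone();
        int[] assign = new int[vecs.size()];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: move each document to its nearest centroid.
            for (int i = 0; i < vecs.size(); i++) {
                int best = 0;
                double bestDist = dist2(vecs.get(i), centroids[0]);
                for (int c = 1; c < k; c++) {
                    double d = dist2(vecs.get(i), centroids[c]);
                    if (d < bestDist) { bestDist = d; best = c; }
                }
                if (assign[i] != best) { assign[i] = best; changed = true; }
            }
            if (!changed) break; // stable clusters
            // Update step: each centroid becomes the mean of its members.
            double[][] sums = new double[k][dim];
            int[] counts = new int[k];
            for (int i = 0; i < vecs.size(); i++) {
                counts[assign[i]]++;
                for (int d = 0; d < dim; d++) sums[assign[i]][d] += vecs.get(i)[d];
            }
            for (int c = 0; c < k; c++) {
                if (counts[c] == 0) continue; // keep the old centroid instead of dividing by zero
                for (int d = 0; d < dim; d++) centroids[c][d] = sums[c][d] / counts[c];
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        Map<String, double[]> docs = new LinkedHashMap<>();
        docs.put("doc1", new double[] {0.0, 0.0});
        docs.put("doc2", new double[] {10.0, 10.0});
        docs.put("doc3", new double[] {0.5, 0.0});
        docs.put("doc4", new double[] {10.0, 9.5});
        double[][] centroids = new double[2][];
        int[] assign = cluster(docs, 2, centroids, 100);
        System.out.println(Arrays.toString(assign));        // [0, 1, 0, 1]
        System.out.println(Arrays.deepToString(centroids)); // [[0.25, 0.0], [10.0, 9.75]]
    }
}
```

With the 136 document vectors from this issue and k = 4, the centroids array would hold four rows of dimension 100, which is what cluster_centroids.bin is meant to contain.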

Original issue reported on code.google.com by rohitdee...@gmail.com on 4 Mar 2014 at 12:22

GoogleCodeExporter commented 9 years ago
I think the problem is that the ClusterResults command you're using treats
"docvectors.bin" as a search query term, not as an input file.
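A toy model of this failure mode, for illustration only: the in-memory `termVectors` map and the `search` helper below are hypothetical stand-ins, not semanticvectors classes. When the only query term is a file name, the lookup finds nothing, so there are zero vectors to cluster, which matches the "No vector for 'docvectors.bin'" line and the empty clusters in the log.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy model of a file name being treated as a query term (not semanticvectors code). */
public class QueryAsTermDemo {

    /** Hypothetical search: keeps only the query terms that have a stored vector. */
    static List<String> search(Map<String, double[]> termVectors, String... queryTerms) {
        List<String> hits = new ArrayList<>();
        for (String term : queryTerms) {
            if (termVectors.containsKey(term)) {
                hits.add(term);
            } else {
                System.out.println("No vector for '" + term + "'");
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, double[]> termVectors = new HashMap<>();
        termVectors.put("john", new double[] {0.1, 0.2}); // hypothetical indexed term
        int numClusters = 4;
        // The file name is looked up like any other query term and is not found.
        List<String> hits = search(termVectors, "docvectors.bin");
        // With no hits there is nothing to assign, so every cluster stays empty.
        System.out.println("Vectors available for " + numClusters + " clusters: " + hits.size());
    }
}
```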

What you want instead is ClusterVectorStore; see
http://semanticvectors.googlecode.com/svn/javadoc/latest-stable/pitt/search/semanticvectors/ClusterVectorStore.html#main(java.lang.String[]).

Please try this and write back to say whether it works. If you have any
further trouble, I'll try to help.

Original comment by dwidd...@gmail.com on 4 Mar 2014 at 5:20

GoogleCodeExporter commented 9 years ago
Sir,
    It's still not working. When I run ClusterVectorStore, I get output
like this:
INFO: Initializing clusters ...
Mar 05, 2014 6:10:33 PM pitt.search.semanticvectors.ClusterResults kMeansCluster
INFO: Iterating k-means assignment ...
Mar 05, 2014 6:10:33 PM pitt.search.semanticvectors.ClusterResults kMeansCluster
INFO: Got to stable clusters ...
Mar 05, 2014 6:10:33 PM pitt.search.semanticvectors.ClusterVectorStore main
INFO: Clustering vectors ...
Mar 05, 2014 6:10:33 PM pitt.search.semanticvectors.ClusterResults kMeansCluster
INFO: Initializing clusters ...
Mar 05, 2014 6:10:33 PM pitt.search.semanticvectors.ClusterResults kMeansCluster
INFO: Iterating k-means assignment ...
Mar 05, 2014 6:10:33 PM pitt.search.semanticvectors.ClusterResults kMeansCluster
INFO: Got to stable clusters ...
Mar 05, 2014 6:10:33 PM pitt.search.semanticvectors.ClusterVectorStore main
INFO: Clustering vectors ...
Mar 05, 2014 6:10:33 PM pitt.search.semanticvectors.ClusterResults kMeansCluster
INFO: Initializing clusters ...
Mar 05, 2014 6:10:33 PM pitt.search.semanticvectors.ClusterResults kMeansCluster
INFO: Iterating k-means assignment ...
Mar 05, 2014 6:10:33 PM pitt.search.semanticvectors.ClusterResults kMeansCluster
INFO: Got to stable clusters ...
Mar 05, 2014 6:10:33 PM pitt.search.semanticvectors.ClusterVectorStore main
INFO: Clustering vectors ...
Mar 05, 2014 6:10:33 PM pitt.search.semanticvectors.ClusterResults kMeansCluster
INFO: Initializing clusters ...
Mar 05, 2014 6:10:33 PM pitt.search.semanticvectors.ClusterResults kMeansCluster
INFO: Iterating k-means assignment ...
Mar 05, 2014 6:10:33 PM pitt.search.semanticvectors.ClusterResults kMeansCluster
INFO: Got to stable clusters ...
John    NaN

I think it worked correctly, but I want the cluster vectors, and no binary
file like "cluster_centroids.bin" is being written. Can you please tell me
how to get it?
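Until the built-in output is sorted out, centroids computed in your own code can be saved to disk. The sketch below writes them with `java.io.DataOutputStream` in a simple hypothetical layout (count, dimension, then a name and the float coordinates for each centroid). This is NOT the Lucene-format file that semanticvectors writes, so semanticvectors tools will not read it; it only shows one way to persist and reload the vectors.

```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.Arrays;

/** Writes and reads centroids in a simple hypothetical binary layout (not semanticvectors format). */
public class CentroidWriter {

    /** Layout: int count, int dimension, then per centroid a UTF name and dimension floats. */
    static void write(String path, double[][] centroids) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(path)))) {
            int dim = centroids.length == 0 ? 0 : centroids[0].length;
            out.writeInt(centroids.length);
            out.writeInt(dim);
            for (int c = 0; c < centroids.length; c++) {
                out.writeUTF(Integer.toString(c)); // cluster id used as the vector's name
                for (double x : centroids[c]) out.writeFloat((float) x);
            }
        }
    }

    /** Reads back a file produced by write(), discarding the names. */
    static float[][] read(String path) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(path)))) {
            int count = in.readInt();
            int dim = in.readInt();
            float[][] result = new float[count][dim];
            for (int c = 0; c < count; c++) {
                in.readUTF(); // skip the name
                for (int d = 0; d < dim; d++) result[c][d] = in.readFloat();
            }
            return result;
        }
    }

    public static void main(String[] args) throws IOException {
        double[][] centroids = {{0.25, 0.0}, {10.0, 9.75}};
        write("my_centroids.bin", centroids); // hypothetical output file name
        System.out.println(Arrays.deepToString(read("my_centroids.bin")));
    }
}
```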

Original comment by rohitdee...@gmail.com on 5 Mar 2014 at 12:39

GoogleCodeExporter commented 9 years ago
I don't know where this is going wrong right now; I should be able to take a
look within the next few days.

Can you check out the code and see where the NaN is occurring?
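When hunting for the NaN, two common sources in a cosine-similarity and k-means pipeline are worth checking first, although nothing in this thread confirms either one is the cause of the "John NaN" line: a cosine similarity involving an all-zero vector (0/0), and the mean of an empty cluster (0.0 divided by 0 members). The sketch below is standalone Java, not semanticvectors code, and shows both along with a guarded cosine.

```java
/** Two ways a NaN can appear in a cosine / k-means pipeline, plus a guard. Illustrative only. */
public class NaNCheck {

    static double dot(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    /** Unguarded cosine: yields 0.0 / 0.0 = NaN when either vector is all zeros. */
    static double cosine(double[] a, double[] b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    /** Guarded cosine: treats similarity with a zero vector as 0. */
    static double safeCosine(double[] a, double[] b) {
        double norms = Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b));
        return norms == 0.0 ? 0.0 : dot(a, b) / norms;
    }

    /** Naive centroid: with zero members every coordinate is 0.0 / 0 = NaN. */
    static double[] naiveMean(double[][] members, int dim) {
        double[] mean = new double[dim];
        for (double[] m : members) {
            for (int d = 0; d < dim; d++) mean[d] += m[d];
        }
        for (int d = 0; d < dim; d++) mean[d] /= members.length;
        return mean;
    }

    public static void main(String[] args) {
        double[] zero = {0.0, 0.0};
        double[] v = {1.0, 2.0};
        System.out.println(cosine(zero, v));                        // NaN
        System.out.println(safeCosine(zero, v));                    // 0.0
        System.out.println(naiveMean(new double[0][], 2)[0]);       // NaN
    }
}
```

Adding a `Double.isNaN(...)` check after each similarity or centroid computation is a cheap way to find the first place the value goes bad.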

Which corpus are you using?

Original comment by dwidd...@gmail.com on 6 Mar 2014 at 5:27