JuliaStats / Clustering.jl

A Julia package for data clustering

sparse matrices fail #17

Open swadey opened 10 years ago

swadey commented 10 years ago

I get this error when calling kmeans on a sparse matrix:

julia> kmeans(x', 50)
ERROR: no method kmeans(SparseMatrixCSC{Float32,Int32}, Int64)

Could this be due to the StoredArray change in Julia?

swadey commented 10 years ago

BTW, I'm on Julia HEAD: JuliaLang/julia@244cffc7d99b74fa2b7aab9efed812aeba7e4b38

lindahua commented 10 years ago

The algorithm itself is only for dense matrices.

We may add a k-means algorithm for sparse matrices sometime in the future. However, this is not very high on our priority list. A pull request may make this happen faster.
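
Until a sparse method exists, one possible workaround is to densify the data before calling kmeans. The snippet below is only a sketch written for current Julia (1.x), not something from the package itself: the small matrix `x` is a made-up stand-in for the reporter's data, and the dense copy can be very large for high-dimensional inputs, so it is practical only when the data fits in memory.

```julia
using SparseArrays

# Hypothetical stand-in for the sparse data in the report:
# 4 samples (rows) by 10 features (columns), Float32 values.
x = sparse([1, 2, 3], [1, 5, 10], Float32[1, 2, 3], 4, 10)

# kmeans expects one sample per column, hence the transpose in the original call.
# Matrix(...) materializes the lazy adjoint of the sparse matrix as a dense array.
Xdense = Matrix(x')   # 10x4 Matrix{Float32}

# With Clustering.jl loaded, the dense copy can then be clustered
# (k must not exceed the number of samples):
# result = kmeans(Xdense, 2)
```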

swadey commented 10 years ago

@lindahua is there an actual dependency on dense vectors or just that it produces dense centroids? I don't know what the implementation is doing, but if it's doing some kind of kd-tree/ball-tree for a nearest neighbor approximation, that would make sense.

lindahua commented 10 years ago

The algorithm scans every element in a dense pattern when computing the means and the distances. The pairwise distance function accepts only dense matrices because it relies on BLAS's gemm to compute distances quickly.

lindahua commented 10 years ago

It does not use a kd-tree in any way; it just relies on BLAS to compute pairwise Euclidean distances.
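
For readers wondering how gemm enters into a distance computation, here is a rough sketch of the usual trick. It is not the package's actual code: the function name `pairwise_sqeuclidean` and its layout are assumptions made for illustration. It expands ||x - c||^2 = ||x||^2 + ||c||^2 - 2 x'c, so that the dominant cost is one dense matrix product.

```julia
using LinearAlgebra

# Squared Euclidean distances between the columns of X (d x n samples)
# and the columns of C (d x k centers). Hypothetical helper for illustration.
function pairwise_sqeuclidean(X::Matrix{T}, C::Matrix{T}) where {T<:AbstractFloat}
    xnorms = vec(sum(abs2, X; dims=1))   # ||x_i||^2 for each sample, length n
    cnorms = vec(sum(abs2, C; dims=1))   # ||c_j||^2 for each center, length k
    D = C' * X                           # k x n cross terms; dense product (BLAS gemm)
    D .= cnorms .+ xnorms' .- 2 .* D     # combine the three terms elementwise
    D .= max.(D, zero(T))                # clamp tiny negatives caused by rounding
    return D                             # D[j, i] = squared distance from center j to sample i
end

X = [0.0 1.0; 0.0 0.0]   # samples (0,0) and (1,0) as columns
C = [0.0 3.0; 0.0 4.0]   # centers (0,0) and (3,4) as columns
D = pairwise_sqeuclidean(X, C)
```

The matrix product is the step that presupposes dense storage, which is consistent with the explanation above of why a sparse input has no matching method.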