Spark library for generalized K-Means clustering. Supports general Bregman divergences. Suitable for clustering probabilistic data, time series data, high-dimensional data, and very large datasets.
I don't get the rationale behind WeightedVector. As far as I can tell, WeightedVector applies the same weight to every element of the vector. For example, if I have
val v = Vectors.dense(1,0.5,3)
val wv = WeightedVector(v,0.5)
then wv will be treated as Vectors.dense(0.5, 0.25, 1.5) in terms of clustering, right?
Now, let's say I'm extracting 2 features from my data: one feature is represented by a single vector element, and the other is represented by 20 vector elements. I want both features to carry the same weight in the clustering, so I should weight the first element as 1 and each of the other 20 as 1/20, right?
I expected this kind of functionality from WeightedVector. I don't see the point of WeightedVectors as they are now, but that is probably due to my lack of experience with clustering and data mining in general.
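For what it's worth, here is a minimal sketch of how per-element weighting could be emulated without library support, assuming squared Euclidean distance. The `scaleFeatures` helper below is hypothetical, not part of this library: it scales each element by the square root of its weight, so that element i contributes w(i) times its unweighted share to the distance, since sum_i (sqrt(w_i) * (x_i - y_i))^2 = sum_i w_i * (x_i - y_i)^2.

```scala
import org.apache.spark.mllib.linalg.{Vector, Vectors}

// Hypothetical helper: apply a per-element weight w(i) by scaling
// element i with sqrt(w(i)) before clustering. Valid for squared
// Euclidean distance; other Bregman divergences would need a
// divergence-specific transformation.
def scaleFeatures(v: Vector, weights: Array[Double]): Vector = {
  require(v.size == weights.length, "one weight per element")
  Vectors.dense(v.toArray.zip(weights).map { case (x, w) => x * math.sqrt(w) })
}

// Example matching the question: a 21-element vector where the first
// element is one feature and the remaining 20 elements together form a
// second feature. Weighting the first element as 1 and each of the
// other 20 as 1/20 equalizes the two features' total influence.
val weights = 1.0 +: Array.fill(20)(1.0 / 20.0)
val point   = Vectors.dense(Array.fill(21)(1.0))
val scaled  = scaleFeatures(point, weights)
```

This is distinct from what a per-point weight does: attaching a single weight to a whole vector changes how much that point counts relative to other points (e.g. in centroid updates), not how its individual elements compare to each other.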