JuliaStats / Clustering.jl

A Julia package for data clustering

Affinity propagation result is not consistent with sklearn in Python #131

Open xiuliren opened 5 years ago

xiuliren commented 5 years ago

The two clustering results are different. The Julia version did not do any clustering at all: each object's assignment is just its own index. My similarity matrix is too large to show here.

using Clustering
using PyCall

# similarityMatrix: my n-by-n similarity matrix (too large to include here)
@time affinityPropResult = affinityprop(similarityMatrix)
affinityPropResult.assignments

# run scikit-learn on the same precomputed similarities
cl = pyimport("sklearn.cluster")
af = cl.AffinityPropagation(affinity="precomputed").fit(similarityMatrix)
labels = af.labels_

The Travis CI tests also do not verify the correctness of the result.
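The symptom above (every object ends up in its own cluster) is what standard affinity propagation produces when, in every row of `S`, the diagonal entry is strictly larger than all off-diagonal entries. With all availabilities starting at zero, each point's self-responsibility is then positive and every other responsibility negative, so the availabilities never move away from zero and every point remains its own exemplar. A matrix of negated distances, such as `S = -pairwise(Euclidean(), x, x)` in the test code later in this thread, is exactly this case: its diagonal is 0 and its off-diagonal entries are negative for distinct points. The sketch below is a hypothetical, minimal re-implementation (`ap_sketch` is not part of Clustering.jl, and it is not the package's code) that only illustrates this effect.

```julia
# Hypothetical minimal affinity propagation (Frey & Dueck message passing with
# damping), written only to illustrate the role of the diagonal of S.
# NOT the Clustering.jl implementation; it runs a fixed number of iterations
# and assumes at least one exemplar is found.
function ap_sketch(S::AbstractMatrix{<:Real}; damp = 0.5, maxiter = 200)
    n = size(S, 1)
    R = zeros(n, n)   # responsibilities r(i, k)
    A = zeros(n, n)   # availabilities  a(i, k)
    for _ in 1:maxiter
        # r(i, k) = s(i, k) - max over k' != k of (a(i, k') + s(i, k'))
        AS = A .+ S
        Rnew = similar(R)
        for i in 1:n
            row = AS[i, :]              # copy of row i
            k1 = argmax(row)
            m1 = row[k1]                # largest value in the row
            row[k1] = -Inf
            m2 = maximum(row)           # second-largest value
            for k in 1:n
                Rnew[i, k] = S[i, k] - (k == k1 ? m2 : m1)
            end
        end
        R .= damp .* R .+ (1 - damp) .* Rnew
        # a(k, k) = sum over i' != k of max(0, r(i', k))
        # a(i, k) = min(0, r(k, k) + sum over i' not in {i, k} of max(0, r(i', k)))
        Anew = similar(A)
        for k in 1:n
            colpos = sum(max.(0.0, R[:, k])) - max(0.0, R[k, k])
            for i in 1:n
                if i == k
                    Anew[i, k] = colpos
                else
                    Anew[i, k] = min(0.0, R[k, k] + colpos - max(0.0, R[i, k]))
                end
            end
        end
        A .= damp .* A .+ (1 - damp) .* Anew
    end
    # points with a(k, k) + r(k, k) > 0 are exemplars
    exemplars = findall(k -> A[k, k] + R[k, k] > 0, 1:n)
    # exemplars label themselves; other points join their most similar exemplar
    assignments = zeros(Int, n)
    for i in 1:n
        j = findfirst(==(i), exemplars)
        assignments[i] = j === nothing ? argmax(S[i, exemplars]) : j
    end
    return (exemplars = exemplars, assignments = assignments)
end

# negated distances between distinct 1-D points: zero diagonal, negative elsewhere
x = [0.0, 0.1, 0.3, 10.0, 10.2, 10.5]
S = [-abs(a - b) for a in x, b in x]
ap_sketch(S).assignments   # every point is its own exemplar
```

Lowering the diagonal (the per-point "preference") below the off-diagonal similarities is what allows points to pick other points as exemplars.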

alyst commented 5 years ago

Thanks for the report! You are most welcome to submit a fix. Otherwise, if you have a small (<100 entities, the smaller the better) reproducible example, I can look into that.

xiuliren commented 5 years ago

I double-checked, and with some random tests the results are the same. I had not added 1 to the Python result: Python labels start from 0, whereas Julia indices start from 1.

labels = af.labels_ .+ 1   # shift sklearn's 0-based labels to Julia's 1-based convention
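When comparing the two label vectors directly, note that the `+ 1` shift only aligns the label ranges: neither library guarantees that it numbers its clusters in the same order. A hypothetical helper `same_partition` (not part of Clustering.jl) that checks whether two labelings describe the same partition regardless of how the clusters are numbered:

```julia
# Hypothetical helper: true if `a` and `b` group the points identically,
# i.e. there is a one-to-one renaming of labels that turns `a` into `b`.
function same_partition(a::AbstractVector, b::AbstractVector)
    length(a) == length(b) || return false
    fwd = Dict{eltype(a), eltype(b)}()   # label in a => label in b
    bwd = Dict{eltype(b), eltype(a)}()   # label in b => label in a
    for (x, y) in zip(a, b)
        # each label must always map to the same partner label, in both directions
        if get!(fwd, x, y) != y || get!(bwd, y, x) != x
            return false
        end
    end
    return true
end

same_partition([1, 1, 2, 3], [0, 0, 2, 1])   # same grouping, different numbering
```

Because it ignores label values, it gives the same answer with or without the `+ 1` shift. A Rand index of 1.0, as checked in the test code later in this thread, expresses the same condition.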

xiuliren commented 5 years ago

This issue still exists. My tests show that the results agree with Python only when the diagonal of the similarity matrix is set to the median value of the matrix; otherwise they differ.

This is my test code:

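using Distances
using Clustering
using LinearAlgebra
using Random
using Statistics
using PyCall

Random.seed!(123)

d = 10                                      # dimension of each point
n = 44                                      # number of points
x = rand(d, n)                              # one point per column
S = -pairwise(Euclidean(), x, x, dims=2)    # similarity = negated Euclidean distance (diagonal is 0)

# Set the diagonal to the median of S; the results match sklearn only with this line uncommented
# S = S - diagm(0 => diag(S)) + median(S)*I

R = affinityprop(S)
k = length(R.exemplars)                     # number of clusters found

cl = pyimport("sklearn.cluster")
af = cl.AffinityPropagation(affinity="precomputed").fit(S)
ref_assignments = af.labels_ .+ 1           # sklearn labels are 0-based

# randindex returns (ARI, Rand index, Mirkin, Hubert); a Rand index of 1.0 means identical partitions
@assert randindex(R.assignments, ref_assignments)[2] == 1.0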
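The median observation is consistent with how scikit-learn treats a precomputed matrix: `AffinityPropagation` overwrites the diagonal with its `preference` parameter, which, when left at its default of `None`, is set to the median of the input similarities. Clustering.jl's `affinityprop`, by contrast, appears to take the preferences directly from the diagonal of `S`. A small helper (the name `with_median_preference` is hypothetical) equivalent to the commented-out line in the test code, which mimics scikit-learn's default:

```julia
using LinearAlgebra   # diagind
using Statistics      # median

# Hypothetical helper: return a copy of S whose diagonal (the per-point
# preferences) is the median of ALL entries of S, diagonal included,
# mirroring scikit-learn's default `preference=None`.
function with_median_preference(S::AbstractMatrix{<:Real})
    p = median(S)          # median over the full matrix
    T = float.(S)          # new array; S itself is left unchanged
    T[diagind(T)] .= p
    return T
end

x = [0.0, 0.1, 0.3, 10.0]
S = [-abs(a - b) for a in x, b in x]
T = with_median_preference(S)
```

Going the other way should also be possible: scikit-learn accepts a per-point preference vector, so passing `preference=diag(S)` when constructing `AffinityPropagation` would make it use the same diagonal as `affinityprop` (an untested sketch).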