Hoosier-Clusters / clusim

An extended package for clustering similarity
MIT License

How to construct a dictionary from clusterings of a dissimilarity matrix #37

Closed: pexlechris closed this issue 4 years ago

pexlechris commented 4 years ago

Dear all, good morning,

We would like to inform you that we are two postgraduate students in the Computer Science department of the Aristotle University of Thessaloniki. We are currently working on a study to complete our thesis. Our research focuses on card sorting, and its title is:

"Minimum number of participants for collection of reliable data from card sorting experiments"

The main purpose is to conduct a sufficient number of studies, with about 150 participants, on two sites that we have chosen (a travel site and an e-shop), and, after at least two types of comparison, to determine the minimum number of participants that gives reliable data. In our first analysis we conducted a Mantel test. For our second analysis, after a lot of research and testing, we settled on the CluSim package, in order to measure the element-centric similarity between the results from all participants (about 150) and the results from a subset of them (for example 1, 3, 5, 7, 10, 15, 20, 30, 50, or 100 participants). We would like to ask you the following:

• Do you believe that this is the right tool for testing the similarity of these two clusterings?
• If so, how can we export the results of the hierarchical clustering, encoded as a linkage matrix, to a dictionary format, in order to find the element score for each of the above cases?
• Could you please send us an example?

We are working in Python (Spyder). As inputs we use the dissimilarity matrices for a subset of the participants (dis1) and for all participants (dis2), and we use the following code to build each hierarchical clustering:

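import matplotlib.pyplot as plt
import numpy as np
from scipy.cluster.hierarchy import dendrogram, linkage
from scipy.spatial.distance import squareform

def analysis_2(dis, title):
    global cards  # number of cards, defined elsewhere in the script
    mat = np.array(dis)
    # convert the square dissimilarity matrix to condensed form
    dists = squareform(mat)
    # average-linkage hierarchical clustering
    linkage_matrix = linkage(dists, "average")
    # label the leaves 1..cards
    dendrogram(linkage_matrix, labels=list(range(1, cards + 1)))
    plt.title(title)
    plt.savefig(title + ".png", dpi=300)
    plt.show()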

Thanks in advance

Best regards,

Zilidis Grigorios, Pexlevanoudis Christos

ajgates42 commented 4 years ago

Hi Zilidis and Pexlevanoudis,

I think CluSim can help with your application. You can create a clustering directly from the linkage matrix: Clustering().from_scipy_linkage(linkage_matrix)
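For readers who still want an explicit dictionary, as asked in the question, here is a minimal sketch in plain Python. It cuts a SciPy-style linkage matrix at a distance threshold and returns a mapping from each element to a list of cluster labels. The function name `linkage_to_elm2clu` and the `max_dist` parameter are hypothetical, and the sketch assumes a monotonic linkage such as "average". Unlike `from_scipy_linkage`, it produces a flat partition and discards the hierarchy.

```python
def linkage_to_elm2clu(linkage_rows, n_elements, max_dist):
    """Cut a SciPy-style linkage matrix at max_dist.

    Returns {element: [cluster_label]} for elements 0..n_elements-1.
    Hypothetical helper; assumes merge distances are non-decreasing.
    """
    parent = list(range(n_elements))  # union-find over the original elements

    def find(x):
        # follow parent links to the root, compressing the path on the way
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # rep[i] is one original element contained in node i;
    # nodes 0..n-1 are leaves, each merge appends one new node
    rep = list(range(n_elements))
    for left, right, dist, _size in linkage_rows:
        a, b = rep[int(left)], rep[int(right)]
        if dist <= max_dist:
            parent[find(a)] = find(b)  # keep merges at or below the cut
        rep.append(a)

    labels = {}   # union-find root -> consecutive cluster label
    elm2clu = {}
    for elm in range(n_elements):
        label = labels.setdefault(find(elm), len(labels))
        elm2clu[elm] = [label]
    return elm2clu


# toy linkage for 4 elements: (0,1) merge at 0.1, (2,3) at 0.2, all at 0.9
rows = [[0, 1, 0.1, 2], [2, 3, 0.2, 2], [4, 5, 0.9, 4]]
print(linkage_to_elm2clu(rows, 4, 0.5))
```

Elements here are 0-based, so card k in the dendrogram labels corresponds to element k-1. As far as I recall from the CluSim README, a dictionary of this shape can be passed as `Clustering(elm2clu_dict=...)`, but treat that as an assumption and check it against the installed version.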

I've included a new example in the CluSim examples that compares two hierarchical clusterings: "Application with Hierarchical Clustering".

FYI, please direct future questions and inquiries to my email, ajgates42@gmail.com, or to Stack Exchange / Stack Overflow.
GitHub Issues are for problems identified in the code.