chartbeat-labs / textacy

NLP, before and after spaCy
https://textacy.readthedocs.io

Spherical k-means for sparse vector clustering #303

Closed roundsmile closed 4 years ago

roundsmile commented 4 years ago

# -*- coding:utf-8 -*-

from numpy import array
from scipy.sparse import coo_matrix
from soyclustering import SphericalKMeans

filename = "C:/Users/dream/Desktop/3번/final/3번/group1_matrix.txt"

with open(filename, 'r') as f:
    data = f.read()
lines = data.splitlines()

print(lines)

numbers = []
for line in lines:
    numbers.append(line.split())

print(numbers)

row_list = []
col_list = []
data_list = []

for i in range(0, len(numbers[0])):
    for j in range(1, len(numbers)):
        if int(numbers[j][i]) > 0:
            data_list.append(int(numbers[j][i]))
            col_list.append(int(numbers[0][i]))
            row_list.append(j)

print(col_list)

print(row_list)

print(data_list)

print(len(col_list))

col_array = array(col_list)
row_array = array(row_list)
data_array = array(data_list)
A = coo_matrix((data_array, (row_array, col_array)))
print(A)

from soyclustering import SphericalKMeans

spherical_kmeans = SphericalKMeans(
    n_clusters=1000,
    max_iter=10,
    verbose=1,
    init='similar_cut',
    sparsity='minimum_df',
    minimum_df_factor=0.05
)

labels = spherical_kmeans.fit_predict(A)
print(labels)

from soyclustering import proportion_keywords

centers = spherical_kmeans.cluster_centers_
idx2vocab = ['list', 'of', 'str', 'vocab']
keywords = proportion_keywords(centers, labels, index2word=idx2vocab)

from soyclustering import visualize_pairwise_distance

# visualize pairwise distance matrix

fig = visualize_pairwise_distance(centers, max_dist=.7, sort=True)

from soyclustering import merge_close_clusters

group_centers, groups = merge_close_clusters(centers, labels, max_dist=.5)
fig = visualize_pairwise_distance(group_centers, max_dist=.7, sort=True)

for group in groups:
    print(group)

question

I don't know what I should pass for idx2vocab = ['list', 'of', 'str', 'vocab']. What does it mean? Please help me.

bdewilde commented 4 years ago

Hi @roundsmile, I don't see any textacy code in there, and I don't know anything about soyclustering. I recommend you post an issue on their repo: https://github.com/lovit/clustering4docs/issues
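
For anyone landing here with the same question: this is not textacy-specific, and soyclustering's exact behaviour is not confirmed in this thread. As a rough sketch, an index-to-word list in this kind of pipeline is usually a plain list of strings where position i holds the term that column i of the matrix counts. The toy vocabulary, the triples, and both helper functions below are hypothetical and use only the standard library; the assumption that `index2word` follows this convention should be checked against the soyclustering docs.

```python
# Hypothetical toy vocabulary: idx2vocab[i] is the word counted in column i.
idx2vocab = ["apple", "banana", "cherry", "durian"]

# Sparse entries as (row, column, count) triples, analogous to the
# row_list / col_list / data_list built in the original post.
triples = [(0, 0, 3), (0, 2, 1), (1, 1, 2), (1, 3, 5), (2, 0, 1)]


def words_in_row(triples, row, idx2vocab):
    """Return (word, count) pairs for one row, highest count first."""
    pairs = [(idx2vocab[c], v) for r, c, v in triples if r == row]
    return sorted(pairs, key=lambda p: -p[1])


def vocab_covers_columns(triples, idx2vocab):
    """Check that every column index used has a word in the list."""
    max_col = max(c for _, c, _ in triples)
    return len(idx2vocab) > max_col


if __name__ == "__main__":
    print(words_in_row(triples, 1, idx2vocab))
    print(vocab_covers_columns(triples, idx2vocab))
```

If the matrix file only has numeric column ids and no words, a list with one placeholder string per column would at least satisfy the length requirement under the same assumption.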