facebookresearch / pysparnn

Approximate Nearest Neighbor Search for Sparse Data in Python!

ValueError: setting an array element with a sequence. Using pysparnn.matrix_distance.SlowEuclideanDistance #27

Open liuchenbaidu opened 5 years ago

liuchenbaidu commented 5 years ago

import pysparnn
import pysparnn.cluster_index as ci
from sklearn.feature_extraction.text import TfidfVectorizer

data = [
    'hello world',
    'oh hello there',
    'Play it',
    'Play it again Sam',
]
# Overrides the English samples above with Chinese ones:
# three variants of "What are you doing?", "Hello", "I like eating bananas"
data = ['你在干什么', '你在干啥子', '你在做什么', '你好啊', '我喜欢吃香蕉']

tv = TfidfVectorizer()
tv.fit(data)

features_vec = tv.transform(data)
print(type(features_vec), features_vec.shape)

# build the search index!
cp = ci.MultiClusterIndex(features_vec, data, pysparnn.matrix_distance.SlowEuclideanDistance)

# search the index with a sparse matrix
search_data = [
    'oh there',
    'Play it again Frank'
]
# Overridden as well: "What are you doing?", "I like eating bananas"
search_data = ['你在干啥', '我喜欢吃香蕉']
search_features_vec = tv.transform(search_data)

res = cp.search(search_features_vec, k=3, k_clusters=3, return_distance=False)

print(res)

kchaliki commented 4 years ago

@liuchenbaidu Indeed, that code doesn't work with sparse matrices. The test actually uses dense matrices, which is why this went unnoticed. I did implement this separately somewhere using scikit-learn's Euclidean distance, but it is so much slower than cosine that it raises the question of whether you need it.
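For reference, here is a minimal sketch of the scikit-learn route mentioned above. It assumes that "scikit's euclidean distance" means `sklearn.metrics.pairwise.euclidean_distances`, which accepts sparse input directly. This is a brute-force comparison against every record, not the separate implementation referred to in the comment, and it does not use a pysparnn index at all. The variable names are illustrative only.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances, euclidean_distances

data = ['hello world', 'oh hello there', 'Play it', 'Play it again Sam']
queries = ['oh there', 'Play it again Frank']

# TfidfVectorizer defaults to norm='l2', so every non-empty row has unit length.
tv = TfidfVectorizer()
records = tv.fit_transform(data)      # scipy.sparse matrix, one row per document
query_vecs = tv.transform(queries)    # scipy.sparse matrix, one row per query

# Pairwise distances computed on the sparse matrices without densifying them.
eucl = euclidean_distances(query_vecs, records)   # shape (n_queries, n_records)
cos = cosine_distances(query_vecs, records)

# For unit-length rows, ||a - b||^2 == 2 * cosine_distance(a, b),
# so both metrics rank the neighbours identically.
assert np.allclose(eucl ** 2, 2 * cos, atol=1e-6)

# Brute-force top-3 neighbours per query, nearest first.
nearest = np.argsort(eucl, axis=1)[:, :3]
top3 = [[data[j] for j in row] for row in nearest]
print(top3)
```

Because the tf-idf rows here are L2-normalized, Euclidean and cosine distance produce the same neighbour ordering, which is one reason the cosine index may be enough for this use case.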