facebookresearch / pysparnn

Approximate Nearest Neighbor Search for Sparse Data in Python!

Large data sets cannot be used. #7

Closed: younghj closed this issue 8 years ago

younghj commented 8 years ago

When I run the script on data from facebookresearch's fastText, ClusterIndex cannot accommodate it, although the sample script works well. The error is raised on this line of the example script: cp = snn.ClusterIndex(feat, data_to_return)

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-14-61d2fb78a7b1> in <module>()
----> 1 cp = snn.ClusterIndex(feat, data_to_return)

/home/jung4351/pysparnn/pysparnn/cluster_pruning.py in __init__(self, sparse_features, records_data, distance_type, matrix_size, parent)
    144             records_index = np.arange(sparse_features.shape[0])
    145             clusters_size = min(self.matrix_size, num_records)
--> 146             clusters_selection = random.sample(records_index, clusters_size)
    147             clusters_selection = sparse_features[clusters_selection]
    148

/usr/lib/python3.4/random.py in sample(self, population, k)
    309             population = tuple(population)
    310         if not isinstance(population, _Sequence):
--> 311             raise TypeError("Population must be a sequence or set.  For dicts, use list(d).")
    312         randbelow = self._randbelow
    313         n = len(population)

TypeError: Population must be a sequence or set.  For dicts, use list(d).
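The traceback ends inside random.sample(), which in Python 3 accepts only a sequence (such as a range, list or tuple) and, before Python 3.11, a set. A dict fails with exactly this TypeError, and so does a NumPy array such as the np.arange(...) built on line 144 of cluster_pruning.py, because ndarray is not registered as a collections.abc.Sequence. The sketch below is a hypothetical helper, not pysparnn's code: it picks distinct cluster-seed rows by sampling from range(), which random.sample() always accepts.

```python
import random


def sample_cluster_rows(num_records, clusters_size, seed=None):
    """Pick distinct row indices to use as cluster seeds (hypothetical helper).

    random.sample() requires a sequence; range() qualifies, whereas a dict
    or a NumPy array raises the TypeError seen in the traceback.
    """
    rng = random.Random(seed)
    # Never ask for more rows than exist, mirroring min(matrix_size, num_records).
    k = min(clusters_size, num_records)
    return rng.sample(range(num_records), k)


# Reproducing the failure with a dict (the exact message varies by Python version).
try:
    random.sample({"a": 1, "b": 2}, 1)
except TypeError as exc:
    print("TypeError:", exc)

print(sample_cluster_rows(100, 10, seed=0))
```

The returned indices could then be used to slice the sparse matrix row-wise, as line 147 of the traceback does with sparse_features[clusters_selection].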

spencebeecher commented 8 years ago

Hi @younghj! Could you please provide a little more information? Which data file are you referencing, and how are you constructing the 'feat' variable?

Feel free to attach a notebook file!

spencebeecher commented 8 years ago

I am marking this as invalid until you supply more info. Thanks!

On closer inspection, it appears you might be passing a dictionary rather than a list as the data_to_return variable.
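If data_to_return is indeed a dict, one way to hand ClusterIndex a list is to order the records by their row in the feature matrix. The helper below, records_as_list, is hypothetical and assumes the dict is keyed by each record's row index in feat; it is a sketch, not part of pysparnn.

```python
def records_as_list(records_by_row, num_rows):
    """Convert a {row_index: record} dict into a list aligned with feature rows.

    Hypothetical helper: assumes keys are the 0-based row positions of the
    records in the sparse feature matrix.
    """
    # Fail loudly if any feature row has no matching record.
    missing = [i for i in range(num_rows) if i not in records_by_row]
    if missing:
        raise KeyError(f"no record for rows {missing[:5]}")
    # List position i now holds the record for feature row i.
    return [records_by_row[i] for i in range(num_rows)]


print(records_as_list({1: "doc-b", 0: "doc-a"}, 2))
```

With it, the failing call would become cp = snn.ClusterIndex(feat, records_as_list(records_by_row, feat.shape[0])), where records_by_row stands in for the user's dict (name assumed).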