facebookresearch / pysparnn

Approximate Nearest Neighbor Search for Sparse Data in Python!

Duplicate Elements >> matrix_size cause infinite loop #5

Closed: spencebeecher closed this issue 7 years ago

spencebeecher commented 8 years ago

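When the data contains a large number of duplicate items (many more than matrix_size), the recursive construction of the index goes into an infinite loop. Instead of being distributed evenly across the items at a level, the duplicates are all allocated to the first item in the matrix. That group therefore never gets smaller, and the recursion never bottoms out.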
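The allocation-to-the-first-item behavior can be reproduced in isolation. The sketch below assumes the index assigns each record to its nearest cluster head with an argmin-style step. `assign_to_heads` and `assign_to_heads_random_ties` are hypothetical helpers, not pysparnn functions. `np.argmin` returns the first index among tied values, so when every distance is equal, every duplicate lands on head 0. Breaking ties at random spreads the same duplicates across all heads.

```python
import numpy as np

def assign_to_heads(dist_to_heads):
    """Assign each point (row) to its nearest cluster head (column).

    np.argmin returns the FIRST index among ties, so when every head is
    equally close (e.g. all points are duplicates), every row maps to head 0.
    """
    return np.argmin(dist_to_heads, axis=1)

def assign_to_heads_random_ties(dist_to_heads, rng):
    """Same assignment, but break ties uniformly at random among the closest heads."""
    assignments = np.empty(dist_to_heads.shape[0], dtype=int)
    for i, row in enumerate(dist_to_heads):
        closest = np.flatnonzero(np.isclose(row, row.min()))
        assignments[i] = rng.choice(closest)
    return assignments

# 1,000 identical points and 10 identical cluster heads: every distance is 0.
dist = np.zeros((1000, 10))
rng = np.random.default_rng(0)

first_tie = np.bincount(assign_to_heads(dist), minlength=10)
random_tie = np.bincount(assign_to_heads_random_ties(dist, rng), minlength=10)
print(first_tie)   # every point is counted in bucket 0
print(random_tie)  # points are spread over all 10 buckets
```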

The current fix is not very efficient:

Around line 96, in nearest_search in matrix_distance.py:

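import numpy as np

# scores: 1-D array of distances to the candidates (smaller = closer)
# k: number of candidate indices to return
if len(scores) > 0 and scores.sum() < 0.0001:
    # they are all practically the same
    # we have to do this to prevent infinite recursion
    # TODO: would love an alternative solution
    # np.random.choice raises when k > len(scores) and replace=False, so cap it
    arg_index = np.random.choice(len(scores), min(k, len(scores)), replace=False)
else:
    arg_index = np.argsort(scores)[:k]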
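One possible direction for the TODO is to guard the recursion directly instead of special-casing near-zero scores. The sketch below is an assumption-laden illustration, not pysparnn's implementation. It uses dense vectors with Euclidean distance in place of the library's sparse cosine distance, and `build_tree` and `count_points` are hypothetical names. It combines two guards: ties between equally close heads are broken at random, and any child group as large as its parent becomes a leaf instead of being split again. Every recursive call therefore receives strictly fewer points, so the build terminates even when almost every row is a duplicate.

```python
import numpy as np

def build_tree(points, leaf_size=50, branching=10, rng=None):
    """Recursively partition `points` (n x d dense array) into nested clusters.

    Hypothetical sketch: random tie-breaking spreads duplicates across heads,
    and a child that did not shrink becomes a leaf, so recursion always ends.
    """
    if rng is None:
        rng = np.random.default_rng()
    n = len(points)
    if n <= leaf_size:
        return {"leaf": points}

    head_idx = rng.choice(n, size=min(branching, n), replace=False)
    heads = points[head_idx]
    # Euclidean distance from every point to every head: shape (n, n_heads).
    dist = np.linalg.norm(points[:, None, :] - heads[None, :, :], axis=2)

    # Among the heads tied for closest, pick one at random per point.
    closest = np.isclose(dist, dist.min(axis=1, keepdims=True))
    priority = np.where(closest, rng.random(dist.shape), -1.0)
    assignment = priority.argmax(axis=1)

    children = []
    for h in range(len(heads)):
        members = points[assignment == h]
        if len(members) == 0:
            continue
        if len(members) == n:
            # No progress: splitting this group again could recurse forever.
            children.append((heads[h], {"leaf": members}))
        else:
            children.append((heads[h], build_tree(members, leaf_size, branching, rng)))
    return {"children": children}

def count_points(tree):
    """Total number of points stored in the leaves of a tree."""
    if "leaf" in tree:
        return len(tree["leaf"])
    return sum(count_points(child) for _, child in tree["children"])

# 2,000 exact duplicates plus 50 distinct random points.
data = np.vstack([np.ones((2000, 5)), np.random.default_rng(1).random((50, 5))])
tree = build_tree(data, leaf_size=50, branching=10, rng=np.random.default_rng(0))
print(count_points(tree))  # → 2050
```

The non-shrinking-child guard is the part that actually rules out infinite recursion; random tie-breaking mainly keeps duplicate-heavy groups balanced so the tree stays shallow. Both guards avoid drawing a random sample of candidates on every query.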