facebookresearch / pysparnn

Approximate Nearest Neighbor Search for Sparse Data in Python!

Near misses for min_distance thresholds #4

Closed: spencebeecher closed this issue 8 years ago

spencebeecher commented 8 years ago

From README:

Note on min_distance thresholds - Each document is assigned to its closest candidate cluster. When min_distance is set, we filter out clusters that don't meet the threshold without looking inside them for matches. This means we are likely to miss some good matches: a document can satisfy the threshold even though the cluster it belongs to just misses the cutoff, and that cluster is never searched. A (planned) patch for this behavior would be to also search clusters that 'just' miss the cutoff.
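The near miss described in the note can be reproduced with a small toy index. This is only a sketch and not pysparnn's code: it assumes Euclidean distance on 2-D points, treats the threshold as a cutoff that a cluster representative's distance to the query must not exceed, and the names `build_clusters`, `search`, `cutoff`, and `slack` are hypothetical. The `slack` parameter stands in for the planned patch of also searching clusters that just miss the cutoff.

```python
import math


def build_clusters(points, centers):
    """Assign each point to its closest center (hypothetical helper)."""
    clusters = {i: [] for i in range(len(centers))}
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def search(query, centers, clusters, cutoff, slack=0.0):
    """Return (distance, point) pairs within `cutoff` of the query.

    Clusters are pruned by the distance from the query to their center.
    `slack` (an assumption, not a pysparnn parameter) widens the pruning
    test so that clusters just beyond the cutoff are still searched.
    """
    hits = []
    for i, center in enumerate(centers):
        if math.dist(query, center) > cutoff + slack:
            continue  # whole cluster skipped without looking inside
        for p in clusters[i]:
            d = math.dist(query, p)
            if d <= cutoff:  # individual points must still meet the cutoff
                hits.append((d, p))
    return sorted(hits)


centers = [(0.0, 0.0), (10.0, 0.0)]
points = [(1.0, 0.0), (6.0, 0.0), (9.0, 0.0)]
clusters = build_clusters(points, centers)
query = (4.0, 0.0)

# Both centers are farther than 3.0 from the query, so nothing is searched,
# even though (1, 0) and (6, 0) are within 3.0 of the query.
strict = search(query, centers, clusters, cutoff=3.0)

# A little slack admits the first cluster; more slack admits both.
some_slack = search(query, centers, clusters, cutoff=3.0, slack=1.0)
more_slack = search(query, centers, clusters, cutoff=3.0, slack=3.0)
```

In this toy setup the strict search finds nothing, while widening the cluster-level test recovers the points that actually satisfy the cutoff. The cost is that more clusters are scanned, which is the trade-off any such patch would have to tune.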

spencebeecher commented 8 years ago

Removed min_distance in commit - https://github.com/facebookresearch/pysparnn/commit/a24cd4cc5fac3f5607c835c17e903d5783dd7860