dnbaker / dashing

Fast and accurate genomic distances using HyperLogLog
GNU General Public License v3.0

Nearest Neighbor Support #39

Closed dnbaker closed 4 years ago

dnbaker commented 4 years ago

This pull request provides Nearest Neighbor support.

For similarity measures, such as Jaccard, histogram similarity, and containment, it returns the k greatest similarities and associated indexes.

For dissimilarity measures, such as distances, it returns the k smallest values with their associated indexes.
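
The selection rule above can be illustrated with a minimal sketch. This is not dashing's implementation (which is C++); the function name `nearest_neighbors` and the `is_similarity` parameter are hypothetical and exist only for this example. It assumes one row of precomputed pairwise values and does not exclude the query itself.

```python
import heapq

def nearest_neighbors(values, k, is_similarity):
    """Return the k best (value, index) pairs from one row of pairwise values.

    Hypothetical helper for illustration only.
    For a similarity measure the k largest values are the nearest neighbors;
    for a dissimilarity measure the k smallest values are.
    """
    indexed = ((v, i) for i, v in enumerate(values))
    if is_similarity:
        return heapq.nlargest(k, indexed, key=lambda pair: pair[0])
    return heapq.nsmallest(k, indexed, key=lambda pair: pair[0])

# Made-up example values, not real dashing output.
jaccard_row = [0.10, 0.85, 0.40, 0.95]
distance_row = [0.90, 0.15, 0.60, 0.05]

top_similar = nearest_neighbors(jaccard_row, 2, is_similarity=True)
top_close = nearest_neighbors(distance_row, 2, is_similarity=False)
```

In both cases the index travels with the value, so the caller can map each neighbor back to its input genome.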

It is activated with the --nearest-neighbor [number] flag and is available for both all-pairs calculations (default/-F) and the query/reference interface (-Q/-F). Output is tabular by default; if -b/--binary is set, it is a raw binary dump of the distances and indices.

Partially addresses https://github.com/dnbaker/dashing/issues/33: nearest neighbors are supported, but thresholded output is not.