carsonfarmer / fastpair

FastPair: Data-structure for the dynamic closest-pair problem.
MIT License
12 stars 4 forks source link

info & background on `merge_closest()` method #34

Open jGaboardi opened 1 month ago

jGaboardi commented 1 month ago

I am curious about the history & commenting out of the merge_closest() method. A quick search didn't lead to any previous issues or PRs (though I likely overlooked something). Is there anything you can point me to for that, @carsonfarmer?

carsonfarmer commented 1 month ago

Yeh, to be honest, I'm not totally sure? I suspect it was commented out, because I didn't have a very robust test for it. I recall essentially wanting to create random clusters, and then testing that they were recoverable via the merge_closest() flow. But I didn't end up implementing that.

jGaboardi commented 1 month ago

So do you think we want to:

cc @gegen07

carsonfarmer commented 1 month ago

Well, obviously 3 is the ideal outcome. But given I no longer have a core use case for this, I would likely find it difficult to devote the time. If @gegen07 has a use case/data set/example, I'm sure we could use that!

Alternatively, we can just enable it, do the minimal test, and see if folks find it useful? For instance, the merge_closest method is really just a wrapper around the example/demo, which does indeed work.

jGaboardi commented 1 month ago

I found the commit where merge_closest() and interact() were removed -- https://github.com/carsonfarmer/fastpair/commit/92364962f6b695661f35a117bf11f96584128a8d

carsonfarmer commented 1 month ago

Ah yes, interesting. So based on the comment, it looks like there was some weirdness with merging and not getting consistent results in tests. Having said that, it looks like there's no good reason to not support this, and just allow someone to supply their own merge function as input to merge_closest().

jGaboardi commented 1 month ago

Putting this chunk from the README in here to keep around for a notebook for when these become part of fastpair again -- xref #45 .

To illustrate the mergeing methods, here is a simple example of hierarchical clustering, treating points as the 'centroids' of various clusters:

for i in range(len(fp)-1):
   # First method... do it manually:
   dist, (a, b) = fp.closest_pair()
   c = interact(a, b)  # Compute mean centroid
   fp -= b
   fp -= a
   fp += c
   # Alternatively... do it all in one step:
   # fp.merge_closest()
len(fp)  # 1
jGaboardi commented 3 weeks ago

I'm just now realizing that interact() is actually defined within test_fastpair.py and then used for a merge_closest() test there, as well. Not sure how I missed this before.

jGaboardi commented 3 weeks ago

@carsonfarmer Should we go ahead and get those moved back into base.py and flesh out some tests for them?