Stop the clustering algorithm early

Caster commented 3 years ago

From #1, @stardisblue commented:

btw, i'm wondering about two things :

how would you stop the clustering at a certain step (for example, i don't need to calculate until full convergence but just until 1 equals 10base units).

I have trouble extracting the result value of the clustering ; in accordance to step1. Since it gives me the ultimate cluster.

Caster commented 3 years ago

Thank you for your interest! Out of sheer curiosity, can I know what you are working on/using this code for?

Regarding your questions.

In algorithm.clustering.QuadTreeClusterer at line 157, you'll find the main loop that controls the clustering. In the getNextEvent(MultiQueue, GlobalState) function you'll find that the first line checks for the number of glyphs still remaining in the clustering. You can stop early by modifiying that code. For your stopping condition: do you mean when any glyph has grown by a factor of 10?
You might want to use the datastructure.HierarchicalClustering.View subclass, It can be used to walk through the merges in order and extract the clustering at any point in time. Either use the getGlyphs(GrowFunction) method or look at the curr private property of the class (through a getter then, probably).

Does that help?

stardisblue commented 3 years ago

Thank you for opening a new issue, I was going offpoint 👍.

I want to recalculate the layout depending on filtered data, contrary to Glottovis/glammap which does not update clustering on filter but represent filtered data as a grayed/transparent.
Recalculating the clustering is expensive I want it to stop at current zoom level, my priority being the representation of the filtered information and not to calculate clustering at all possible zoom levels.
Since I stopped a certain zoom level, do I need to walk through since I could get the actual state of the clustering ? But I fail to see how to do it effectively. Maybe I still need to walk through HierarchicalClustering...

ps: I'm implementing this software in TypeScript for it to be used in leaflet. This leads to some surprises : the order of CsvIO data influences the statistics (like number of merges) because it uses a HashSet to merge LatLngs...

More generally HashSet, PriorityQueue behaviours on ordering (when the value data is the same in case of PriorityQueue) can influence the outcome statistic when using quad on glottovis dataset.

I wish to release it as a repository once I'm happy with the result (with full credits :)). But it looks a little bit too archaic for the moment 😅.

stardisblue commented 3 years ago

From what i see in the testings, when i reach at equals 1, the direct projection of the data (1 to 1 visualization) does not have overlap. So in theory I just need to stop the clusterisation when at > 1

Caster commented 3 years ago

Hi @stardisblue, thank you for your response and sorry for not replying sooner. What you are doing sounds very interesting, I'm looking forward to seeing it once you are ready to share.

I understand. The last version of GlamMap actually recalculates the clustering on filter (try selecting a range in the timeline at the bottom). Still, I understand why you'd want to be able to stop the clustering algorithm early. I'll elaborate below.
If you stop early, you'll have to iterate over the glyphs in the GlobalState#map, filter those that are still alive and take the mapped HierarchicalClustering objects to get the complete clustering. Alternatively you could explicitly keep track of the set of alive glyphs during clustering, of course. The reason this is not necessary currently is that we only need to look at the last merge to find the root glyph (and corresponding HierarchicalClustering object).

Also tying back to your second comment; the clustering algorithm works with time (at), while your map has a zoom level. Normally you would define a relation between them (1:1, 1:2^n, etc.), but that is completely up to you. Essentially, whatever at you choose, the glyphs will not overlap at that zoom level as long as you draw them at the size they have then. Obviously choosing a very small at on a very low zoom level will make the result look a bit odd (many small glyphs while zoomed out quite far), but that is your choice. I hope that helps?

Caster commented 3 years ago

Also out of curiosity, since you write

Recalculating the clustering is expensive [...]

That's certainly true and the reason we invested so much time in optimizing the clustering. The current code base (QUAD+BIG in the paper) handles tens of thousands of glyphs within seconds. In my experience, trying to stop the clustering early and handling the glyphs separately afterwards doesn't lead to big time improvements. It actually takes relatively much time to filter data and feed it to the clustering algorithm, and to postprocess it and show it on the map. But that's just my two cents and something to think about, maybe it doesn't apply to your use case.

stardisblue commented 3 years ago

Sorry for not keeping you up to date, I released a typescript version of this tool at https://github.com/stardisblue/ts-growing-glyphs. If you're interested feel free to propose changes to the library :).

Caster commented 3 years ago

Thanks for the update, I'm planning to take a look soon! If I have any feedback or changes, Ill open an issue or PR.

Caster / growing-glyphs

Stop the clustering algorithm early #2