Closed veroandreo closed 2 years ago
The pick function returned by cvi_evaluators does majority voting by default, and I don't know if it's possible to generalize that to more than 1 result. You could try some heuristics perhaps, for example a result I just got with the demo data looks like this:
> apply(comparison_part$results$partitional[17:23], 2, which.max)
   Sil      D    COP     DB DBstar     CH     SF
    59    373     98     98     91    644    236
So, in my case, COP and DB agreed. I could then figure out how many configurations appear in the top X of both, say with X=20:
intersect(
sort(comparison_part$results$partitional$COP, decreasing = TRUE, index.return = TRUE)$ix[1:20],
sort(comparison_part$results$partitional$DB, decreasing = TRUE, index.return = TRUE)$ix[1:20]
)
[1] 98 717 236 720 59 40 39
That's just the first thing that came to mind, I can't really say if it's a particularly good idea :stuck_out_tongue:
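To make the top-X idea above a bit more concrete, here is a minimal sketch that extends it from two CVIs to all of them: every CVI "votes" for its top X configurations, and configurations are then ranked by how many votes they collected. The `scores` data frame is a made-up stand-in for the CVI columns of `comparison_part$results$partitional`, and `top_x`, `top_per_cvi`, `votes` and `ranking` are hypothetical names. It assumes higher is better for every column, as in the `which.max` call above, and it is not what `pick` does internally.

```r
# Toy stand-in for the CVI columns of comparison_part$results$partitional.
# The values are random; higher is ASSUMED to mean better for every column.
set.seed(42)
scores <- data.frame(
  Sil = runif(50), D = runif(50), COP = runif(50), DB = runif(50)
)

top_x <- 10  # how many configurations each CVI may vote for

# For each CVI, the row indices of its top_x configurations.
top_per_cvi <- lapply(scores, function(cvi) {
  order(cvi, decreasing = TRUE)[seq_len(top_x)]
})

# Count how many CVIs voted for each configuration (row).
votes <- tabulate(unlist(top_per_cvi), nbins = nrow(scores))

# Configurations ordered from most voted to least voted.
ranking <- order(votes, decreasing = TRUE)
head(data.frame(config = ranking, votes = votes[ranking]), 10)
```

With the real table you would replace `scores` with the CVI columns (columns 17 to 23 in my example) and read the first rows of the resulting data frame as the most agreed-upon configurations. Whether vote counts are a sensible criterion is as much an open question as the original heuristic.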
Thanks for your answer @asardaes !
I thought maybe the score table within the compare_clusterings() result could hold the votes, so it would be easy to pick the 10 most voted configs, for example. But the score table only contains the CVI values.
I'm thinking now I could also generate configs for different distances and then compare the best results from them. In that way I'd have at least the best clustering config per distance and I could do the voting among them or so. I will investigate further :-)
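A rough sketch of the per-distance idea, in case it helps anyone landing here: group the results table by distance and keep the best-scoring row in each group. The `results` data frame below is invented for illustration, and the column names `config_id` and `distance` are assumptions about what the real table looks like. The aggregate score used here is the mean rank across CVIs, which is a simple substitute I picked for the example, not the majority voting that `pick` performs. It also assumes higher CVI values are better.

```r
# Toy stand-in for the results table. Column names are ASSUMED and the
# CVI values are random; higher is assumed to be better.
set.seed(1)
results <- data.frame(
  config_id = paste0("config", 1:12),
  distance  = rep(c("dtw_basic", "sbd", "gak"), each = 4),
  Sil = runif(12),
  CH  = runif(12),
  stringsAsFactors = FALSE
)
cvi_cols <- c("Sil", "CH")

# Aggregate score: mean rank across CVIs (a larger rank means a better value).
ranks <- sapply(results[cvi_cols], rank)
results$mean_rank <- rowMeans(ranks)

# Split by distance and keep the row with the highest mean rank in each group.
best_per_distance <- do.call(rbind, lapply(
  split(results, results$distance),
  function(grp) grp[which.max(grp$mean_rank), ]
))
best_per_distance
```

The rows of `best_per_distance` could then be compared against each other, for instance by voting only among them.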
Hi,
Following the examples in the vignette and manual pages, I'm using compare_clusterings_configs() plus compare_clusterings() to obtain the best cluster configuration for my dataset. I wonder now if there's a way to get the 10 best configurations, for example, or the best for each distance that I evaluate, i.e., DTW, SBD, etc. How can I do the scoring myself and select the best configs from the huge table at comparison_part$results$partitional? Thanks in advance for any hints!