Closed: zilch42 closed this 2 months ago
One further comment on this: ideally, the visual placement of the points would align as closely as possible with how the points group together within the clusters. So if the following...
UMAP(
    n_neighbors=15,
    n_components=2,
    min_dist=0.0,
    metric='cosine'
)
...is going to be as close as possible visually to whatever is happening inside this...
EVoC(
    n_neighbors=15,
)
...then great. But if EVoC is grouping things differently from how UMAP would, then it might be useful to also have coordinates coming out of EVoC.
In practice you will, unfortunately, want a separate UMAP run for feeding into DataMapPlot. EVoC has a very custom approach to the effective dimension reduction step, and the results will be quite bad for visualization purposes (but work very well for clustering purposes). In general the clustering should align very well with UMAP results (with occasional stray points here and there), especially with the parameter choices you have above (although you can likely increase min_dist without much loss).
If you really need very good alignment with the UMAP result, it is likely best to actually do your clustering on the UMAP output (and not with EVoC, which will do very weird things to that).
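For anyone landing here later, the two options described above can be sketched roughly as follows. This is only an illustration, not an official recipe: the helper names (labels_for_datamapplot, cluster_then_layout, layout_then_cluster, example_with_real_libraries) are made up for this sketch, treating -1 as a noise label is an assumption about the clusterer's output, and scikit-learn's HDBSCAN is mentioned only as one possible clusterer for the second option.

```python
def labels_for_datamapplot(cluster_ids, noise_label="Unlabelled"):
    """Turn integer cluster ids into string labels.

    Assumption: negative ids (e.g. -1) mark noise points.
    """
    return [noise_label if int(c) < 0 else f"Cluster {int(c)}" for c in cluster_ids]


def cluster_then_layout(embeddings, clusterer, reducer):
    """Option A: cluster the original vectors, run the 2D layout separately."""
    cluster_ids = clusterer.fit_predict(embeddings)
    coords = reducer.fit_transform(embeddings)
    return coords, labels_for_datamapplot(cluster_ids)


def layout_then_cluster(embeddings, reducer, clusterer):
    """Option B: cluster the 2D layout itself, so clusters match the picture."""
    coords = reducer.fit_transform(embeddings)
    cluster_ids = clusterer.fit_predict(coords)
    return coords, labels_for_datamapplot(cluster_ids)


def example_with_real_libraries(embeddings):
    """Option A wired up with EVoC, UMAP and DataMapPlot (needs all three installed)."""
    # Imports are local so the helpers above work without these packages.
    import numpy as np
    import umap
    import datamapplot
    from evoc import EVoC

    reducer = umap.UMAP(n_neighbors=15, n_components=2, min_dist=0.0, metric="cosine")
    coords, labels = cluster_then_layout(embeddings, EVoC(n_neighbors=15), reducer)
    # For Option B, pass a clusterer suited to 2D data instead of EVoC,
    # e.g. sklearn.cluster.HDBSCAN() (an illustrative choice, not from this thread).
    return datamapplot.create_plot(coords, np.asarray(labels))
```

Both helpers accept any objects with fit_predict and fit_transform methods, so the UMAP parameters can be tuned for the plot without touching the clustering step.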
No worries, thanks!
Great work with this package. I'm just starting to experiment with it. Very exciting!
Just wondering about plugging the clustered data into DataMapPlot. Will UMAP (or another method) still be required to reduce the higher-dimensional vectors down to 2D to supply to data_map_coords separately? Or can evoc supply that too? Just thinking: if evoc is doing some of what UMAP does anyway, is there some efficiency to be gained by not recalculating the dimension reduction separately? Or is it better for the user to have discrete control over the coordinates for the visualization?
Thanks