Closed Weiqi97 closed 6 years ago
Is there an implementation issue that we need to change here or do we need to make clearer how the visualisations are implemented (either in the UI or in In the Margins)?
Well, I don't think our methods are wrong, but yes, clarity is (really) needed.
I was (also) surprised that k-means (on the server, v3.1.1) performs PCA before both visualizations (PCA vs. Voronoi); my stats fail me here as to why folks often use PCA prior to clustering.
@Weiqi97 noted that SciPy has a specific Voronoi method, independent of (but similar to) k-means.
I haven't looked closely at the documentation, so maybe I'm missing something, but isn't it because clustering takes place based on distances in Cartesian space, which can be calculated based on the points after PCA's dimensionality reduction? But clustering can also be done without this pre-processing step.
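To make the distance point concrete, here is a minimal sketch (not our actual implementation) comparing pairwise Euclidean distances before and after a PCA projection. It uses only NumPy; the helper names `pca_project` and `pairwise_distances` and the random `docs` matrix are made up for illustration.

```python
import numpy as np


def pca_project(data, n_components):
    """Project the rows of `data` onto the top principal components (via SVD)."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def pairwise_distances(points):
    """Euclidean distance between every pair of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


rng = np.random.default_rng(0)
docs = rng.random((6, 5))  # 6 made-up documents, 5 made-up term frequencies

full = pairwise_distances(docs)                        # original space
reduced = pairwise_distances(pca_project(docs, 2))     # after 2-component PCA
lossless = pairwise_distances(pca_project(docs, 5))    # all components kept
```

Keeping every component only rotates the data, so `lossless` matches `full`. Keeping 2 components can only shrink distances, so clustering on the reduced points is an approximation of clustering on the original ones, which is why the pre-processing step is optional.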
I did not use the Voronoi method from the SciPy library. Instead, I renamed the PCA visualization method to 2D-Scatter and added a 3D-Scatter visualization method. The documentation should mention that we perform PCA with 2 components before Voronoi and 2D-Scatter, and PCA with 3 components before 3D-Scatter.
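For the documentation, something like the following sketch might help show what "PCA before the visualization" means. This is an illustration under assumptions, not the code in the repository: `N_COMPONENTS`, `reduce_for_plot`, `nearest_centroid`, the random `dtm` matrix, and the placeholder centroids are all hypothetical, and PCA is done with a plain NumPy SVD.

```python
import numpy as np

# Hypothetical mapping from visualization name to PCA component count.
N_COMPONENTS = {"Voronoi": 2, "2D-Scatter": 2, "3D-Scatter": 3}


def reduce_for_plot(data, method):
    """Reduce `data` to the number of dimensions the chosen plot needs."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:N_COMPONENTS[method]].T


def nearest_centroid(points, centroids):
    """Index of the closest centroid for each point.

    In 2D this is the Voronoi cell (built from the centroids) that the
    point falls into.
    """
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=-1)
    return dists.argmin(axis=1)


rng = np.random.default_rng(1)
dtm = rng.random((8, 10))  # made-up document-term matrix

coords_2d = reduce_for_plot(dtm, "Voronoi")     # shape (8, 2)
coords_3d = reduce_for_plot(dtm, "3D-Scatter")  # shape (8, 3)

# Placeholder centroids (the first two points), standing in for k-means output.
centroids = coords_2d[:2]
cells = nearest_centroid(coords_2d, centroids)
```

The point of the sketch is that the reduction step and the cell assignment are separate operations: PCA only decides the plotting coordinates, while the Voronoi cells are determined by whichever centroids are drawn in those coordinates.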
I got a bit confused working on K-Means. In the current master/live server, we define PCA and Voronoi as two different visualizations of the K-Means clustering result. However, I think K-Means, PCA, and Voronoi are three different things.