UoB-DSMP-2023-24 / dsmp-2024-group-7

dsmp-2024-group-7 created by GitHub Classroom

Task 4: reduce dimension and visualize the result #7

Open YOHOHO111 opened 6 months ago

YOHOHO111 commented 6 months ago

Use PCA to reduce the dimensionality of the distance matrix from Task 3 and visualize the result (for example, with a scatter plot or heat map). Also try a second dimensionality-reduction method, since PCA is quite simple compared with t-SNE or UMAP.
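A minimal sketch of the PCA step, assuming scikit-learn and matplotlib are available. The Task 3 distance matrix is not shown in this thread, so `dist_matrix` here is a synthetic stand-in built from random points, and all variable names are hypothetical. Each row of the matrix is treated as the feature vector of one item.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.decomposition import PCA
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)

# Hypothetical stand-in for the Task 3 output: 60 items in three groups.
points = np.vstack(
    [rng.normal(loc=c, scale=0.5, size=(20, 5)) for c in (0.0, 3.0, 6.0)]
)
group_ids = np.repeat([0, 1, 2], 20)
dist_matrix = pairwise_distances(points)  # shape (60, 60), symmetric

# Project the rows of the distance matrix onto the first two components.
pca = PCA(n_components=2, random_state=0)
embedding_pca = pca.fit_transform(dist_matrix)

# Scatter plot of the 2-D embedding, coloured by the synthetic group id.
fig, ax = plt.subplots(figsize=(5, 4))
ax.scatter(embedding_pca[:, 0], embedding_pca[:, 1], c=group_ids)
ax.set_xlabel("PC 1")
ax.set_ylabel("PC 2")
ax.set_title("PCA of distance matrix")
```

With the real data, `dist_matrix` would be replaced by the matrix produced in Task 3; `pca.explained_variance_ratio_` then shows how much variance the two components retain.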

YOHOHO111 commented 6 months ago

Because the distance matrix is high-dimensional, we need to reduce its dimensionality before visualizing it. TruncatedSVD is simpler than methods such as t-SNE or UMAP, but it is faster, so I decided to combine them: first use TruncatedSVD to reduce the data to 50 dimensions, then use t-SNE and UMAP to reduce it further to 2 dimensions, and visualize the result with a scatter plot. Comparing the two, and given the characteristics of the data, t-SNE separates the clusters more clearly. For details, see the mini project.ipynb file in Gary's branch.
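The two-stage reduction described in this comment could look roughly like the following. This is a sketch under assumptions, not the notebook's actual code: the data is synthetic, the variable names are hypothetical, and the t-SNE perplexity is simply scikit-learn's default. UMAP lives in the separate `umap-learn` package, so it appears only as a comment.

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD
from sklearn.manifold import TSNE
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)

# Hypothetical stand-in for the Task 3 distance matrix: 120 items,
# so the matrix has 120 columns (more than the 50 SVD components).
points = np.vstack(
    [rng.normal(loc=c, scale=0.5, size=(40, 8)) for c in (0.0, 3.0, 6.0)]
)
dist_matrix = pairwise_distances(points)  # shape (120, 120)

# Stage 1: fast linear reduction to 50 dimensions.
svd = TruncatedSVD(n_components=50, random_state=0)
reduced = svd.fit_transform(dist_matrix)

# Stage 2: non-linear reduction of the 50-D data to 2-D with t-SNE.
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
embedding_tsne = tsne.fit_transform(reduced)

# With umap-learn installed, the UMAP variant of stage 2 would be:
#   import umap
#   embedding_umap = umap.UMAP(n_components=2).fit_transform(reduced)

print(reduced.shape, embedding_tsne.shape)
```

Running the slower t-SNE step on 50 columns instead of the full matrix is the usual reason for the SVD stage; note that t-SNE's perplexity must stay below the number of samples.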

YOHOHO111 commented 6 months ago

Added a PCA comparison section and switched to the seaborn library so that the plots look nicer and better reflect the specifics of the data. For details, see the .ipynb file in Gary's branch.
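A side-by-side PCA versus t-SNE comparison with seaborn might be laid out as below. This is an illustrative sketch only: the data and group labels are synthetic, the names are hypothetical, and the notebook in Gary's branch may structure its plots differently.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside a notebook
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(0)

# Hypothetical stand-in data: 90 items in three labelled groups.
points = np.vstack(
    [rng.normal(loc=c, scale=0.5, size=(30, 5)) for c in (0.0, 3.0, 6.0)]
)
labels = np.repeat(["A", "B", "C"], 30)
dist_matrix = pairwise_distances(points)

# Compute one 2-D embedding per method.
embeddings = {
    "PCA": PCA(n_components=2, random_state=0).fit_transform(dist_matrix),
    "t-SNE": TSNE(n_components=2, perplexity=15, random_state=0).fit_transform(
        dist_matrix
    ),
}

# One seaborn scatter plot per method, coloured by group label.
fig, axes = plt.subplots(1, 2, figsize=(10, 4))
for ax, (name, emb) in zip(axes, embeddings.items()):
    sns.scatterplot(x=emb[:, 0], y=emb[:, 1], hue=labels, ax=ax)
    ax.set_title(name)
fig.tight_layout()
```

Placing both embeddings on a shared figure with the same colour coding makes it easier to judge which method separates the groups more clearly.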