UoB-DSMP-2023-24 / dsmp-2024-group-7

dsmp-2024-group-7 created by GitHub Classroom

Task 5: Try some clustering methods (K-means, spectral, DBSCAN, AHC) #11

Open KIEwx opened 5 months ago

KIEwx commented 5 months ago

For the sparse distance matrix obtained from TCRDist, I reduced the dimensionality first and then clustered. For dimensionality reduction I used TruncatedSVD to project the original sparse matrix down to 2 dimensions, and ran all subsequent clustering on that 2-D embedding. I tried four clustering methods: K-means, spectral clustering, DBSCAN, and agglomerative hierarchical clustering (AHC). The clustering quality is evaluated with the Silhouette Score and the Calinski-Harabasz index. So far, K-means, DBSCAN, and agglomerative hierarchical clustering give acceptable results, with K-means and agglomerative hierarchical clustering performing best.
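For reference, the workflow described above could be sketched roughly as follows with scikit-learn. This is only an illustration under assumptions: the random symmetric sparse matrix is a hypothetical stand-in for the TCRDist output, and the cluster count, `eps`, and `min_samples` values are placeholders, not the settings used in the actual experiments.

```python
from scipy import sparse
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans, SpectralClustering
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics import calinski_harabasz_score, silhouette_score

# Hypothetical stand-in for the TCRDist sparse distance matrix (300 x 300).
half = sparse.random(300, 300, density=0.05, format="csr", random_state=0)
dist = half + half.T  # make it symmetric, as a distance matrix would be

# Step 1: reduce the sparse matrix to a 2-D embedding.
embedding = TruncatedSVD(n_components=2, random_state=0).fit_transform(dist)

# Step 2: the four clustering methods (all hyperparameters are placeholders).
models = {
    "kmeans": KMeans(n_clusters=4, n_init=10, random_state=0),
    "spectral": SpectralClustering(n_clusters=4, random_state=0),
    "dbscan": DBSCAN(eps=0.5, min_samples=5),
    "agglomerative": AgglomerativeClustering(n_clusters=4),
}

# Step 3: score each result with both indices.
scores = {}
for name, model in models.items():
    labels = model.fit_predict(embedding)
    n_labels = len(set(labels))
    # Both indices need at least 2 and at most n_samples - 1 distinct labels.
    # Note that DBSCAN noise points (label -1) are scored as one extra group here.
    if n_labels < 2 or n_labels >= len(labels):
        scores[name] = None
        continue
    scores[name] = (
        silhouette_score(embedding, labels),
        calinski_harabasz_score(embedding, labels),
    )

for name, result in scores.items():
    print(name, result)
```

A higher value is better for both indices, but the Calinski-Harabasz index is unbounded, so it is only meaningful when comparing methods on the same embedding.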

KIEwx commented 5 months ago

Next, I will try reducing the dimensionality with PCA first and then with TruncatedSVD, which may make the dimensionality reduction process more efficient. In addition, the sparse distance matrix from TCRDist is currently computed for a single chain at a time, either the alpha chain or the beta chain. How to obtain one combined sparse matrix covering both chains has not yet been solved.

KIEwx commented 5 months ago

Spectral clustering (at least O(n^2) time and memory for the affinity matrix) and agglomerative hierarchical clustering (O(n^3) time in the naive implementation) are not very feasible given the size of the dataset. I will look for other clustering methods besides K-means and DBSCAN for the comparison experiments.
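To give a rough sense of why the quadratic cost matters, here is a small back-of-the-envelope sketch of the memory needed for a dense n x n pairwise matrix. The helper name `dense_pairwise_bytes` and the sample sizes are hypothetical and chosen only for illustration; they are not the actual size of our dataset.

```python
def dense_pairwise_bytes(n_samples: int, bytes_per_value: int = 8) -> int:
    """Memory for a dense n x n matrix of float64 values (8 bytes each)."""
    return n_samples * n_samples * bytes_per_value


# Hypothetical sample counts, for illustration only.
for n in (1_000, 10_000, 100_000):
    gib = dense_pairwise_bytes(n) / 2**30
    print(f"n = {n:>7,}: {gib:,.2f} GiB")
```

At 100,000 samples the dense matrix alone takes about 80 GB, before any eigendecomposition or linkage computation is done, which is why methods that avoid materialising all pairwise values are more practical at that scale.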