CRG-CNAG / robustica

customizable robust Independent Component Analysis (ICA)
BSD 3-Clause "New" or "Revised" License

Cluster with negative IDs #4

Closed: BahmanTahayori closed 1 year ago

BahmanTahayori commented 1 year ago

I was not able to find a sufficient explanation of what a negative cluster ID, e.g. -1, means. From what I found, it is assigned to clusters with a low quality index, and it appears to be an extra cluster. I assume that this cluster and the associated mixing matrix column should be ignored for proper use of the mixing matrix. Can robustica generate more than one cluster with a negative ID? Is there any documentation that provides more intuition on this issue?

MiqG commented 1 year ago

Hi @BahmanTahayori ,

The -1 cluster corresponds to the -1 label in the DBSCAN clustering algorithm from scikit-learn. Therefore, I would ignore all components assigned to this cluster, as they were labeled as noise. Because robustica uses this algorithm for the clustering step, it will not generate other negative IDs. Check out scikit-learn's documentation for more info on the clustering algorithm and its outputs: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html
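
A minimal sketch of the idea, using scikit-learn's DBSCAN directly on toy data rather than robustica itself. The matrix `A` and the assumption that its columns follow the sorted unique cluster labels are hypothetical and only for illustration; check how your own outputs are ordered before filtering.

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Toy data: two tight groups of three points plus one isolated point.
X = np.array([
    [0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
    [5.0, 5.0], [5.1, 5.0], [5.0, 5.1],
    [20.0, 20.0],
])

# DBSCAN gives the label -1 to samples it considers noise.
labels = DBSCAN(eps=0.5, min_samples=3).fit_predict(X)

# Hypothetical mixing matrix with one column per cluster ID.
# Assumption: columns are ordered like the sorted unique labels.
cluster_ids = np.unique(labels)
rng = np.random.default_rng(0)
A = rng.normal(size=(4, cluster_ids.size))

# Drop the column that belongs to the noise cluster (-1).
keep = cluster_ids != -1
A_clean = A[:, keep]
kept_ids = cluster_ids[keep]
```

Here the isolated point is the only sample labeled -1, so `A_clean` keeps the two columns for clusters 0 and 1.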

Cheers,

Miquel

BahmanTahayori commented 1 year ago

Hi @MiqG,

Thanks for the reference and for confirming that -1 clusters should be ignored. All good.