GMPavanLab / cpctools

System analysis with soap fingerprints
MIT License

Mismatching in labels of clusters and transition matrix #83

Open martanit opened 1 year ago

martanit commented 1 year ago

Describe the bug

In SOAPify/Examples/LENS.ipynb, the tmat (transition matrix) labels are assigned incorrectly when the clusters returned by KMeans are not in ascending order (e.g. [C0=0, C2=2, C1=1]). calculateTransitionMatrix returns a matrix whose rows and columns follow the sorted cluster IDs (e.g. for columns: C0=0 in col 0, C1=1 in col 1, C2=2 in col 2, ...), whereas the labels are assigned in the order in which the clusters appear (C0 for col 0, C2 for col 1, C1 for col 2). Sorting the labels fixes the problem; change:


classifications = SOAPclassification(
    [], prepareData(classifiedFilteredLENS), [f"C{m[0]}" for m in minmax]
)

to:


classifications = SOAPclassification(
    [], prepareData(classifiedFilteredLENS), [f"C{m[0]}" for m in np.sort(minmax, axis=0)]
)

To reproduce the bug, change the random_state parameter in KMeans: this changes the cluster assignment order and, with it, the reported exchange probabilities.
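The mismatch can be illustrated without SOAPify. The following is a minimal sketch, not the notebook's code: `transition_matrix` is a hypothetical stand-in for calculateTransitionMatrix, and `sequence` and `cluster_order` are made-up data. It only assumes what is described above, namely that row/column i of the matrix corresponds to cluster ID i.

```python
import numpy as np


def transition_matrix(sequence, n_states):
    """Row-normalized transition counts; row/column i corresponds to cluster ID i.

    Hypothetical stand-in for calculateTransitionMatrix, for illustration only.
    """
    counts = np.zeros((n_states, n_states))
    for a, b in zip(sequence[:-1], sequence[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Leave rows with no outgoing transitions as all zeros.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)


# Made-up classification of one particle over time.
sequence = [0, 0, 2, 2, 2, 1, 1, 0]
# Made-up order in which the clusters happen to be listed (not ascending).
cluster_order = [0, 2, 1]

tmat = transition_matrix(sequence, n_states=3)

# Labels taken in listing order: column 1 is wrongly called "C2".
wrong_labels = [f"C{c}" for c in cluster_order]
# Labels taken in sorted order: label i matches row/column i.
right_labels = [f"C{c}" for c in sorted(cluster_order)]

print(tmat)
print("listing order:", wrong_labels)
print("sorted order: ", right_labels)
```

With the listing-order labels, the zero in `tmat[0, 1]` would be read as the C0 to C2 probability, while the actual C0 to C2 probability sits in `tmat[0, 2]`.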

MikkelDA commented 1 year ago

For me, implementing this leads to an RGB conversion error ("Invalid RGBA argument: 'C0.0'") because the label strings are built from floats instead of integers, so the value should be rounded inside the string creation. Here are some examples of what I mean; the last one contains my proposed correction.

# Currently
print([f"C{m[0]}" for m in minmax])
['C1', 'C2', 'C3', 'C0']

# Correction proposed by martanit
print([f"C{m[0]}" for m in np.sort(minmax, axis=0)])
['C0.0', 'C1.0', 'C2.0', 'C3.0']

# My change to proposed correction
print([f"C{round(m[0])}" for m in np.sort(minmax, axis=0)])
['C0', 'C1', 'C2', 'C3']

Thus the change should be to:

classifications = SOAPclassification(
    [], prepareData(classifiedFilteredLENS), [f"C{round(m[0])}" for m in np.sort(minmax, axis=0)]
)
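A possible explanation for the floats, sketched under an assumption: the `minmax` below is a hypothetical list of (cluster ID, lower bound, upper bound) tuples, not the notebook's actual data. If the real `minmax` mixes an integer ID with float bounds in this way, `np.sort` first converts it to a single float array, which is what turns the IDs into `0.0`, `1.0`, and so on.

```python
import numpy as np

# Hypothetical stand-in for `minmax`: (cluster ID, lower bound, upper bound).
minmax = [(1, 0.10, 0.25), (2, 0.25, 0.40), (3, 0.40, 0.90), (0, 0.00, 0.10)]

# Plain Python tuples keep the integer ID, but in listing order.
unsorted_labels = [f"C{m[0]}" for m in minmax]

# np.sort converts the list to a float64 array, so the IDs become floats.
float_labels = [f"C{m[0]}" for m in np.sort(minmax, axis=0)]

# Rounding inside the f-string restores integer labels.
rounded_labels = [f"C{round(m[0])}" for m in np.sort(minmax, axis=0)]

# Alternative that never leaves Python ints: sort the IDs themselves.
int_labels = [f"C{cluster_id}" for cluster_id in sorted(m[0] for m in minmax)]

print(unsorted_labels)
print(float_labels)
print(rounded_labels)
print(int_labels)
```

Note that `np.sort(..., axis=0)` sorts each column independently, so the rows of the sorted array no longer pair an ID with its own bounds. That is harmless here because only the first column is used for the labels.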

martanit commented 1 year ago

Yes, @MikkelDA you are right, I also had that issue and forgot to add the round. Thanks!