Open djanloo opened 1 year ago
Early tests show a clear clustering over the sex parameter. Normalization: min_max_scaler; knn: k = 10; resulting network density: 0.8 %.
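For reference, the quoted density can be reproduced by building a kNN connectivity graph; a minimal sketch with scikit-learn (the feature matrix here is a random placeholder, not the real dataset):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))  # placeholder for the real feature matrix

# same preprocessing as above: min-max scaling, then a k = 10 kNN graph
X_scaled = MinMaxScaler().fit_transform(X)
A = kneighbors_graph(X_scaled, n_neighbors=10, mode="connectivity")

# density = directed edges / possible directed edges
n = A.shape[0]
density = A.nnz / (n * (n - 1))
print(f"network density: {100 * density:.2f} %")
```

With k fixed, the density is just k / (n - 1), so it only tells you how sparse the graph is relative to the sample size.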
Clusters are so distant that the subnets are separated and the MDE algorithm pushes them away.
Some kind of emotion-based clustering is also present.
Need to find optimal parameters to do the MDE, or at least understand how to treat subnets separately.
MDE with [1.0, 0.5, 0.1], [0.05, 0.01, 0.0], [10, 100, 300] over a 50nn network.
Colors are emotions.
MDE with [1.0, 0.5, 0.1], [0.05, 0.01, 0.0], [5, 10, 300] over the same 50nn network.
Subnetwork treatment must be implemented, or at least the two subnets have to be linked together.
Spectral embedding shows clear advantages
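A minimal spectral-embedding sketch with scikit-learn, on synthetic data standing in for the two partially separated subnets (all data and parameters here are assumptions for illustration):

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding

rng = np.random.default_rng(0)
# two partially separated blobs mimicking the male/female subnets
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 5)),
               rng.normal(2.0, 1.0, size=(100, 5))])

# Laplacian eigenmaps on a kNN affinity graph
emb = SpectralEmbedding(n_components=2, n_neighbors=15,
                        random_state=0).fit_transform(X)
```

Because it only uses the graph Laplacian's eigenvectors, spectral embedding does not push disconnected subnets arbitrarily far apart the way a force-based layout can.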
[edit] In the previous plots males and females were inverted; females show a same-emotion subclustering for low k.
k = 500 for each subclass (m, f)
k = 100 for each subclass
k = 40 for each subclass
Emotional intensity completely breaks the knn inference
The idea of weighting a component of the vector made me discover the neighbourhood component analysis. It's always heartening when I reinvent the wheel.
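NCA is available in scikit-learn as `NeighborhoodComponentsAnalysis`; a quick sketch on a stand-in dataset (iris, purely illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline

X, y = load_iris(return_X_y=True)

# NCA learns a linear transform that improves kNN classification,
# i.e. a data-driven weighting of the feature components
nca_knn = make_pipeline(
    NeighborhoodComponentsAnalysis(random_state=0),
    KNeighborsClassifier(n_neighbors=10),
)
score = cross_val_score(nca_knn, X, y, cv=5).mean()
```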
The results obtained so far are biased by a silly mistake: of course they show color ordering, since I forgot to exclude the emotion feature from the parameters of the knn.
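The fix is just to drop the target (and any other categorical label) from the feature matrix before fitting; a toy sketch with assumed column names:

```python
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier

# toy stand-in for the real dataset; column names are assumptions
df = pd.DataFrame({
    "emotion":  [0, 1, 0, 1, 0, 1],
    "sex":      [0, 0, 1, 1, 0, 1],
    "sc_min":   [1.0, 2.0, 1.5, 3.0, 1.2, 2.8],
    "stft_min": [0.1, 0.4, 0.2, 0.9, 0.15, 0.8],
})

target = "emotion"
labels = ["emotion", "sex"]  # categorical labels must not leak into the features

X = df.drop(columns=labels)
y = df[target]
knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
```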
UMAP seems to say that, in addition to sex, another feature can be estimated using knn.
UMAP(15) over a MinMax scaled dataset (excluded: sex, actor, statement, emotion, emotional intensity)
Color is sex.
Each subcluster manifests a division in sex areas
Plot of the same UMAP(50) (local) embedding of the quantitative features only. Category is displayed by colour. The image shows intuitively that knn classification can be done only on the sex, emotional_intensity and vocal_channel features.
I'm pretty disappointed by the emotion feature, since I thought it would be the most deducible one.
UMAP(5) (super-local)
UMAP(100) (mid-global)
Here the super-clusters seem to be connected; still no clue on which feature they represent.
UMAP(200) (global)
Globality seems to improve the representation
Scaler: MinMax — super-local, local, mid-global and global views.
Scaler: Quantile — super-local and global views.
MDS shows the same results as UMAP.
Scaler: QuantileTransform Embedding: mMDS
Scaler: QuantileTransform Embedding: nMDS
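Both variants are available in scikit-learn; a sketch of the two pipelines above (quantile transform, then metric / non-metric MDS), on placeholder data:

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.preprocessing import QuantileTransformer

rng = np.random.default_rng(0)
# placeholder for the real feature matrix
X = QuantileTransformer(n_quantiles=50).fit_transform(
    rng.normal(size=(60, 5)))

# metric MDS (mMDS) preserves pairwise distances directly;
# non-metric MDS (nMDS) only preserves their rank order
emb_metric = MDS(n_components=2, metric=True,
                 random_state=0).fit_transform(X)
emb_nonmet = MDS(n_components=2, metric=False,
                 random_state=0).fit_transform(X)
```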
Which feature is the bastard one? The MAX norm makes the data clusterize into two leaves even with StandardScaler.
ISOMAP (5 neighbors)
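A minimal Isomap sketch with the same neighbor count (scikit-learn; a synthetic helix stands in for the real scaled features):

```python
import numpy as np
from sklearn.manifold import Isomap

# a 3-D helix as a stand-in manifold, so the 5-NN graph is connected
t = np.linspace(0, 3 * np.pi, 200)
X = np.column_stack([np.cos(t), np.sin(t), 0.2 * t])

# geodesic distances on the 5-NN graph, then classical MDS
emb = Isomap(n_neighbors=5, n_components=2).fit_transform(X)
```

With so few neighbors Isomap is very sensitive to disconnected components, which is worth keeping in mind given the separated subnets seen above.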
It seems that sc_min and stft_min are the two features that distinguish the two leaves:
Maybe this is due to some spurious effect of the non-analyticity of the min function.
It seems that hal samples have a huge jump in sc_min
thread dedicated to the knn task