djanloo / dynamiting

Repo for the Data Mining Fundamentals course
GNU General Public License v3.0

dimensionality reduction #2

Open djanloo opened 1 year ago

djanloo commented 1 year ago

thread dedicated to the knn task

djanloo commented 1 year ago

Early tests show a clear clustering over the sex parameter.

normalization: min_max_scaler
knn: k = 10
resulting network density: 0.8 %

Clusters are so distant that the subnets are separated and the MDE algorithm pushes them away. image
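For reference, a minimal sketch of how a kNN network and its density could be computed after min-max scaling. This is not the code used for the plots: the data is synthetic, the helper names (`min_max_scale`, `knn_adjacency`, `density`) are made up here, and density is assumed to mean the fraction of possible undirected edges that are present.

```python
import numpy as np

def min_max_scale(X):
    """Rescale every column to [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

def knn_adjacency(X, k):
    """Symmetric boolean adjacency: i~j if j is among the k nearest of i, or vice versa."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbour
    nearest = np.argsort(d, axis=1)[:, :k]
    A = np.zeros(d.shape, dtype=bool)
    A[np.repeat(np.arange(len(X)), k), nearest.ravel()] = True
    return A | A.T

def density(A):
    """Fraction of possible undirected edges that are present."""
    n = A.shape[0]
    return A.sum() / (n * (n - 1))           # A counts every edge twice, and so does n*(n-1)

rng = np.random.default_rng(0)
X = min_max_scale(rng.normal(size=(300, 8)))  # synthetic stand-in data
A = knn_adjacency(X, k=10)
print(f"density = {100 * density(A):.2f} %")
```

With this definition the density of a symmetrized kNN graph always lies between k/(n-1) and 2k/(n-1), so for fixed k it shrinks as the dataset grows.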

djanloo commented 1 year ago

Also some kind of emotional clustering is present. Need to find optimal parameters to do the MDE, or at least understand how to treat subnets separately. image

djanloo commented 1 year ago

MDE with [1.0, 0.5, 0.1], [0.05, 0.01, 0.0], [10, 100, 300] over a 50nn network. Colors are emotions. image

MDE with [1.0, 0.5, 0.1], [0.05, 0.01, 0.0], [5, 10, 300] over the same 50nn network. Subnetwork treatment must be implemented, or at least the two subnets have to be linked together. image

djanloo commented 1 year ago

Spectral embedding shows clear advantages

[edit] in previous plots males and females were inverted, females show a same-emotion subclustering for low k

k = 500 for each subclass (m, f) image

k = 100 for each subclass image

k = 40 for each subclass image
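A minimal sketch of a per-subclass spectral embedding, assuming scikit-learn's `SpectralEmbedding` with a nearest-neighbour affinity. The thread does not say which implementation produced the plots; the data, the group labels and the helper `embed_per_group` are hypothetical.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding

def embed_per_group(X, groups, k, seed=0):
    """Run a 2-D spectral embedding separately on each group of rows."""
    out = {}
    for g in np.unique(groups):
        Xg = X[groups == g]
        model = SpectralEmbedding(n_components=2, affinity="nearest_neighbors",
                                  n_neighbors=k, random_state=seed)
        out[g] = model.fit_transform(Xg)
    return out

rng = np.random.default_rng(0)
X = rng.random((120, 6))                      # synthetic stand-in features
sex = np.array(["m"] * 60 + ["f"] * 60)       # synthetic group labels
embeddings = embed_per_group(X, sex, k=10)
print({g: e.shape for g, e in embeddings.items()})
```

Embedding each subclass on its own sidesteps the disconnected-subnet problem, at the cost of producing coordinates that are not comparable between the two groups.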

djanloo commented 1 year ago

Emotional intensity completely breaks the knn inference

image image

djanloo commented 1 year ago

The idea of weighting a component of the vector made me discover the neighbourhood component analysis. It's always heartening when I reinvent the wheel.
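A minimal sketch of what neighbourhood component analysis looks like in practice, assuming scikit-learn's `NeighborhoodComponentsAnalysis` placed in front of a kNN classifier. The data is synthetic (one informative column, four noise columns) and the parameter values are arbitrary.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

rng = np.random.default_rng(0)
n = 200
y = rng.integers(0, 2, size=n)
# One informative column plus four pure-noise columns (synthetic stand-in data).
X = np.column_stack([y + 0.3 * rng.normal(size=n), rng.normal(size=(n, 4))])

model = make_pipeline(
    MinMaxScaler(),
    NeighborhoodComponentsAnalysis(n_components=2, random_state=0),  # learns a linear map
    KNeighborsClassifier(n_neighbors=10),
)
model.fit(X[:150], y[:150])
accuracy = model.score(X[150:], y[150:])
print(f"held-out accuracy: {accuracy:.2f}")
```

The learned matrix `components_` plays the role of the hand-picked feature weights: it stretches the directions that help same-class neighbours end up close together.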

djanloo commented 1 year ago

The results obtained so far are biased by my stupidity. Obviously they show color ordering: I forgot to exclude the emotion feature from the parameters of the knn.
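A small sketch of the effect, on synthetic data only: when the integer-coded target is left among the kNN features, neighbours are almost automatically same-class. The helper `loo_knn_accuracy` and all values are made up for illustration.

```python
import numpy as np

def loo_knn_accuracy(X, y, k):
    """Leave-one-out accuracy of a majority-vote kNN classifier (Euclidean)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)                 # exclude the query point itself
    nearest = np.argsort(d, axis=1)[:, :k]
    votes = y[nearest]                          # (n, k) neighbour labels
    pred = np.array([np.bincount(row).argmax() for row in votes])
    return float((pred == y).mean())

rng = np.random.default_rng(0)
n = 300
emotion = rng.integers(0, 4, size=n)            # synthetic integer-coded target
noise = rng.random((n, 5))                      # features unrelated to the target
leaky = np.column_stack([noise, emotion])       # target left among the features

acc_leaky = loo_knn_accuracy(leaky, emotion, k=10)
acc_clean = loo_knn_accuracy(noise, emotion, k=10)
print("with target column:   ", acc_leaky)
print("without target column:", acc_clean)
```

The features here carry no information about the target at all, so any accuracy well above chance in the first line comes purely from the leaked column.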

djanloo commented 1 year ago

UMAP seems to say that, in addition to sex, another feature can be estimated using knn.

UMAP(15) over a MinMax scaled dataset (excluded: sex, actor, statement, emotion, emotional intensity). Color is sex. image

Each subcluster manifests a division in sex areas.

djanloo commented 1 year ago

Plot of the same UMAP(50) (local) embedding of the quantitative features only. Category is displayed by colour. image

The image shows intuitively that knn classification can be done only on sex, emotional_intensity and vocal_channel features.

I'm pretty disappointed by the emotion feature, since I thought of it as the most deducible feature.
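The visual impression could be checked numerically. A minimal sketch, assuming scikit-learn and synthetic data: for each candidate target, compare cross-validated kNN accuracy with a majority-class baseline. The target names `separable` and `hidden` and the helper `knn_gain` are hypothetical, not columns of the dataset.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

def knn_gain(X, y, k=15, folds=5):
    """Cross-validated kNN accuracy minus the majority-class baseline."""
    knn = cross_val_score(KNeighborsClassifier(n_neighbors=k), X, y, cv=folds).mean()
    base = cross_val_score(DummyClassifier(strategy="most_frequent"), X, y, cv=folds).mean()
    return float(knn - base)

rng = np.random.default_rng(0)
n = 240
targets = {
    "separable": rng.integers(0, 2, size=n),   # hypothetical target visible in the features
    "hidden": rng.integers(0, 2, size=n),      # hypothetical target unrelated to the features
}
X = np.column_stack([targets["separable"] + 0.2 * rng.normal(size=n), rng.random((n, 3))])

gains = {name: knn_gain(X, y) for name, y in targets.items()}
for name, gain in gains.items():
    print(f"{name}: gain over baseline = {gain:+.2f}")
```

A gain close to zero would say the same thing as a plot with fully mixed colours: kNN on these features cannot recover that target.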

djanloo commented 1 year ago

UMAP(5) (super local) image

UMAP(100) (mid-global) Here the super-clusters seem to be connected, still no clue about which feature they represent image

UMAP(200) (global) Globality seems to improve the representation image

djanloo commented 1 year ago

Scaler: MinMax

super-local image

local image

mid-global image

global image

djanloo commented 1 year ago

Scaler: Quantile

super-local image

global image

djanloo commented 1 year ago

MDS shows the same results as UMAP

Scaler: QuantileTransform Embedding: mMDS

image

Scaler: QuantileTransform Embedding: nMDS

image

djanloo commented 1 year ago

Which feature is the bastard one? With the MAX norm the data splits into two leaves even with StandardScaler.

ISOMAP (5 neighbors) image
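One possible way to hunt for the culprit, sketched on synthetic data: under the MAX (Chebyshev) norm the distance between two samples is set by a single feature, so one can count how often each column is the one that attains the maximum. The helper `dominant_feature_counts` and the planted shift in column 3 are assumptions for illustration.

```python
import numpy as np

def dominant_feature_counts(X):
    """For every pair of rows, find which column attains the max-norm distance."""
    diff = np.abs(X[:, None, :] - X[None, :, :])   # (n, n, p) per-feature gaps
    winner = diff.argmax(axis=-1)                   # column index of the largest gap
    i, j = np.triu_indices(len(X), k=1)             # each unordered pair once
    return np.bincount(winner[i, j], minlength=X.shape[1])

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
X[:100, 3] += 8.0                                   # synthetic "two leaves" caused by column 3 only
X = (X - X.mean(axis=0)) / X.std(axis=0)            # same scaling as StandardScaler

counts = dominant_feature_counts(X)
print(dict(enumerate(counts.tolist())))
```

Standardizing does not remove the effect: a bimodal column keeps a gap of roughly two standard deviations between its modes, which is often enough to win the maximum for pairs taken from different leaves.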

djanloo commented 1 year ago

It seems that sc_min and stft_min are the two features that distinguish the two leaves: image

Maybe this is due to some spurious effect of the non-analyticity of the min function.
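A sketch of how such features could be singled out once each sample is assigned to a leaf: rank the columns by the difference of the leaf means in units of the pooled standard deviation. The leaf membership, the shifts and the two placeholder columns `other_a` and `other_b` are synthetic assumptions; only the names `sc_min` and `stft_min` come from the dataset.

```python
import numpy as np

def rank_by_separation(X, leaf, names):
    """Sort features by |mean difference| between two groups, in pooled-std units."""
    a, b = X[leaf], X[~leaf]
    pooled = np.sqrt((a.var(axis=0) + b.var(axis=0)) / 2)
    pooled = np.where(pooled > 0, pooled, 1.0)          # guard constant columns
    score = np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled
    order = np.argsort(score)[::-1]                     # largest separation first
    return [(names[i], float(score[i])) for i in order]

rng = np.random.default_rng(0)
names = ["sc_min", "stft_min", "other_a", "other_b"]    # last two are placeholders
leaf = rng.random(200) < 0.5                            # hypothetical leaf membership
X = rng.normal(size=(200, 4))
X[leaf, 0] += 6.0                                       # synthetic shift, not real data
X[leaf, 1] += 4.0

ranking = rank_by_separation(X, leaf, names)
for name, score in ranking:
    print(f"{name:10s} {score:.2f}")
```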

djanloo commented 1 year ago

It seems that half of the samples have a huge jump in sc_min image
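A jump of this kind can be located without a plot by sorting the values and taking the widest gap between consecutive ones. A minimal sketch on synthetic two-group data standing in for sc_min; the helper `largest_gap` is hypothetical.

```python
import numpy as np

def largest_gap(values):
    """Return (gap, threshold): the widest jump between consecutive sorted values
    and the midpoint of that jump."""
    v = np.sort(np.asarray(values, dtype=float))
    gaps = np.diff(v)
    i = int(gaps.argmax())
    return float(gaps[i]), float((v[i] + v[i + 1]) / 2)

rng = np.random.default_rng(0)
# Synthetic stand-in for sc_min: one group near zero, one group shifted far away.
sc_min = np.concatenate([rng.normal(0.0, 0.1, 100), rng.normal(5.0, 0.1, 100)])

gap, threshold = largest_gap(sc_min)
below = float((sc_min < threshold).mean())
print(f"gap = {gap:.2f}, threshold = {threshold:.2f}, fraction below = {below:.2f}")
```

The fraction of samples below the threshold tells how the data splits across the jump, and the threshold itself can be used to label the two leaves.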