dimensionality reduction

djanloo / dynamiting

Repo for the Data Mining Fundamentals course

GNU General Public License v3.0


dimensionality reduction #2

Open djanloo opened 1 year ago

djanloo commented 1 year ago

Thread dedicated to the kNN task.

djanloo commented 1 year ago

Early tests show clear clustering by the sex feature. Normalization: min_max_scaler; kNN: k = 10; resulting network density: 0.8 %.
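A minimal sketch of how such a kNN network and its density could be computed. It assumes Euclidean distances on min-max scaled features and defines density as present edges over the n(n − 1) possible directed edges; the data is random, not the course dataset. Under that definition the density of a directed kNN graph is exactly k/(n − 1), whatever the data.

```python
import numpy as np

def min_max_scale(X):
    """Rescale each column to [0, 1]; constant columns are mapped to 0."""
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (X - lo) / span

def knn_adjacency(X, k):
    """Directed kNN adjacency: A[i, j] is True if j is one of the k nearest neighbours of i."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbour
    nn = np.argsort(d, axis=1)[:, :k]        # indices of the k closest points
    A = np.zeros(d.shape, dtype=bool)
    A[np.arange(len(X))[:, None], nn] = True
    return A

def density(A):
    """Fraction of the n(n - 1) possible directed edges that are present."""
    n = A.shape[0]
    return A.sum() / (n * (n - 1))

rng = np.random.default_rng(0)
X = min_max_scale(rng.normal(size=(800, 8)))   # random stand-in for the features
A = knn_adjacency(X, k=10)
print(f"network density: {100 * density(A):.2f} %")
```

If the graph is symmetrized (an edge kept when either endpoint lists the other), the density rises slightly above k/(n − 1).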

The clusters are so far apart that the two subnetworks are disconnected, and the MDE algorithm pushes them away from each other.
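One way to confirm that the subnetworks are truly disconnected is to count the connected components of the kNN graph. This sketch uses SciPy's `connected_components` on two synthetic, well-separated blobs that stand in for the two sex clusters (hypothetical data, not the dataset).

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def knn_graph(X, k):
    """Sparse directed kNN graph with unit edge weights."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    rows = np.repeat(np.arange(len(X)), k)
    return csr_matrix((np.ones(rows.size), (rows, nn.ravel())), shape=d.shape)

# Two well-separated synthetic blobs (hypothetical stand-ins for the two sex clusters)
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.05, size=(200, 5)),
               rng.normal(1.0, 0.05, size=(200, 5))])
G = knn_graph(X, k=10)
# connection="weak" ignores edge direction when deciding connectivity
n_comp, labels = connected_components(G, directed=True, connection="weak")
print(n_comp, np.bincount(labels))
```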

djanloo commented 1 year ago

Some clustering by emotion is also present. I need to find optimal MDE parameters, or at least understand how to treat the subnetworks separately.

djanloo commented 1 year ago

MDE with [1.0, 0.5, 0.1], [0.05, 0.01, 0.0], [10, 100, 300] over a 50nn network. Colors are emotions.

MDE with [1.0, 0.5, 0.1], [0.05, 0.01, 0.0], [5, 10, 300] over the same 50nn network. Subnetwork handling must be implemented, or at least the two subnetworks have to be linked together.
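A minimal sketch of the second option, linking the subnetworks together: repeatedly add a symmetric bridge edge between the closest pair of points that lie in different components until the graph is connected. This is a hypothetical heuristic chosen for illustration; `knn_graph` and `link_components` are illustrative names and the three blobs are synthetic.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def knn_graph(X, k):
    """Directed kNN graph with Euclidean distances as edge weights."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]
    rows = np.repeat(np.arange(len(X)), k)
    return csr_matrix((d[rows, nn.ravel()], (rows, nn.ravel())), shape=d.shape)

def link_components(X, G):
    """Hypothetical fix: bridge component 0 to its closest outside point
    until the graph is (weakly) connected."""
    G = G.tolil(copy=True)
    while True:
        n_comp, labels = connected_components(G.tocsr(), directed=True, connection="weak")
        if n_comp == 1:
            return G.tocsr()
        inside = np.flatnonzero(labels == 0)
        outside = np.flatnonzero(labels != 0)
        d = np.linalg.norm(X[inside][:, None, :] - X[outside][None, :, :], axis=-1)
        a, b = np.unravel_index(np.argmin(d), d.shape)
        i, j = inside[a], outside[b]
        G[i, j] = d[a, b]   # symmetric bridge edge weighted by its length
        G[j, i] = d[a, b]

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(c, 0.05, size=(100, 5)) for c in (0.0, 1.0, 2.0)])
G = knn_graph(X, k=10)
G_linked = link_components(X, G)
before = connected_components(G, directed=True, connection="weak")[0]
after = connected_components(G_linked, directed=True, connection="weak")[0]
print(before, after)
```

Bridging with the shortest available edge keeps the added links as local as possible, so an embedding method sees the subnetworks as close but still distinct.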

djanloo commented 1 year ago

Spectral embedding shows clear advantages

[edit] In the previous plots males and females were swapped; females show same-emotion subclustering for low k.

k = 500 for each subclass (m, f)

k = 100 for each subclass

k = 40 for each subclass
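A sketch of fitting a separate spectral embedding for each subclass, assuming scikit-learn's `SpectralEmbedding` with a kNN affinity stands in for the embedding used here (the thread does not say which implementation produced the plots). The `sex` array and the features are random placeholders. Note that k cannot exceed the number of samples in a subclass, so k = 500 needs more than 500 samples per subclass.

```python
import numpy as np
from sklearn.manifold import SpectralEmbedding

def embed_per_subclass(X, sex, k):
    """Fit an independent 2-D spectral embedding for each value of `sex`."""
    out = {}
    for s in np.unique(sex):
        model = SpectralEmbedding(n_components=2, affinity="nearest_neighbors",
                                  n_neighbors=k, random_state=0)
        out[s] = model.fit_transform(X[sex == s])
    return out

rng = np.random.default_rng(0)
X = rng.normal(size=(600, 10))                   # placeholder features
sex = rng.choice(np.array(["F", "M"]), size=600)  # placeholder labels
results = {k: embed_per_subclass(X, sex, k) for k in (40, 100)}
for k, emb in results.items():
    print(k, {s: e.shape for s, e in emb.items()})
```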

djanloo commented 1 year ago

Emotional intensity completely breaks the knn inference

djanloo commented 1 year ago

The idea of weighting the components of the feature vector led me to discover neighbourhood components analysis (NCA). It's always heartening when I reinvent the wheel.
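NCA is available in scikit-learn as `NeighborhoodComponentsAnalysis`. It learns a full linear transformation of the features that maximizes the expected leave-one-out accuracy of a stochastic nearest-neighbour classifier; per-feature weighting is the special case of a diagonal transformation. A sketch comparing plain kNN with NCA + kNN on synthetic data from `make_classification`, not on the course dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier, NeighborhoodComponentsAnalysis
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Synthetic data: 3 informative features hidden among 10
X, y = make_classification(n_samples=400, n_features=10, n_informative=3,
                           n_redundant=0, random_state=0)

models = {
    "kNN": make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=10)),
    "NCA + kNN": make_pipeline(MinMaxScaler(),
                               NeighborhoodComponentsAnalysis(random_state=0),
                               KNeighborsClassifier(n_neighbors=10)),
}
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, s in scores.items():
    print(f"{name}: mean CV accuracy {s:.3f}")
```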

djanloo commented 1 year ago

The results obtained so far are biased by my stupidity: of course they show color ordering, since I forgot to exclude the emotion feature from the kNN input features.
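A sketch of a feature-selection step that prevents this kind of leakage, assuming the dataset is a pandas DataFrame. The excluded columns follow the exclusion list used later in this thread, and the target is always dropped as well. Apart from `sc_min` and `stft_min`, which appear later in the thread, the toy columns and values are hypothetical. Note that `actor` is numeric and would silently enter the distance if it were not excluded explicitly.

```python
import numpy as np
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

# Label/metadata columns that must never enter the kNN distance
EXCLUDED = ["sex", "actor", "statement", "emotion", "emotional_intensity"]

def split_features(df, target):
    """Drop every excluded column (and the target itself), keep numeric features only."""
    drop = set(EXCLUDED) | {target}
    X = df.drop(columns=[c for c in df.columns if c in drop]).select_dtypes("number")
    return X, df[target]

# Hypothetical toy frame
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "sc_min": rng.normal(size=n),
    "stft_min": rng.normal(size=n),
    "actor": rng.integers(1, 25, size=n),
    "emotion": rng.choice(["calm", "angry", "sad"], size=n),
    "sex": rng.choice(["M", "F"], size=n),
})
X, y = split_features(df, "emotion")
model = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=10)).fit(X, y)
print(list(X.columns))
```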

djanloo commented 1 year ago

UMAP suggests that, in addition to sex, another feature can be estimated using kNN.

UMAP(15) over a MinMax-scaled dataset (excluded: sex, actor, statement, emotion, emotional intensity). Color is sex. Each subcluster shows a split into sex regions.

djanloo commented 1 year ago

Plot of the same UMAP(50) (local) embedding of the quantitative features only. Category is displayed by colour.

The image suggests that kNN classification is feasible only for the sex, emotional_intensity and vocal_channel features.

I'm pretty disappointed by the emotion feature, since I expected it to be the easiest one to infer.
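The visual impression can be checked numerically by comparing cross-validated kNN accuracy with a majority-class baseline for each target. A minimal sketch on synthetic targets (one that depends on the features, one that is pure noise); on the real data the loop would run over the categorical columns named above.

```python
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler

def knn_vs_baseline(X, y, k=15, cv=5):
    """Mean CV accuracy of a scaled kNN and of a most-frequent-class baseline."""
    knn = make_pipeline(MinMaxScaler(), KNeighborsClassifier(n_neighbors=k))
    base = DummyClassifier(strategy="most_frequent")
    return (cross_val_score(knn, X, y, cv=cv).mean(),
            cross_val_score(base, X, y, cv=cv).mean())

rng = np.random.default_rng(0)
n = 600
X = rng.normal(size=(n, 6))
targets = {
    "learnable": (X[:, 0] + X[:, 1] > 0).astype(int),  # depends on the features
    "noise": rng.integers(0, 2, size=n),                # independent of the features
}
results = {name: knn_vs_baseline(X, y) for name, y in targets.items()}
for name, (acc, base) in results.items():
    print(f"{name}: kNN {acc:.2f} vs majority {base:.2f}")
```

A target is worth pursuing with kNN only if its accuracy is clearly above the baseline.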

djanloo commented 1 year ago

UMAP(5) (super local)

UMAP(100) (mid-global). Here the super-clusters seem to be connected; still no clue about which feature they represent.

UMAP(200) (global). The more global setting seems to improve the representation.

djanloo commented 1 year ago

Scaler: MinMax

super-local

local

mid-global

global

djanloo commented 1 year ago

Scaler: Quantile

super-local

global
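A small sketch of why the choice of scaler matters, assuming scikit-learn's `MinMaxScaler` and `QuantileTransformer`: a handful of extreme values squeezes the bulk of a min-max scaled feature into a narrow interval, while the rank-based quantile transform spreads the same bulk over almost all of [0, 1]. The data is synthetic.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, QuantileTransformer

rng = np.random.default_rng(0)
# 995 ordinary values plus 5 extreme outliers in a single feature
x = np.concatenate([rng.normal(size=995), rng.normal(50, 1, size=5)]).reshape(-1, 1)

mm = MinMaxScaler().fit_transform(x)
qt = QuantileTransformer(n_quantiles=100, output_distribution="uniform").fit_transform(x)

bulk = slice(0, 995)   # the non-outlier values
print("MinMax spread of the bulk:  ", float(np.ptp(mm[bulk])))
print("Quantile spread of the bulk:", float(np.ptp(qt[bulk])))
```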

djanloo commented 1 year ago

MDS shows the same results as UMAP.

Scaler: QuantileTransform Embedding: mMDS

Scaler: QuantileTransform Embedding: nMDS
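For reference, a sketch of classical (Torgerson) MDS in NumPy: double-centre the squared distance matrix and keep the top eigenvectors. This is a swapped-in technique for illustration; scikit-learn's `MDS` (metric and non-metric) minimizes stress with SMACOF instead, and the thread does not say which implementation produced the plots. On Euclidean distances classical MDS coincides with PCA.

```python
import numpy as np

def pairwise(X):
    """Euclidean distance matrix."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def classical_mds(X, n_components=2):
    """Classical MDS: eigendecomposition of the double-centred squared distances."""
    D2 = pairwise(X) ** 2
    n = D2.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n        # centring matrix
    B = -0.5 * J @ D2 @ J                      # Gram matrix of the centred points
    vals, vecs = np.linalg.eigh(B)             # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:n_components]
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

# Synthetic check: 2-D points rotated into 5-D are recovered up to rotation
rng = np.random.default_rng(0)
Y = rng.normal(size=(50, 2))
Q, _ = np.linalg.qr(rng.normal(size=(5, 5)))
X = np.hstack([Y, np.zeros((50, 3))]) @ Q.T
Z = classical_mds(X)
print("max distance distortion:", np.abs(pairwise(Z) - pairwise(X)).max())
```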

djanloo commented 1 year ago

Which feature is the bastard one? The MAX norm makes the data cluster into two leaves even with StandardScaler.

ISOMAP (5 neighbors)
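Assuming "MAX norm" means the Chebyshev (L∞) distance ||x − y||_∞ = max_f |x_f − y_f|, the distance between two points is set entirely by the single feature where they differ most, so one feature with a large jump can decide the neighbourhood structure on its own. A sketch of Isomap with 5 neighbours under both metrics, using scikit-learn's `Isomap` and its `metric` parameter; the data and the jump in column 2 are synthetic.

```python
import numpy as np
from sklearn.manifold import Isomap
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
X[:150, 2] += 3.0                     # hypothetical feature with a jump
Xs = StandardScaler().fit_transform(X)

embeddings = {}
for metric in ("euclidean", "chebyshev"):
    iso = Isomap(n_neighbors=5, n_components=2, metric=metric)
    embeddings[metric] = iso.fit_transform(Xs)
    print(metric, embeddings[metric].shape)
```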

djanloo commented 1 year ago

It seems that sc_min and stft_min are the two features that distinguish the two leaves.

Maybe this is due to some spurious effect of the non-analyticity of the min function.
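A sketch for ranking features by how strongly they separate the two leaves, assuming leaf membership is already available as a 0/1 label (for example from the connected components or from a clustering of the embedding). The score is the absolute mean difference divided by the pooled standard deviation; `feat_a`, `feat_b` and `feat_c` are placeholder names and the data is simulated.

```python
import numpy as np

def separation_scores(X, leaf):
    """|mean difference| / pooled std of each feature between leaf 0 and leaf 1."""
    a, b = X[leaf == 0], X[leaf == 1]
    pooled = np.sqrt((a.var(axis=0, ddof=1) + b.var(axis=0, ddof=1)) / 2)
    return np.abs(a.mean(axis=0) - b.mean(axis=0)) / pooled

names = ["sc_min", "stft_min", "feat_a", "feat_b", "feat_c"]
rng = np.random.default_rng(0)
leaf = np.repeat([0, 1], 200)
X = rng.normal(size=(400, len(names)))
X[leaf == 1, 0] += 3.0    # simulated jump in sc_min
X[leaf == 1, 1] += 2.0    # simulated jump in stft_min

scores = separation_scores(X, leaf)
ranking = [names[i] for i in np.argsort(scores)[::-1]]
for i in np.argsort(scores)[::-1]:
    print(f"{names[i]:>8s}  {scores[i]:.2f}")
```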

djanloo commented 1 year ago

It seems that half of the samples have a huge jump in sc_min.
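If the jump in sc_min is as sharp as it looks, the split point can be located as the midpoint of the widest gap between consecutive sorted values. A minimal sketch on simulated values, not the real sc_min column:

```python
import numpy as np

def largest_gap_split(x):
    """Threshold at the midpoint of the widest gap between consecutive sorted values."""
    s = np.sort(x)
    gaps = np.diff(s)
    i = int(np.argmax(gaps))
    return (s[i] + s[i + 1]) / 2, gaps[i]

rng = np.random.default_rng(1)
# Hypothetical bimodal values: half the samples sit far above the rest
values = np.concatenate([rng.normal(0.0, 0.1, 300), rng.normal(5.0, 0.1, 300)])
threshold, gap = largest_gap_split(values)
above = values > threshold
print(f"threshold {threshold:.2f}, gap {gap:.2f}, fraction above {above.mean():.2f}")
```

The resulting boolean mask can then be compared with the two leaves of the embedding to check whether the jump alone explains the split.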