JustGlowing / minisom

MiniSom is a minimalistic implementation of the Self Organizing Maps
MIT License
1.43k stars, 420 forks

Can i do SOM for multiple variables? #189

Open jiwon-j opened 3 months ago

jiwon-j commented 3 months ago

I have two datasets, geopotential height (GPH) and Precipitation. Each variable has the same time dimension (year). I want to do clustering for each year, and I was wondering if and how I can run SOM considering "both" variables (not doing SOM for each variable individually).

I tried np.hstack, but it simply places the two arrays side by side, so I'm not sure this is the right approach. (If GPH has shape (year: 20, flatten_values: 500) and Precipitation has shape (year: 20, flatten_values: 500), np.hstack returns (year: 20, flatten_values: 1000), i.e. the second array is just attached to the first.) I was wondering if this is even possible.

JustGlowing commented 3 months ago

hi @jiwon-j, SOM is a multivariate model and you can build your input as a matrix where each row corresponds to a year and contains values from all the variables that you have.

These numpy functions can help you reshape your original data:
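As an illustration (not necessarily the functions originally linked here), a minimal sketch of building such a matrix with `reshape` and `np.hstack`. The array names, the 20 x 25 grid, and the random values are assumptions made up for the example:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical gridded data: 20 years on a 20 x 25 grid (500 values per year).
gph = rng.normal(5500.0, 50.0, size=(20, 20, 25))
precip = rng.gamma(2.0, 1.5, size=(20, 20, 25))

# Flatten the spatial dimensions so that each row is one year.
gph_2d = gph.reshape(gph.shape[0], -1)           # (20, 500)
precip_2d = precip.reshape(precip.shape[0], -1)  # (20, 500)

# Put the variables side by side: one row per year, all features in columns.
data = np.hstack([gph_2d, precip_2d])
print(data.shape)  # (20, 1000)
```

Each row of `data` then holds the GPH values of one year followed by the precipitation values of the same year.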

jiwon-j commented 3 months ago

Thank you! I made a combined array, but do the two variables have to have the same shape? I'm trying to run a SOM and getting broadcast errors.

JustGlowing commented 3 months ago

the input matrix needs to have only 2 dimensions, which means that you have to concatenate your data on the appropriate axis.
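A small sketch of what "the appropriate axis" could mean here, assuming rows are years and the two variables have different numbers of flattened values (the shapes 500 and 300 are invented for illustration). Joining along `axis=1` only requires the same number of rows, so the variables do not need identical shapes:

```python
import numpy as np

gph_2d = np.zeros((20, 500))    # 20 years, 500 flattened GPH values
precip_2d = np.ones((20, 300))  # 20 years, 300 flattened precipitation values

# Joining along axis=1 only requires the same number of rows (years).
data = np.concatenate([gph_2d, precip_2d], axis=1)
print(data.shape)  # (20, 800)

# Joining along axis=0 would stack years on top of each other and needs
# equal column counts, so it fails for these shapes.
try:
    np.concatenate([gph_2d, precip_2d], axis=0)
    stacked_ok = True
except ValueError:
    stacked_ok = False
print(stacked_ok)  # False
```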

vwgeiser commented 3 months ago

@jiwon-j I ran into a similar problem, and flattening (vectorizing) each sample into a 1D vector is how I got mine to work, with input_len set to the length of one sample. You can see the later parts of #187, where I show how I do this. Unless you've found a way around it, I would imagine that all inputs need to be the same length (MiniSom expects a 2D matrix in which every row has input_len values). Imputation may help with this?

This adds quite a bit of dimensionality, but MiniSom is able to handle this sort of data.