Closed aktaseren closed 3 years ago
hi @aktaseren, you have to try different combinations of parameters and pick the one that yields the best score.
Here's some pseudocode:
params = [{'x': 3, 'y': 3, 'sigma': 1.0}, {'x': 2, 'y': 2, 'sigma': 0.9}]
for p in params:
    som = MiniSom(x=p['x'], y=p['y'], input_len=data.shape[1], sigma=p['sigma'])
    # train the som, then evaluate your error and save it
# pick the combination that minimizes the error
You need to treat this as a proper grid search and, if your problem allows it, you may want to use cross-validation to estimate your error.
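A minimal sketch of such a grid search, using `itertools.product` to enumerate parameter combinations. Here `train_and_score` is a hypothetical stand-in: in practice it would build a `MiniSom` with the given parameters, train it, and return `som.quantization_error(data)`; the fake score below is only there to make the example self-contained.

```python
from itertools import product

def train_and_score(x, y, sigma, learning_rate):
    # Placeholder: real code would train MiniSom(x, y, input_len,
    # sigma=sigma, learning_rate=learning_rate) and return its
    # quantization error. A fake score is returned for illustration.
    return abs(sigma - 1.0) + abs(learning_rate - 0.5)

# Candidate values for each hyperparameter (illustrative only)
grid = {
    'x': [5, 10],
    'y': [5, 10],
    'sigma': [0.9, 1.0, 1.5],
    'learning_rate': [0.25, 0.5],
}

best_params, best_error = None, float('inf')
for values in product(*grid.values()):
    params = dict(zip(grid.keys(), values))
    error = train_and_score(**params)
    if error < best_error:
        best_params, best_error = params, error

print(best_params, best_error)
```

The same loop works with any error measure you save per combination; only `train_and_score` needs to change.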
Hi @JustGlowing, thanks a lot for the quick reply. I actually applied Bayesian Optimization. I am getting nice quantization errors as tuned. However, the map output is completely dark. I guess the SOM is overfitted. My main question is: what range should the SOM parameters be in? For example, sigma is generally set to 1 and the learning rate is taken between 0 and 1. Do these parameters have any minimum or maximum threshold?
It's hard to give a range for the parameters as they depend on each other. For example, a small map (5x5) can work well with sigma in [1, 3], but with a bigger map you can increase sigma even higher. My suggestion is to find a set of parameters that gives a result that visually makes sense, then vary them.
Thanks a lot for this.
Last questions: My dataset has 200k rows and 29 columns. One paper suggests that the number of map nodes can be decided over the calculation below:
# Defining 2-dimensional map size
from math import sqrt
sqrt(5 * sqrt(X_train.shape[0]))
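Worked through concretely, assuming the full 200k rows go into `X_train` (the actual split is not stated above, so the exact value may differ):

```python
from math import sqrt

n_samples = 200_000  # dataset size mentioned in the discussion
# Rule of thumb: total nodes ~ 5 * sqrt(n_samples); for a square map,
# each side is the square root of that total.
side = sqrt(5 * sqrt(n_samples))
print(round(side))
```

With all 200k rows this comes out to roughly 47 per side; a smaller training split would yield a smaller map, which may explain sizes such as 43 below.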
Therefore, I set my nodes as x=43 and y=43 and my tuned parameters over these nodes as follows:
# Set Hyperparameters
x = 43
y = 43
input_len = 29
sigma = 1
learning_rate = 0.5
neighborhood_function = 'bubble'
iterations = 500
What do you think of this node calculation method? I actually have some doubts about it, since only one paper suggests it without any concrete proof. Maybe this affects the quality of the SOM map output.
That way of determining the size of the map is well known, but it's just a rule of thumb. The best size depends on how your data is distributed; you can find it by trying different sizes and checking which one fits your data best.
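Trying different sizes can be sketched as a simple sweep. `score_map` below is a hypothetical stand-in: real code would train a `MiniSom(side, side, input_len, ...)` for each candidate and return its quantization error on held-out data; the fake error curve here only keeps the example runnable.

```python
def score_map(side):
    # Placeholder trade-off curve: real code would train a MiniSom of
    # this side length and return som.quantization_error(val_data).
    return 1.0 / side + 0.001 * side

# Candidate side lengths around the rule-of-thumb value
candidates = [20, 30, 43, 47, 60]
errors = {side: score_map(side) for side in candidates}
best_side = min(errors, key=errors.get)
print(best_side, errors[best_side])
```

In practice you would also inspect the distance map for each size, since the numerically best error does not always give the most interpretable map.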
Thanks very much for this
You can now use this dashboard to explore the effects of the parameters on a sample dataset: https://share.streamlit.io/justglowing/minisom/dashboard/dashboard.py
I have a SOM model below applied to an intrusion detection case (unsupervised) whose dataset is quite big. I selected the suggested parameters based on a paper. However, I need to tune the model, but there is no documentation on how to tune MiniSom. Can you please suggest how to tune it?