Open akol67 opened 1 year ago
If (ncol x nlin)/64 > 8 the kernel dies... 64 = number of CPUs.
Hmm, curious, let me try it.
@akol67 just to keep context, could you refer to the problem by linking to the message describing it in another issue, or describe it here?
The Jupyter notebook kernel dies when you choose an inappropriate parameterization; in this case, the size of the map, when it is too big.
I got a dying kernel for a huge 150 x 150 map in Colab. I tried 100 x 100 too, and it also died in Colab.
The error was running out of available RAM.
So I think it is not dependent on the number of cores; the limit is the RAM needed to initialize the SOM training.
Good test, you reproduced the error. Here I have 256 GB in the VDI; however, I have a way to run on a bigger machine. By the way, if you don't specify the map size, do you think the internal calculation is good enough?
This snapshot is for a 70 by 70 mapsize. Note the RAM peak at the initialization of SOM training. The data set is the animals one from the tutorial.
I ran without specifying the mapsize and it flew. The internally calculated size must be beyond RAM capacity.
The automatic map size is computed using the Vesanto et al. (2000) heuristic: the total number of neurons is 5*sqrt(N), with N being the number of samples.
This should not be huge; it is around a 27 x 27 mapsize for 20,000 samples.
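As a sanity check, the heuristic above can be sketched in a few lines. This is an illustrative approximation (a near-square grid); IntraSOM's actual internal calculation may distribute rows and columns differently, so `vesanto_mapsize` is a hypothetical helper, not the library's API:

```python
import math

def vesanto_mapsize(n_samples: int) -> tuple:
    """Approximate the Vesanto et al. (2000) heuristic:
    total neurons ~= 5 * sqrt(n_samples), arranged on a near-square grid."""
    n_neurons = 5 * math.sqrt(n_samples)
    side = round(math.sqrt(n_neurons))
    return (side, side)

print(vesanto_mapsize(20000))  # -> (27, 27): 5*sqrt(20000) ~= 707 neurons
```

For 20,000 samples this gives roughly 707 neurons, i.e. about 27 x 27, matching the number quoted above.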
According to my rule of thumb... 27 x 27 / 64 ≈ 11 (>> 8). This is too high for my available RAM (256 GB); the kernel couldn't take it.
Testing with a 1.4 TB machine, mapsize = (30, 24).
It's running...
That's really odd. We run cases like this locally with less RAM. We will keep investigating.
Edit: it's really odd to require so much RAM for such a small map. One can run a bigger map than this on free Colab machines (RAM is typically 16 GB or less).
I'll investigate and return ASAP. Usually very large maps require more RAM because of the Euclidean distance matrix needed to calculate the distances between the input vectors and the training neurons. I have been able to run maps up to 70x70 on 300,000 samples (5 features) on my personal computer (12 GB RAM). I'll try to reproduce the error. Thanks for the heads up.
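To get intuition for why the distance matrix dominates, here is a rough back-of-the-envelope estimate, assuming a dense float64 (samples x neurons) matrix; the actual IntraSOM implementation may allocate more or less than this:

```python
def distance_matrix_gib(n_samples: int, n_rows: int, n_cols: int,
                        bytes_per_float: int = 8) -> float:
    """Rough RAM for a dense (n_samples x n_neurons) float64
    distance matrix, in GiB. Ignores temporaries and overhead."""
    n_neurons = n_rows * n_cols
    return n_samples * n_neurons * bytes_per_float / 2**30

# 300,000 samples on a 70x70 map -> roughly 11 GiB, close to the
# limit of the 12 GB personal computer mentioned above.
print(round(distance_matrix_gib(300_000, 70, 70), 1))
```

By this estimate a 27 x 27 map on 20,000 samples needs well under 1 GiB for the distance matrix alone, which is why a 256 GB machine dying on it is surprising.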
@akol67 and @rodiegeology
You could profile the memory usage of each cell using Jupyter cell magic.
You need to install a memory profiler, for example (the ! is needed to install from Colab cells):
!pip install memory-profiler
and load the memory profiler into the Jupyter session in a cell:
%reload_ext memory_profiler
It should report the peak memory usage and may help us address this issue.
To use it, add the cell magic %%memit on the first line of the Jupyter cells used for the setup:
%%memit
mapsize = (80, 80)
som_test = intrasom.SOMFactory.build(data,
                                     mask=-9999,
                                     mapsize=mapsize,
                                     mapshape='toroid',
                                     lattice='hexa',
                                     normalization='var',
                                     initialization='random',
                                     neighborhood='gaussian',
                                     training='batch',
                                     name='Example',
                                     component_names=None,
                                     unit_names=None,
                                     sample_names=None,
                                     missing=False,
                                     save_nan_hist=True,
                                     pred_size=0)
I quote my output here for 153021 samples and 23 features:
Loading dataframe...
Normalizing data...
Creating neighborhood...
Initializing map...
Creating Neuron Distance Rows: 100%
80/80 [00:09<00:00, 7.22rows/s]
peak memory: 645.91 MiB, increment: 367.13 MiB
For the training cell:
%%memit
som_test.train(train_len_factor=5,
               previous_epoch=False,
               bootstrap=False)
with the output here:
Starting Training...
Rough Training:
Epoch: 10. Radius:6.75. QE: 4.5431: 100%
10/10 [04:56<00:00, 28.29s/it]
Fine Tuning:
Epoch: 15. Radius:1.0. QE: 4.0931: 100%
15/15 [06:55<00:00, 27.86s/it]
Saving...
Training Report Created
Training completed successfully.
peak memory: 1626.34 MiB, increment: 980.27 MiB
I'm filling out a table of the peak memory values reached as the map size increases: 16x10, 21x15, 26x20, ...
Note: the memory footprint seems low, even for large sizes. Even so, I had to change machines (VDI to Geocolab) to be able to run without the kernel crashing.
[image: image.png]
This still sounds like an issue with the VDI, and maybe the code.
Edit: @akol67, we haven't received the image you sent in the discussion.
It seems to me that when using the default mapsize (or a little bigger), the kernel works fine without dying.
Placing the SOM inside a for loop was creating problems for me: after a few attempts, the kernel died on the second or third iteration. The solution to run on the supercomputer (with 8 CPUs available) was to include the following lines in the code:
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['GOTO_NUM_THREADS'] = '1'
os.environ['OMP_NUM_THREADS'] = '1'
and using n_job=8 in the som.train parameters.
Now the kernel is not dying anymore.
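For anyone else hitting this, the key detail is that these environment variables must be set before NumPy (or anything else that links OpenBLAS/OpenMP) is imported, since thread pools are sized at library initialization. A minimal sketch of the workaround described above (the `som.train(..., n_job=8)` call is shown only as a comment, since its exact signature depends on your IntraSOM version):

```python
import os

# Cap the BLAS/OpenMP thread pools at one thread per process.
# This must happen BEFORE importing numpy; once the pools are
# initialized, changing these variables has no effect.
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['GOTO_NUM_THREADS'] = '1'
os.environ['OMP_NUM_THREADS'] = '1'

import numpy as np  # imported only after the environment is set

# Parallelism is then delegated explicitly to the trainer, e.g.:
# som.train(..., n_job=8)  # as used in this thread; verify against your version
```

This avoids oversubscription: without the caps, each of the 8 worker processes would spawn its own full-width BLAS thread pool, multiplying memory and scheduler pressure.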
I discovered a magic number, by trial and error... Let mapsize = (ncol, nlin).
If (ncol x nlin)/64 > 8, the kernel dies...
64 = number of CPUs.
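The empirical rule above can be written as a quick pre-flight check. To be clear, this is only the trial-and-error observation from this thread on one specific 64-CPU machine, not a documented IntraSOM limit, and `kernel_at_risk` is a hypothetical helper name:

```python
def kernel_at_risk(ncol: int, nlin: int, n_cpus: int = 64,
                   threshold: float = 8.0) -> bool:
    """Empirical rule of thumb from this thread: the kernel tended
    to die when (ncol * nlin) / n_cpus exceeded roughly 8."""
    return (ncol * nlin) / n_cpus > threshold

print(kernel_at_risk(27, 27))  # 729/64 ~= 11.4 -> True (matches the 27x27 crash)
print(kernel_at_risk(16, 10))  # 160/64 = 2.5  -> False
```

A map of 30 x 24 = 720 neurons sits just above this threshold (720/64 = 11.25), which is consistent with it needing the 1.4 TB machine reported earlier.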