Closed RafaelMostert closed 5 years ago
Sorry, I don't know the quantities AQE and TE. Can you explain them or give me a link? Could you please also try to set the neuron size to the old value (116) and use --euclidean_distance_type float?
Average Quantization Error (AQE) is the mean over all images of the euclidean distance of each image to its best matching neuron. Lower is better.
AQE = np.mean(np.min(mapping_data, axis=1))
(In the plots above I divided the AQEs at each epoch by the size of the euclidean distance dimension.)
Topological Error (TE) is the percentage of images for which the second best matching neuron is not a direct neighbour of the best matching neuron. Lower is better.
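As a sketch, TE can be computed from the mapping distances like this, assuming a row-major cartesian SOM layout and counting only 4-connected grid cells as direct neighbours (whether diagonals count is an assumption here):

```python
import numpy as np

def topological_error(mapping_data, som_width):
    """Percentage of images whose two best matching neurons are not
    direct (4-connected) neighbours on a row-major cartesian grid.
    mapping_data: shape (n_images, n_neurons). Lower is better."""
    order = np.argsort(mapping_data, axis=1)
    best, second = order[:, 0], order[:, 1]
    # convert flat neuron indices to grid coordinates
    by, bx = np.divmod(best, som_width)
    sy, sx = np.divmod(second, som_width)
    # direct neighbours differ by exactly one grid step
    not_neighbour = (np.abs(by - sy) + np.abs(bx - sx)) != 1
    return 100.0 * np.mean(not_neighbour)
```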
The --euclidean_distance_type float flag is already enabled. I will try again with the neuron size set to 116.
Now with --neuron-dimension 116: almost the same result as without the neuron dimension flag, and still very different from v0.23:
v0.23 AQEs: [91.0, 102.0, 89.0, 70.0, 64.0, 59.0, 55.0, 51.0, 48.0, 46.0, 44.0, 42.0, 40.0, 39.0, 38.0, 37.0, 37.0, 36.0, 36.0, 36.0, 36.0, 35.0, 35.0, 35.0, 35.0, 35.0, 35.0]
v2.1 AQEs: [95.0, 96.0, 80.0, 79.0, 76.0, 73.0, 74.0, 70.0, 69.0, 67.0, 66.0, 66.0, 65.0, 64.0, 64.0, 63.0, 63.0, 63.0, 63.0, 63.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0]
v2.1 n116: [94.0, 87.0, 81.0, 75.0, 74.0, 73.0, 71.0, 70.0, 68.0, 67.0, 66.0, 65.0, 64.0, 64.0, 63.0, 63.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0]
v0.23 TEs: [7, 27, 23, 11, 8, 4, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2]
v2.1 TEs: [36, 49, 37, 32, 24, 28, 18, 17, 12, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9]
n116 TEs: [34, 25, 28, 34, 31, 26, 23, 17, 13, 11, 11, 10, 9, 9, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8]
If the train parameters are the same and the initialization and the image ordering too, it should give exactly the same results. Is there a quick hack for me to disable the random shuffle both in training and mapping?
Options that I can think of are:
- src/SelfOrganizingMapLib/generate_euclidean_distance_matrix.h seems not to be using the euclidean distance, but the squared euclidean distance.
- In src/CudaLib/euclidean_distance_kernel.h I also do not see a square root.

Although those might only be related to the euclidean distance between a neuron and an image (#32), while I am thinking of the spatial separation between two neurons that is then passed to the neighbourhood function.

The mapping is now using a consecutive data order. For the training I have to add an input flag to switch off the random shuffle.
The random number generator for the SOM initialization has changed from (v1) https://github.com/HITS-AIN/PINK/blob/4b636b55e9cfd675737e525ba059c6c1f7c95be4/src/UtilitiesLib/Filler.h#L21 to (v2) https://github.com/HITS-AIN/PINK/blob/a0385d51d4c7bcb4577d856e24150fa77f2d5c66/src/UtilitiesLib/Filler.h#L20
As mentioned in #32 the sqrt is missing for the euclidean distance, therefore I call it squared_euclidean_distance. This is the same in both versions; I have only changed it now for the mapping, to be consistent with the paper. For finding the best matching neuron during training, the sqrt has no effect.
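A quick sanity check of that last point (illustrative only, not PINK code): the square root is strictly increasing, so applying it before the argmin cannot change which neuron wins.

```python
import numpy as np

# sqrt is strictly increasing, so taking it before the argmin cannot
# change which neuron is selected as best matching unit in training.
squared = np.array([40000.0, 39204.0, 38416.0, 37636.0])
best_sq = np.argmin(squared)
best_eu = np.argmin(np.sqrt(squared))
print(best_sq, best_eu)  # 3 3
```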
The neuron layout distance calculation has not changed.
To isolate the issue, I created a test to verify if the results are different. They do indeed seem to be.
See the python file in the zip: test_file_differ.zip
I created a 2x2 SOM containing 1s, 2s, 3s, 4s for the neurons and mapped 4 images to them, the first image filled with 101s, the other three images filled with all 1s.
I tried three different things:
- (v2.2): Git revision 1113763 with the default version 2 neuron size.
- (v2.2 small neurons): Git revision 1113763 with the neuron size equal to the euclidean distance dimension (as in default version 1).
- (v0.23): the old version.

With the square root (v2.2, default size) I expect:
200 198 196 194
0 2 4 6
0 2 4 6
0 2 4 6
Without the square root (v0.23), I expect:
40000 39204 38416 37636
0 4 16 36
0 4 16 36
0 4 16 36
I get the following mapping distances:
Map result (v2.2):
[[198.04 202. 202. 202. ]
[ 4.472 2. 2. 2. ]
[ 4.472 2. 2. 2. ]
[ 4.472 2. 2. 2. ]]
Map result (v2.2 small neurons):
[[2. 4. 6. 8.]
[0. 2. 4. 6.]
[0. 2. 4. 6.]
[0. 2. 4. 6.]]
Map result (v0.23):
[[39999.992 39203.992 38415.992 37635.992]
[ 0. 4. 16. 36. ]
[ 0. 4. 16. 36. ]
[ 0. 4. 16. 36. ]]
Apart from some rounding errors, v0.23 works as expected. I do not understand what happens with v2.2. For v2.2 with small neurons, the differences between the last three images and the neurons work as expected, while the difference between the first image and the neurons is weird.
Disregarding the exact numbers, if I only take a look at the ordering (0=best match, then increasing number with increasing euclidean distance) I expect this:
[3 2 1 0]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
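Such orderings can be extracted from any of the mapping matrices with a double argsort (a sketch; the two input rows are taken from the v2.2 mapping result, and kind="stable" keeps tied distances in index order):

```python
import numpy as np

# Turn mapping distances into rank orderings: 0 = best match,
# increasing rank with increasing euclidean distance.
distances = np.array([[198.04, 202.0, 202.0, 202.0],
                      [4.472, 2.0, 2.0, 2.0]])
ranks = np.argsort(np.argsort(distances, axis=1, kind="stable"),
                   axis=1, kind="stable")
print(ranks)  # [[0 1 2 3]
              #  [3 0 1 2]]
```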
The results are like this for the example above:
(v2.2)
[0 1 2 3]
[3 0 1 2]
[3 0 1 2]
[3 0 1 2]
(v2.2 small neurons)
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
(v0.23)
[3 2 1 0]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
Repeating the experiment above but with larger images (100x100 instead of 4x4), I get a different, correct ordering for PINK v2.2 with neuron size equal to the euclidean distance dimension. (But the actual mapping values reported for the 101s image are still wrong):
(v2.2)
[0 1 2 3]
[3 0 1 2]
[3 0 1 2]
[3 0 1 2]
(v2.2 small neurons)
[3 2 1 0]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
(v0.23)
[3 2 1 0]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
Sorry Rafael. I am a bit busy at the moment with another project. I will come back to your problem as soon as possible.
Ok, thanks for letting me know and thanks for helping out so far!
For when you come back, I checked some more and found some clues that might help:
- (v2.2 small neurons) gives the correct outcome; if the difference between neuron and image is bigger than four, it starts failing.
- (v2.2 small neurons): --numrot 4 does not give the same results as --numrot 40 or --numrot 360.
At the moment there is no option to turn rotations off (see #36), so I was not able to test whether the results are correct with rotations turned off.

I just noticed that I made a mistake in creating the same SOM for (v2.2) and (v2.2 small neurons). Sorry for that! (v2.2) now does give correct results for all sizes and differences between neuron and image. (v2.2 small neurons) gives correct results for all sizes as long as the difference between neuron and image is <5.
I will double check if I do something wrong with the dimensions there.
I don't know; I tested more and more cases and there is still something funky going on for (v2.2) and (v2.2 small neurons). For some cases the values returned are correct for both (v2.2) and (v2.2 small neurons). For some cases both are wrong, and differently so. For some cases only one of the two is wrong.
I cannot find a pattern in what is going on...
Can you give me a single, small setting for which you see a difference between the old and new version? Then I can go through it with the debugger and track the values. I am a bit lost in your collection :)
Haha, yes I lost track myself as well :P
The setting for which I get a different result is when --neuron-dimension == --euclidean-distance-dimension and the difference between neuron and image is >5.
So for example:
CUDA_VISIBLE_DEVICES=1 Pink --euclidean-distance-type float --som-width 2 --som-height 2 --neuron-dimension 2 --euclidean-distance-dimension 2 --map /data/single_image.bin /data/map_single_image_to_single_neuronv2.bin /data/single_neuron_somv2.bin
*************************************************************************
* *
* PPPPP II NN NN KK KK *
* PP PP II NNN NN KK KK *
* PPPPP II NN NN NN KKKK *
* PP II NN NNN KK KK *
* PP II NN NN KK KK *
* *
* Parallelized rotation and flipping INvariant Kohonen maps *
* *
* Version 2.2 *
* Git revision: 1113763 *
* *
* Bernd Doser <bernd.doser@h-its.org> *
* Kai Polsterer <kai.polsterer@h-its.org> *
* *
* Distributed under the GNU GPLv3 License. *
* See accompanying file LICENSE or *
* copy at http://www.gnu.org/licenses/gpl-3.0.html. *
* *
*************************************************************************
Data file = /data/single_image.bin
Result file = /data/map_single_image_to_single_neuronv2.bin
SOM file = /data/single_neuron_somv2.bin
Number of data entries = 4
Data dimension = 4 x 4
SOM dimension (width x height x depth) = 2x2x1
SOM size = 4
Number of iterations = 1
Neuron dimension = 2x2
Euclidean distance dimension = 2x2
Number of progress information prints = 10
Intermediate storage of SOM = off
Layout = cartesian
Initialization type = file_init
SOM initialization file = /data/single_neuron_somv2.bin
Interpolation type = bilinear
Seed = 1234
Number of rotations = 360
Use mirrored image = 1
Number of CPU threads = 40
Use CUDA = 1
Store best rotation and flipping parameters = 0
[======================================================================] 100 % 0.002 s
Total time (hh:mm:ss): 00:00:00.770 (0 s)
Successfully finished. Have a nice day.
I have fixed a bug in the cuda resize kernel. Was the issue only using GPU?
Great! The issue was indeed only using GPU.
Rebuilt to Version 2.2, Git revision c55d0d8.
Results are now correct for both GPU and CPU.
It was a stupid copy-and-paste error using division instead of multiplication to get the margin. This kernel is only used for the first unrotated image, whereas for the others the rotation and resize is in one step. Good to have this fixed.
Thank you very much for your excellent work!
I am trying to replicate results that I get with PINK v0.23.
After training with the same parameters, I get a different result with PINK v2. It seems harder to get a low AQE and TE with the same parameters. Did something (even some constant like the sqrt after mapping) change for the update rule from v0.23 to v2.1? Something in the neighbourhood function (maybe it got normalized in v2?) or something in the learning rate parameter?
The only notable difference is that the neuron size is now bigger, but the euclidean distance dimension of v2 is equal to the euclidean distance dimension used in v0.23.
v0.23: (parameters screenshot)
v2.1: (parameters screenshot)
After a few rounds with ever decreasing GAUSS and SIGMA I get the following values:
v0.23 AQEs: [91.0, 102.0, 89.0, 70.0, 64.0, 59.0, 55.0, 51.0, 48.0, 46.0, 44.0, 42.0, 40.0, 39.0, 38.0, 37.0, 37.0, 36.0, 36.0, 36.0, 36.0, 35.0, 35.0, 35.0, 35.0, 35.0, 35.0]
v2.1 AQEs: [95.0, 96.0, 80.0, 79.0, 76.0, 73.0, 74.0, 70.0, 69.0, 67.0, 66.0, 66.0, 65.0, 64.0, 64.0, 63.0, 63.0, 63.0, 63.0, 63.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0]
v0.23 TEs: [7, 27, 23, 11, 8, 4, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2]
v2.1 TEs: [36, 49, 37, 32, 24, 28, 18, 17, 12, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9]
or as picture: (the value used to divide the AQEs in the picture is the same for both: the euclidean distance dimension.)