Closed RafaelMostert closed 5 years ago
Sorry, I don't know the quantities AQE and TE. Can you explain them or give me a link? Could you please also try to set the neuron size to the old value (116) and use --euclidean_distance_type float?
Average Quantization Error (AQE) is the mean over all images of the euclidean distance of each image to its best matching neuron. Lower is better.
AQE = np.mean(np.min(mapping_data, axis=1))
(In the plots above I divided the AQEs at each epoch by the size of the euclidean distance dimension.)
Topological Error (TE) is the percentage of images for which the second best matching neuron is not a direct neighbour of the best matching neuron. Lower is better.
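As a sketch, TE can be computed from the mapping distances like this, assuming a row-major cartesian SOM layout and counting only 4-connected grid cells as direct neighbours (whether diagonals count is an assumption here):

```python
import numpy as np

def topological_error(mapping_data, som_width):
    """Percentage of images whose two best matching neurons are not
    direct (4-connected) neighbours on a row-major cartesian grid.
    mapping_data: shape (n_images, n_neurons). Lower is better."""
    order = np.argsort(mapping_data, axis=1)
    best, second = order[:, 0], order[:, 1]
    # convert flat neuron indices to grid coordinates
    by, bx = np.divmod(best, som_width)
    sy, sx = np.divmod(second, som_width)
    # direct neighbours differ by exactly one grid step
    not_neighbour = (np.abs(by - sy) + np.abs(bx - sx)) != 1
    return 100.0 * np.mean(not_neighbour)
```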
The --euclidean_distance_type float flag is already enabled. I will try again with the neuron size set to 116.
Now with --neuron-dimension 116: almost the same result as without the neuron dimension flag, and still very different from v0.23:
v0.23 AQEs: [91.0, 102.0, 89.0, 70.0, 64.0, 59.0, 55.0, 51.0, 48.0, 46.0, 44.0, 42.0, 40.0, 39.0, 38.0, 37.0, 37.0, 36.0, 36.0, 36.0, 36.0, 35.0, 35.0, 35.0, 35.0, 35.0, 35.0]
v2.1 AQEs: [95.0, 96.0, 80.0, 79.0, 76.0, 73.0, 74.0, 70.0, 69.0, 67.0, 66.0, 66.0, 65.0, 64.0, 64.0, 63.0, 63.0, 63.0, 63.0, 63.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0]
v2.1 n116: [94.0, 87.0, 81.0, 75.0, 74.0, 73.0, 71.0, 70.0, 68.0, 67.0, 66.0, 65.0, 64.0, 64.0, 63.0, 63.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0]
v0.23 TEs: [7, 27, 23, 11, 8, 4, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2]
v2.1 TEs: [36, 49, 37, 32, 24, 28, 18, 17, 12, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9]
n116 TEs: [34, 25, 28, 34, 31, 26, 23, 17, 13, 11, 11, 10, 9, 9, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8, 8]
If the train parameters are the same and the initialization and the image ordering too, it should give exactly the same results. Is there a quick hack for me to disable the random shuffle both in training and mapping?
Options that I can think of are:
- src/SelfOrganizingMapLib/generate_euclidean_distance_matrix.h seems not to be using the euclidean distance, but the squared euclidean distance.
- In src/CudaLib/euclidean_distance_kernel.h I also do not see a square root.

Although those might only be related to the euclidean distance between a neuron and an image (#32), while I am thinking of the spatial separation between two neurons that is then passed to the neighbourhood function.

The mapping is now using a consecutive data order. For the training I have to add an input flag to switch off the random shuffle.
The random number generator for the SOM initialization has changed from (v1) https://github.com/HITS-AIN/PINK/blob/4b636b55e9cfd675737e525ba059c6c1f7c95be4/src/UtilitiesLib/Filler.h#L21 to (v2) https://github.com/HITS-AIN/PINK/blob/a0385d51d4c7bcb4577d856e24150fa77f2d5c66/src/UtilitiesLib/Filler.h#L20
As mentioned in #32 the sqrt is missing for the euclidean distance, therefore I call it squared_euclidean_distance. This is the same in both versions; I have only changed it now for the mapping, to be consistent with the paper. For finding the best matching neuron during training, the sqrt has no effect.
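A quick sanity check of that last point (illustrative only, not PINK code): the square root is strictly increasing, so applying it before the argmin cannot change which neuron wins.

```python
import numpy as np

# sqrt is strictly increasing, so taking it before the argmin cannot
# change which neuron is selected as best matching unit in training.
squared = np.array([40000.0, 39204.0, 38416.0, 37636.0])
best_sq = np.argmin(squared)
best_eu = np.argmin(np.sqrt(squared))
print(best_sq, best_eu)  # 3 3
```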
The neuron layout distance calculation has not changed.
To isolate the issue, I created a test to verify if the results are different. They do indeed seem to be.
See the python file in the zip: test_file_differ.zip
I created a 2x2 SOM containing 1s, 2s, 3s, 4s for the neurons and mapped 4 images to them, the first image filled with 101s, the other three images filled with all 1s.
I tried three different things:
- (v2.2): Git revision 1113763 with the default version 2 neuron size.
- (v2.2 small neurons): Git revision 1113763 with the neuron size equal to the euclidean distance dimension (as in default version 1).
- (v0.23): the old version.

With the square root (v2.2, default size) I expect:
200 198 196 194
0 2 4 6
0 2 4 6
0 2 4 6
Without the square root (v0.23), I expect:
40000 39204 38416 37636
0 4 16 36
0 4 16 36
0 4 16 36
I get the following mapping distances:
Map result (v2.2):
[[198.04 202. 202. 202. ]
[ 4.472 2. 2. 2. ]
[ 4.472 2. 2. 2. ]
[ 4.472 2. 2. 2. ]]
Map result (v2.2 small neurons):
[[2. 4. 6. 8.]
[0. 2. 4. 6.]
[0. 2. 4. 6.]
[0. 2. 4. 6.]]
Map result (v0.23):
[[39999.992 39203.992 38415.992 37635.992]
[ 0. 4. 16. 36. ]
[ 0. 4. 16. 36. ]
[ 0. 4. 16. 36. ]]
Apart from some rounding errors, v0.23 works as expected. I do not understand what happens with v2.2. For v2.2 with small neurons, the differences between the last three images and the neurons work as expected, while the difference between the first image and the neurons is weird.
Disregarding the exact numbers, if I only take a look at the ordering (0=best match, then increasing number with increasing euclidean distance) I expect this:
[3 2 1 0]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
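Such orderings can be extracted from any of the mapping matrices with a double argsort (a sketch; the two input rows are taken from the v2.2 mapping result, and kind="stable" keeps tied distances in index order):

```python
import numpy as np

# Turn mapping distances into rank orderings: 0 = best match,
# increasing rank with increasing euclidean distance.
distances = np.array([[198.04, 202.0, 202.0, 202.0],
                      [4.472, 2.0, 2.0, 2.0]])
ranks = np.argsort(np.argsort(distances, axis=1, kind="stable"),
                   axis=1, kind="stable")
print(ranks)  # [[0 1 2 3]
              #  [3 0 1 2]]
```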
The results are like this for the example above:
(v2.2)
[0 1 2 3]
[3 0 1 2]
[3 0 1 2]
[3 0 1 2]
(v2.2 small neurons)
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
(v0.23)
[3 2 1 0]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
Repeating the experiment above but with larger images (100x100 instead of 4x4), I get a different, correct ordering for PINK v2.2 with neuron size equal to the euclidean distance dimension. (But the actual mapping values reported for the 101s image are still wrong):
(v2.2)
[0 1 2 3]
[3 0 1 2]
[3 0 1 2]
[3 0 1 2]
(v2.2 small neurons)
[3 2 1 0]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
(v0.23)
[3 2 1 0]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
Sorry Rafael. I am a bit busy at the moment with another project. I will come back to your problem as soon as possible.
Ok, thanks for letting me know and thanks for helping out so far!
For when you come back, I checked some more and found some clues that might help:
- (v2.2 small neurons) gives the correct outcome; if the difference between neuron and image is bigger than four, it starts failing.
- (v2.2 small neurons): --numrot 4 does not give the same results as --numrot 40 or --numrot 360.
At the moment there is no option to turn rotations off (see #36), so I was not able to test whether the results are correct with rotations turned off.

I just noticed that I made a mistake in creating the same SOM for (v2.2) and (v2.2 small neurons). Sorry for that! (v2.2) now does give correct results for all sizes and differences between neuron and image. (v2.2 small neurons) gives correct results for all sizes as long as the difference between neuron and image is <5.
I will double check if I do something wrong with the dimensions there.
I don't know; I tested more and more cases and there is still something funky going on for (v2.2) and (v2.2 small neurons). For some cases the values returned are correct for both (v2.2) and (v2.2 small neurons). For some cases both are wrong, and differently so. For some cases only one of the two is wrong.
I cannot find a pattern in what is going on...
Can you give me a single, small setting for which you see a difference between the old and new version? Then I can go through it with the debugger and track the values. I am a bit lost in your collection :)
Haha, yes I lost track myself as well :P
The setting for which I get a different result is when --neuron-dimension == --euclidean-distance-dimension and the difference between neuron and image is >5.
So for example:
CUDA_VISIBLE_DEVICES=1 Pink --euclidean-distance-type float --som-width 2 --som-height 2 --neuron-dimension 2 --euclidean-distance-dimension 2 --map /data/single_image.bin /data/map_single_image_to_single_neuronv2.bin /data/single_neuron_somv2.bin
*************************************************************************
* *
* PPPPP II NN NN KK KK *
* PP PP II NNN NN KK KK *
* PPPPP II NN NN NN KKKK *
* PP II NN NNN KK KK *
* PP II NN NN KK KK *
* *
* Parallelized rotation and flipping INvariant Kohonen maps *
* *
* Version 2.2 *
* Git revision: 1113763 *
* *
* Bernd Doser <bernd.doser@h-its.org> *
* Kai Polsterer <kai.polsterer@h-its.org> *
* *
* Distributed under the GNU GPLv3 License. *
* See accompanying file LICENSE or *
* copy at http://www.gnu.org/licenses/gpl-3.0.html. *
* *
*************************************************************************
Data file = /data/single_image.bin
Result file = /data/map_single_image_to_single_neuronv2.bin
SOM file = /data/single_neuron_somv2.bin
Number of data entries = 4
Data dimension = 4 x 4
SOM dimension (width x height x depth) = 2x2x1
SOM size = 4
Number of iterations = 1
Neuron dimension = 2x2
Euclidean distance dimension = 2x2
Number of progress information prints = 10
Intermediate storage of SOM = off
Layout = cartesian
Initialization type = file_init
SOM initialization file = /data/single_neuron_somv2.bin
Interpolation type = bilinear
Seed = 1234
Number of rotations = 360
Use mirrored image = 1
Number of CPU threads = 40
Use CUDA = 1
Store best rotation and flipping parameters = 0
[======================================================================] 100 % 0.002 s
Total time (hh:mm:ss): 00:00:00.770 (0 s)
Successfully finished. Have a nice day.
I have fixed a bug in the cuda resize kernel. Was the issue only using GPU?
Great! The issue was indeed only using GPU.
Rebuilt to Version 2.2, Git revision c55d0d8.
Results are now correct for both GPU and CPU.
It was a stupid copy-and-paste error using division instead of multiplication to get the margin. This kernel is only used for the first unrotated image, whereas for the others the rotation and resize is in one step. Good to have this fixed.
Thank you very much for your excellent work!
I am trying to replicate results that I get with PINK v0.23.
After training with the same parameters, I get a different result with PINK v2. It seems harder to get a low AQE and TE with the same parameters. Did something (even some constant like the sqrt after mapping) change for the update rule from v0.23 to v2.1? Something in the neighbourhood function (maybe it got normalized in v2?) or something in the learning rate parameter?
The only notable difference is that the neuron size is now bigger, but the euclidean distance dimension of v2 is equal to the euclidean distance dimension used in v0.23.
v0.23: (parameters screenshot)
v2.1: (parameters screenshot)
After a few rounds with ever decreasing GAUSS and SIGMA I get the following values:
v0.23 AQEs: [91.0, 102.0, 89.0, 70.0, 64.0, 59.0, 55.0, 51.0, 48.0, 46.0, 44.0, 42.0, 40.0, 39.0, 38.0, 37.0, 37.0, 36.0, 36.0, 36.0, 36.0, 35.0, 35.0, 35.0, 35.0, 35.0, 35.0]
v2.1 AQEs: [95.0, 96.0, 80.0, 79.0, 76.0, 73.0, 74.0, 70.0, 69.0, 67.0, 66.0, 66.0, 65.0, 64.0, 64.0, 63.0, 63.0, 63.0, 63.0, 63.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0, 62.0]
v0.23 TEs: [7, 27, 23, 11, 8, 4, 3, 3, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2]
v2.1 TEs: [36, 49, 37, 32, 24, 28, 18, 17, 12, 11, 11, 10, 10, 9, 9, 9, 8, 8, 8, 8, 9, 9, 9, 9, 9, 9, 9]
or as picture: (the value used to divide the AQEs in the picture is the same for both: the euclidean distance dimension.)