HITS-AIN / PINK

Parallelized rotation and flipping INvariant Kohonen maps
GNU General Public License v3.0

Ordering in mapping binary differs from ordering in data binary #34

Closed RafaelMostert closed 5 years ago

RafaelMostert commented 5 years ago

On a per-image basis, the mapping to the SOM works fine, but the ordering returned by the mapping binary seems to differ from the ordering in the data binary, as if the images or the mapping results are shuffled at some point in the PINK code.

Here is the test. This time I have a 2x2 (= 4 neurons) test SOM: a neuron consisting of only 1s in the top-left, a neuron with only 2s in the top-right, a neuron with only 3s in the bottom-left and a neuron with only 4s in the bottom-right.

The data binary consists of 4 images. The first image contains only 100s and the other three contain only 1s. In the mapping binary, however, the 100s image comes back as the second item in the list.

Here is the code:

import struct

import numpy as np

# Create a sample SOM
# Map the same image to this sample SOM
# See if the result is 0 (or very close to zero)

def create_test_som(som_path, som_data):
    """A test SOM, cartesian layout, neuron dimensionality = 2"""

    # Show single neuron
    print('som_data shape:', np.shape(som_data))
    print(som_data)
    # Write SOM binary file
    assert som_path.endswith('.bin')
    with open(som_path,'wb') as file:
        # <file format version> 1 <data-type> <som layout> <neuron layout> <data>
        file.write(struct.pack('i' * 11, 2, #<file format version>
                               1, #<file type>
                               0, #<data type>
                               0, #<som layout>
                               2, 2,2, #<som dimensionality, width and height> 
                               0, #<neuron layout>
                               2, np.shape(som_data)[2], np.shape(som_data)[3])) #<neuron dimensionality, width and height>

        file.write(struct.pack('f' * som_data.size, *som_data.flatten()))

def create_test_data(data_path, image_data):
    """A set of test images, cartesian layout, dimensionality = 2"""

    # Show single image (identical to the single SOM neuron)
    print('image_data shape:', np.shape(image_data))
    print(image_data)

    # Write data binary file
    assert data_path.endswith('.bin')
    with open(data_path,'wb') as file:
        # <file format version> 0 <data-type> <number of entries> <data layout> <data>
        file.write(struct.pack('i' * 8, 2, 0, 0, len(image_data), 0, 2, np.shape(image_data)[1], np.shape(image_data)[2]))
        file.write(struct.pack('f' * image_data.size, *image_data.flatten()))

def map_test_data_to_test_som(gpu_id, som_width, som_height, data_path, som_path, map_path):
    print(f"CUDA_VISIBLE_DEVICES={gpu_id} Pink  -p 1 --euclidean-distance-type float --som-width {som_width} --som-height {som_height}"
          f" --map {data_path} {map_path} {som_path}")

def load_som_mapping(map_path, verbose=True):
    """Read a mapping binary and return the distance of every image to every neuron."""
    with open(map_path, 'rb') as file:

        # <file format version> 2 <data-type> <number of entries> <som layout> <data>
        version, file_type, data_type, numberOfImages, som_layout, som_dimensionality = struct.unpack(
            'i' * 6, file.read(4 * 6))
        som_dimensions = struct.unpack(
            'i' * som_dimensionality, file.read(4 * som_dimensionality))
        if verbose:

            print('version:', version)
            print('file_type:', file_type)
            print('data_type:', data_type)
            print('numberOfImages:', numberOfImages)
            print('som_layout:', som_layout)
            print('som dimensionality:', som_dimensionality)
            print('som dimensions:', som_dimensions)

        som_width = som_dimensions[0]
        som_height = som_dimensions[1] if som_dimensionality > 1 else 1
        som_depth = som_dimensions[2] if som_dimensionality > 2 else 1
        assert file_type == 2 # 2 equals mapping

        map_size = som_width * som_height * som_depth

        data_map = np.ones((numberOfImages, map_size))
        for i in range(numberOfImages):
            for t in range(map_size):
                data_map[i,t] = struct.unpack_from("f", file.read(4))[0]

    print('\nMap result:')
    print(data_map, numberOfImages, som_width, som_height, som_depth,'\n')
    return data_map, numberOfImages, som_width, som_height, som_depth

def show_mapping(data_map):
    for im_map in data_map:
        # argsort of argsort turns the distances into ranks (0 = closest neuron)
        print(np.argsort(np.argsort(im_map)))  # .reshape([som_width, som_width])

gpu_id = 1
som_width = 2
som_height = 2
image_width = 2
np.random.seed(42)
neuron_width = int(np.ceil(image_width/np.sqrt(2)*2))
som_path = '/data/test_som.bin'
data_path = '/data/test_images.bin'
map_path = '/data/map_images_to_test_som.bin'

# creating the neurons
t = 0
neurons = [[[]  for j in range(som_width)] for i in range(som_width)]
for j in range(som_width):
    for i in range(som_width):
        t +=1
        neurons[j][i] = t*np.ones([neuron_width,neuron_width]) 
neurons = np.array(neurons)

image = np.array([100*np.ones([image_width,image_width]),np.ones([image_width,image_width]),
                  np.ones([image_width,image_width]),np.ones([image_width,image_width])])

create_test_som(som_path, neurons)
create_test_data(data_path, image)
map_test_data_to_test_som(gpu_id, som_width,som_height,data_path, som_path, map_path)

This creates the following input:

som_data shape: (2, 2, 3, 3)
[[[[1. 1. 1.]
   [1. 1. 1.]
   [1. 1. 1.]]

  [[2. 2. 2.]
   [2. 2. 2.]
   [2. 2. 2.]]]

 [[[3. 3. 3.]
   [3. 3. 3.]
   [3. 3. 3.]]

  [[4. 4. 4.]
   [4. 4. 4.]
   [4. 4. 4.]]]]
image_data shape: (4, 2, 2)
[[[100. 100.]
  [100. 100.]]

 [[  1.   1.]
  [  1.   1.]]

 [[  1.   1.]
  [  1.   1.]]

 [[  1.   1.]
  [  1.   1.]]]
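
To rule out the writer as the source of the reordering, the data binary can be read back and compared with the array that was written. The snippet below is a minimal sketch that assumes the 8-integer header used in create_test_data above; write_data_binary and read_data_binary are hypothetical helper names, and an in-memory buffer stands in for the file on disk.

```python
import io
import struct

import numpy as np


def write_data_binary(stream, images):
    """Write images with the same 8-int header as create_test_data."""
    n, dim0, dim1 = images.shape
    # version, file type (0 = data), data type, number of entries,
    # data layout, dimensionality, and the two image dimensions
    stream.write(struct.pack('i' * 8, 2, 0, 0, n, 0, 2, dim0, dim1))
    stream.write(images.astype(np.float32).tobytes())


def read_data_binary(stream):
    """Read the header back and return the images in file order."""
    header = struct.unpack('i' * 8, stream.read(4 * 8))
    version, file_type, data_type, n, layout, dimensionality, dim0, dim1 = header
    assert file_type == 0  # 0 equals data
    flat = np.frombuffer(stream.read(4 * n * dim0 * dim1), dtype=np.float32)
    return flat.reshape(n, dim0, dim1)


# Same test set as above: first image all 100s, the rest all 1s
images = np.ones((4, 2, 2), dtype=np.float32)
images[0] = 100

buffer = io.BytesIO()
write_data_binary(buffer, images)
buffer.seek(0)
restored = read_data_binary(buffer)

print(restored[:, 0, 0])  # first pixel of every image, in file order
```

If the first entry read back is the 100s image, the order in the data binary is as intended and the reordering must happen later.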

Running Pink:

Number of data entries = 4
  Data dimension = 2 x 2
  SOM dimension (width x height x depth) = 2x2x1
  SOM size = 4
  Number of iterations = 1
  Neuron dimension = 3x3
  Euclidean distance dimension = 1x1
  Number of progress information prints = 1
  Intermediate storage of SOM = off
  Layout = cartesian
  Initialization type = file_init
  SOM initialization file = /data/test_som.bin
  Interpolation type = bilinear
  Seed = 1234
  Number of rotations = 360
  Use mirrored image = 1
  Number of CPU threads = 40
  Use CUDA = 1
  Distribution function for SOM update = gaussian
  Sigma = 1.1
  Damping factor = 0.2
  Maximum distance for SOM update = -1
  Use periodic boundary conditions = 0
  Store best rotation and flipping parameters = 0

Loading the mapping:

data_map, numberOfImages, som_width, som_height, som_depth = load_som_mapping(map_path, verbose=True)
show_mapping(data_map)

Result:

version: 2
file_type: 2
data_type: 0
numberOfImages: 4
som_layout: 0
som dimensionality: 2
som dimensions: (2, 2)

Map result:
[[0.000e+00 1.000e+00 4.000e+00 9.000e+00]
 [9.801e+03 9.604e+03 9.409e+03 9.216e+03]
 [0.000e+00 1.000e+00 4.000e+00 9.000e+00]
 [0.000e+00 1.000e+00 4.000e+00 9.000e+00]] 4 2 2 1 

[0 1 2 3]
[3 2 1 0]
[0 1 2 3]
[0 1 2 3]

While I expect:

[3 2 1 0]
[0 1 2 3]
[0 1 2 3]
[0 1 2 3]
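
The expected ranks can be derived by hand from the inputs. The sketch below assumes that the distance is the squared difference over a single pixel, which is consistent with the "Euclidean distance dimension = 1x1" line in the log and with the values in the map result, and it ignores rotation and flipping because every image and neuron is constant. It is a simplified stand-in, not the distance code used by PINK.

```python
import numpy as np

# One representative value per constant neuron and per constant image
neuron_values = np.array([1.0, 2.0, 3.0, 4.0])
image_values = np.array([100.0, 1.0, 1.0, 1.0])  # order in the data binary

# Assumption: squared difference over a single pixel
expected_map = (image_values[:, None] - neuron_values[None, :]) ** 2

# Same double argsort as show_mapping: rank of every neuron per image
expected_ranks = np.argsort(np.argsort(expected_map, axis=1), axis=1)

print(expected_map)
print(expected_ranks)
```

Under these assumptions the row with the values 9801, 9604, 9409 and 9216 belongs to the first image, not the second.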

I tried mapping different sets of images and could not find a pattern in the reordering, so I suspect a random shuffle somewhere. Maybe it is due to the shuffle in PINK/src/SelfOrganizingMapLib/DataIterator.h?

BerndDoser commented 5 years ago

Absolutely! The DataIterator randomly shuffles the images. That makes no sense for the mapping, of course. I will switch the shuffle off for the mapping. Could this also be the reason for #33?
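
As an illustration of the idea only, here is a minimal Python sketch of an iterator with a shuffle switch. It is not the C++ DataIterator from PINK; iterate_entries and map_entries are hypothetical names, and the seed value is taken from the log above purely as an example.

```python
import random


def iterate_entries(entries, shuffle, seed=1234):
    """Yield (original_index, entry) pairs, shuffled only when requested."""
    order = list(range(len(entries)))
    if shuffle:
        # Training may benefit from a random visiting order
        random.Random(seed).shuffle(order)
    for index in order:
        yield index, entries[index]


def map_entries(entries, distance_fn):
    """Map every entry in file order so that result i belongs to entry i."""
    return [distance_fn(entry)
            for _, entry in iterate_entries(entries, shuffle=False)]


# Hypothetical usage with a stand-in for the distance computation
entries = ['a', 'b', 'c', 'd']
print(map_entries(entries, str.upper))
```

Yielding the original index alongside each entry is an alternative design: results could then be written back to their original position even if the visiting order were shuffled.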

RafaelMostert commented 5 years ago

Ok. Good to know.

I don't think that this will also solve #33. The AQE measure, computed as np.mean(np.min(mapping_data, axis=1)), and the TE measure do not depend on the ordering of the images. They work as long as the per-image mapping is correct.
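
A quick check of the ordering argument for AQE, using the map result printed above as mapping_data. The aqe helper name and the random permutation are only for illustration; the TE measure is not covered here.

```python
import numpy as np

# Map result from the test above: one row per image, one column per neuron
mapping_data = np.array([[0.0, 1.0, 4.0, 9.0],
                         [9801.0, 9604.0, 9409.0, 9216.0],
                         [0.0, 1.0, 4.0, 9.0],
                         [0.0, 1.0, 4.0, 9.0]])


def aqe(mapping):
    """Mean over all images of the distance to the best matching neuron."""
    return np.mean(np.min(mapping, axis=1))


# Reorder the rows (images) at random and compare
rng = np.random.default_rng(0)
shuffled = mapping_data[rng.permutation(len(mapping_data))]

print(aqe(mapping_data), aqe(shuffled))
```

The minimum is taken per row and the mean runs over all rows, so any reordering of the rows leaves the value unchanged.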