HITS-AIN / PINK

Parallelized rotation and flipping INvariant Kohonen maps
GNU General Public License v3.0

--num-iter option ignored #44

Closed tjgalvin closed 4 years ago

tjgalvin commented 4 years ago

I am using the latest release of pink v2.4 git revision f385715.

It seems that whatever value I pass to --num-iter is ignored. The progress bar accounts for it when computing its total, but the loop over the data items does not.

  Data file = TEST.bin
  Result file = SOM_B2.bin
  Number of data entries = 24853
  Data dimension = 100 x 100
  SOM dimension (width x height x depth) = 10x5x1
  SOM size = 50
  Number of iterations = 3
  Neuron dimension = 142x142
  Euclidean distance dimension = 70x70
  Data type for euclidean distance calculation = uint8
  Maximal number of progress information prints = 10
  Intermediate storage of SOM = off
  Layout = cartesian
  Initialization type = file_init
  SOM initialization file = SOM_B1.bin
  Interpolation type = bilinear
  Seed = 1234
  Number of rotations = 360
  Use mirrored image = 1
  Number of CPU threads = 1
  Use CUDA = 1
  Distribution function for SOM update = gaussian
  Sigma = 1
  Damping factor = 0.05
  Maximum distance for SOM update = -1
  Use periodic boundary conditions = 0
  Random shuffle data input = 0

  CUDA Device Query...
  There are 1 CUDA devices.

  CUDA Device #0
  Major revision number:         6
  Minor revision number:         0
  Name:                          Tesla P100-SXM2-16GB
  Total global memory:           17071734784
  Total shared memory per block: 49152
  Total registers per block:     65536
  Warp size:                     32
  Maximum memory pitch:          2147483647
  Maximum threads per block:     1024
  Maximum dimension 0 of block:  1024
  Maximum dimension 1 of block:  1024
  Maximum dimension 2 of block:  64
  Maximum dimension 0 of grid:   2147483647
  Maximum dimension 1 of grid:   65535
  Maximum dimension 2 of grid:   65535
  Clock rate:                    1480500
  Total constant memory:         65536
  Texture alignment:             512
  Concurrent copy and execution: Yes
  Number of multiprocessors:     56
  Kernel execution timeout:      No

[=======>                                                              ] 10 % 13.222 s
[=============>                                                        ] 19 % 26.454 s
[====================>                                                 ] 29 % 39.68 s
  Write final SOM to SOM_B2.bin ... done.

  Total time (hh:mm:ss): 00:00:44.277

tjgalvin commented 4 years ago

As best I can tell from a quick read, DataIteratorShuffled does not accept the input_data.m_number_of_iterations attribute, whereas the ProgressBar does. The main for loop that iterates over the training data items

for (; iter_data_cur != iter_data_end; ++iter_data_cur, ++progress_bar)

will perform a single pass through the training image dataset. The same goes for the DataIterator. Perhaps we should edit the iterator classes to use m_number_of_iterations appropriately and change the terminating condition in the above for loop to iter_data_cur.end_flag == true. It seems the data iteration classes already overload the == operator.
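To make the suggestion above concrete, here is a minimal sketch of an iterator that is told the number of iterations and reports the end only after that many full passes. It is not PINK code: MultiPassIterator, at_end and count_visits are hypothetical names invented for illustration, and a plain std::vector stands in for the training data file.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a data iterator that knows how many passes to make.
// None of these names are taken from PINK.
template <typename T>
class MultiPassIterator
{
public:
    MultiPassIterator(const std::vector<T>& data, std::uint32_t number_of_iterations)
     : m_data(data),
       m_passes_left(data.empty() ? 0 : number_of_iterations),
       m_index(0)
    {}

    // True once every requested pass over the data has been completed.
    bool at_end() const { return m_passes_left == 0; }

    const T& operator*() const
    {
        assert(!at_end());
        return m_data[m_index];
    }

    MultiPassIterator& operator++()
    {
        if (at_end()) return *this;
        if (++m_index == m_data.size()) {
            m_index = 0;      // rewind for the next pass
            --m_passes_left;  // one full pass finished
        }
        return *this;
    }

private:
    const std::vector<T>& m_data;
    std::uint32_t m_passes_left;
    std::size_t m_index;
};

// Count how many items a training-style loop would visit.
inline std::size_t count_visits(const std::vector<int>& data,
                                std::uint32_t number_of_iterations)
{
    std::size_t visits = 0;
    for (MultiPassIterator<int> it(data, number_of_iterations); !it.at_end(); ++it) {
        const int item = *it;  // a real loop would train on this item
        (void)item;
        ++visits;
    }
    return visits;
}
```

With this shape, the loop's terminating condition depends on the iterator's own end state rather than on a comparison against a single-pass end iterator, so the number of items visited matches the total a progress bar would compute from entries times iterations.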

BerndDoser commented 4 years ago

Thanks Tim for pointing this error out. I will fix it in release 2.4.

tjgalvin commented 4 years ago

Something else might be up with this. I was rereading the code for a separate issue and came across this line

for (uint32_t i = 0; i < input_data.m_number_of_iterations; ++i)

inside which the iteration over the training data items is nested. So something else may be at play causing my weird issue? I am not 100% sure at the moment. This is within main_generic.h, by the way.

The other thing I noticed is that only the DataIteratorShuffled class is used to iterate over the data during training.
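One way an outer loop over m_number_of_iterations could still yield a single pass is if the inner iterator is created once, outside the outer loop, and never rewound. Whether that is what happens in main_generic.h is not established here; the sketch below only illustrates the pattern with std::vector iterators, and both function names are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Inner iterator created once, outside the outer loop: after the first pass
// `cur` already equals `end`, so later outer iterations do no work.
inline std::size_t visits_without_reset(const std::vector<int>& data,
                                        std::uint32_t number_of_iterations)
{
    std::size_t visits = 0;
    auto cur = data.begin();
    const auto end = data.end();
    for (std::uint32_t i = 0; i < number_of_iterations; ++i) {
        for (; cur != end; ++cur) {
            ++visits;
        }
    }
    return visits;
}

// Inner iterator re-created on every outer iteration: each pass starts
// from the beginning of the data again.
inline std::size_t visits_with_reset(const std::vector<int>& data,
                                     std::uint32_t number_of_iterations)
{
    std::size_t visits = 0;
    for (std::uint32_t i = 0; i < number_of_iterations; ++i) {
        for (auto cur = data.begin(); cur != data.end(); ++cur) {
            ++visits;
        }
    }
    return visits;
}
```

For the run in the log above, a progress bar sized for 24853 entries times 3 iterations would expect 74559 updates, while a single pass delivers 24853, about a third of that. The bar stopping after the 29 % print would be consistent with this pattern, though it does not prove it.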