dlaptev / TI-pooling

TI-pooling: transformation-invariant pooling for feature learning in Convolutional Neural Networks

Can TI-pooling learn different variations at the same time? #2

Closed Coldmooon closed 7 years ago

Coldmooon commented 7 years ago

Thank you for publishing the code. This paper is very interesting. I have one question about the architecture and Lemma 1 in the paper.

I see that the first step of the network is to transform a training image into several variants and then feed them to the rest of the network. The experiments in the paper consider the rotation case, and the code corresponding to experiment 4.1.1 uses 24 rotation angles to train the network on the mnist-rot-12k dataset.

My question is: can I transform the training image using more than one type of transformation in the first step? Lemma 1 says that the set Φ of all possible transformations forms a group. Does the set Φ refer to transformations of different forms (e.g., rotation and scale), or to a particular transformation with different parameters (e.g., 30° rotation and 60° rotation)?

If, in practice, I set the number of transformations to 24 in the source code, where 12 of them rotate the input image and the other 12 scale it, the outputs of the 24 paths are TI-pooled to form the transformation-invariant features. Is this case covered by Lemma 1?
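
For concreteness, here is how I currently understand the TI-pooling step, as a rough NumPy sketch (just my own illustration, not the repository code):

import numpy as np

def ti_pool(path_features):
    # path_features: (number_of_transformations, feature_dim) outputs of the
    # weight-sharing branches for one input image. TI-pooling keeps, for each
    # feature, the strongest response over all transformed copies.
    return np.max(path_features, axis=0)

# Example: 24 transformed copies, each producing a 128-dimensional feature.
pooled = ti_pool(np.random.rand(24, 128))  # shape: (128,)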

dlaptev commented 7 years ago

Hi @Coldmooon, thanks for your interest. This is an important question about TI-pooling, I am glad you asked it.

Yes, different transformations can definitely be used together. The set Φ can contain any transformations, both of different forms and with different parameters.

Scale is a little more verbose (because you need resizing, cropping and padding), so I will show the idea using shifts as an example. In the TensorFlow implementation you can simply rewrite the DataLoader._transform method to look something like this:

import numpy as np
from scipy.ndimage.interpolation import rotate, shift

list_of_shifts = [[ 0,  0,  0,  0], # no shift
                  [ 0, -1,  0,  0], # up
                  [ 0, +1,  0,  0], # down
                  [ 0,  0, -1,  0], # left
                  [ 0,  0, +1,  0]] # right

def _transform(self, padded, number_of_rotations, list_of_shifts):
  # Build one rotated-and-shifted copy of the padded batch for every
  # (rotation, shift) pair, stacked along a new trailing dimension.
  number_of_transformations = number_of_rotations * len(list_of_shifts)
  tiled = np.tile(np.expand_dims(padded, 4), [number_of_transformations])
  transformation_index = 0
  for rotation_index in xrange(number_of_rotations):
    for shift_index in xrange(len(list_of_shifts)):
      # Rotate.
      angle = 360.0 * rotation_index / float(number_of_rotations)
      tiled[:, :, :, :, transformation_index] = rotate(
          tiled[:, :, :, :, transformation_index],
          angle,
          axes=[1, 2],
          reshape=False)
      # Shift.
      tiled[:, :, :, :, transformation_index] = shift(
          tiled[:, :, :, :, transformation_index],
          list_of_shifts[shift_index])
      transformation_index += 1
  return tiled
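
An illustrative call, where loader stands for a hypothetical DataLoader instance and the tensor shapes are only an assumption to show the resulting layout:

import numpy as np

# `loader` is a hypothetical DataLoader instance; shapes are illustrative:
# a padded batch of 128 grayscale 40x40 images.
padded = np.zeros((128, 40, 40, 1), dtype=np.float32)
tiled = loader._transform(padded,
                          number_of_rotations=4,
                          list_of_shifts=list_of_shifts)
# One copy per (rotation, shift) pair along the trailing dimension:
# tiled.shape == (128, 40, 40, 1, 4 * len(list_of_shifts)).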

As for theoretical guarantees: unfortunately, shifts (and scale) do not form a group, so we are not covered by Lemma 1 here. Luckily, in practice the network still learns to generalize from transformed samples, so transformation invariance can often be assumed to hold approximately.
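
To make this concrete, here is a tiny standalone check (my own sketch, not part of the repository code): pixels shifted over the border of a finite image are replaced by the fill value, so shifting right and then back left is not the identity, and the inverse element required by a group is missing.

import numpy as np
from scipy.ndimage.interpolation import shift

image = np.arange(9, dtype=np.float64).reshape(3, 3)
# Shift one pixel to the right, then one pixel back to the left
# (order=1 keeps integer shifts exact).
round_trip = shift(shift(image, [0, 1], order=1), [0, -1], order=1)
print(np.allclose(round_trip, image))  # False
print(round_trip[:, -1])               # zeros: the last column was lost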

One other thing to notice is that the set of transformations has now increased in size. For some applications this is fine (you can also decrease number_of_rotations); for others it could increase training/testing time too much. In that case you can consider subsampling the transformations. The code becomes less readable, but the comments should help (also note that I did not test it):

import numpy as np
from random import shuffle
from scipy.ndimage.interpolation import rotate, shift

list_of_shifts = [[ 0,  0,  0], # no shift
                  [-1,  0,  0], # up
                  [+1,  0,  0], # down
                  [ 0, -1,  0], # left
                  [ 0, +1,  0]] # right

def _transform(self, padded, number_of_rotations, list_of_shifts,
               number_of_sampled_transformations):
  number_of_transformations = number_of_rotations * len(list_of_shifts)
  tiled = np.tile(np.expand_dims(padded, 4),
                  [number_of_sampled_transformations])
  # A mask showing which transformations to sample.
  transformations_to_sample = map(
      lambda x: x < number_of_sampled_transformations,
      xrange(number_of_transformations))
  # Iterate over images in a batch to sample different transformations.
  for sample_index in xrange(tiled.shape[0]):
    shuffle(transformations_to_sample)
    (transformation_index, sampled_transformation_index) = (0, 0)
    for rotation_index in xrange(number_of_rotations):
      for shift_index in xrange(len(list_of_shifts)):
        # Sample only selected number of transformations for every image.
        if transformations_to_sample[transformation_index]:
          # Rotate.
          angle = 360.0 * rotation_index / float(number_of_rotations)
          tiled[sample_index, :, :, :, sampled_transformation_index] = rotate(
              tiled[sample_index, :, :, :, sampled_transformation_index],
              angle,
              axes=[0, 1],
              reshape=False)
          # Shift.
          tiled[sample_index, :, :, :, sampled_transformation_index] = shift(
              tiled[sample_index, :, :, :, sampled_transformation_index],
              list_of_shifts[shift_index])
          sampled_transformation_index += 1
        transformation_index += 1
  return tiled
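
In case it helps, here is a tiny standalone illustration of the sampling mask used above (my own sketch, independent of the repository code):

from random import shuffle

# Each image keeps exactly number_of_sampled_transformations of the
# number_of_transformations possible (rotation, shift) pairs; shuffling
# the boolean mask picks a different random subset per image.
number_of_transformations = 24 * 5
number_of_sampled_transformations = 24
mask = [index < number_of_sampled_transformations
        for index in range(number_of_transformations)]
shuffle(mask)
print(sum(mask))  # always 24, at random positions in the mask
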
Coldmooon commented 7 years ago

@dlaptev Thanks for your detailed answer. It is very helpful for my understanding. I found that the provided code can reach a better error rate than the one reported in Sec. 4.1.1 in fewer epochs.

Settings: opt.n_transformations = 24; opt.batch_size = 128; dataset: mnist_rotation_new

...
epoch:    208, train_error = 0.000000, test_error = 0.012660
epoch:    209, train_error = 0.000000, test_error = 0.012580
epoch:    210, train_error = 0.000000, test_error = 0.012560
epoch:    211, train_error = 0.000000, test_error = 0.012560
...
dlaptev commented 7 years ago

@Coldmooon, this is a known issue, sorry for the inconvenience. The code is correct, but we had the wrong number in the original version of the paper. See this commit (the arxiv paper is also updated).

Coldmooon commented 7 years ago

@dlaptev I see. Thanks for your reply ~~

dlaptev commented 7 years ago

You're welcome, and please do not hesitate to ping us if you have any further questions.