akarshzingade / image-similarity-deep-ranking


About SubSample #10

Open zyq001 opened 6 years ago

zyq001 commented 6 years ago

Hey Akarsh, thank you for sharing your great work! I'm new to deep learning, and what confuses me is that you did not implement the SubSample but instead enlarged the strides. From what I have learned, the two are not equivalent, right? Does your approach have the same or a better effect? Is there some special consideration or experience behind it?

longzeyilang commented 6 years ago

@zyq001 Did you solve this problem? Please let me know.

akarshzingade commented 6 years ago

Hey Mike and Longzeyilang! Sub-sampling is the same as strides.

"subsample: tuple of length 2. Factor by which to subsample output. Also called strides elsewhere."- https://faroit.github.io/keras-docs/1.2.2/layers/convolutional/#convolution2d

longzeyilang commented 6 years ago

@akarshzingade Thank you for your reply! There is a difference between your implementation and mine. According to the multiscale network structure in Figure 3 of "Learning Fine-grained Image Similarity with Deep Ranking", your implementation may have a problem. Here is my version:

import keras.backend as K
from keras.layers import Input, Conv2D, MaxPool2D, Flatten, Lambda, Dense, Dropout, concatenate
from keras.models import Model
# convnet_model_() is the ConvNet branch defined elsewhere in this repo

def deep_rank_model():
    convnet_model = convnet_model_()

    # Shallow branch 1: 56x56 input, as in Figure 3 of the paper
    first_input = Input(shape=(56, 56, 3))
    first_conv = Conv2D(96, kernel_size=(8, 8), strides=(4, 4), padding='same')(first_input)
    first_max = MaxPool2D(pool_size=(3, 3), strides=(2, 2), padding='same')(first_conv)
    first_max = Flatten()(first_max)
    first_max = Lambda(lambda x: K.l2_normalize(x, axis=1))(first_max)

    # Shallow branch 2: 28x28 input
    second_input = Input(shape=(28, 28, 3))
    second_conv = Conv2D(96, kernel_size=(8, 8), strides=(4, 4), padding='same')(second_input)
    second_max = MaxPool2D(pool_size=(7, 7), strides=(4, 4), padding='same')(second_conv)
    second_max = Flatten()(second_max)
    second_max = Lambda(lambda x: K.l2_normalize(x, axis=1))(second_max)

    # Merge both shallow branches with the ConvNet branch, then embed
    merge_one = concatenate([first_max, second_max])
    merge_two = concatenate([merge_one, convnet_model.output])
    emb = Dense(4096)(merge_two)
    emb = Dropout(0.6)(emb)
    l2_norm_final = Lambda(lambda x: K.l2_normalize(x, axis=1))(emb)

    final_model = Model(inputs=[first_input, second_input, convnet_model.input], outputs=l2_norm_final)
    return final_model
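As a quick sanity check, you can print the three input shapes the model expects (assuming the imports above and the repo's convnet_model_() are in scope):

model = deep_rank_model()
for t in model.inputs:
    print(t.shape)  # (?, 56, 56, 3), (?, 28, 28, 3), (?, 224, 224, 3)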
akarshzingade commented 6 years ago

Hey, Longzeyilang. I believe the implementation does follow the architecture shown in Figure 3. Please let me know what the difference is :)

IAmAbdusKhan commented 6 years ago

@akarshzingade @longzeyilang I have tried both implementations using the corrected triplet loss function, i.e. the one in the paper, and the one by Akarsh. I don't know why, but I get better results with Akarsh's implementation on the Exact Street2Shop dataset. However, the results are still not good, and I am looking for further improvement.

IAmAbdusKhan commented 6 years ago

@akarshzingade I think there is a difference between your implementation and the network given in the paper. In your implementation, each of the three networks is fed a (224, 224, 3) image, and the stride of the max-pool kernel is greater than the kernel size, so some pixel positions are always ignored by the max-pool kernel. In the paper, however, the input size is different for each of the networks, i.e. (224, 224, 3), (56, 56, 3) and (28, 28, 3), and the stride is less than the kernel size.
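To see why a stride larger than the pooling window skips pixels, here is a tiny 1-D illustration ('valid' pooling in plain Python; not the repo's code):

def maxpool_1d(x, pool, stride):
    """Toy 1-D max pooling ('valid') that also records which positions are visited."""
    out, visited = [], set()
    for start in range(0, len(x) - pool + 1, stride):
        out.append(max(x[start:start + pool]))
        visited.update(range(start, start + pool))
    return out, sorted(visited)

x = list(range(12))
# pool=4, stride=4: every position falls in exactly one window
print(maxpool_1d(x, 4, 4))  # ([3, 7, 11], all 12 positions visited)
# pool=3, stride=4 (stride > pool): positions 3, 7 and 11 are never pooled
print(maxpool_1d(x, 3, 4))  # ([2, 6, 10], positions 3, 7, 11 missing)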

ha121ppy commented 6 years ago

@longzeyilang, I tried your version of deep_rank_model() but got the following error:

Error when checking input: expected input_2 to have shape (56, 56, 3) but got array with shape (224, 224, 3)

Is there any other place in the code I need to adjust? Thanks a lot.

IAmAbdusKhan commented 6 years ago

@ha121ppy You will have to change the next method. This works for me:

def next(self):
    """For python 2.x.
    # Returns
        The next batch.
    """
    with self.lock:
        index_array, current_index, current_batch_size = next(self.index_generator)
    # The transformation of images is not under thread lock
    # so it can be done in parallel
    batch_x = np.zeros((current_batch_size,) + self.image_shape, dtype=K.floatx())
    # These two must match the model's shallow-branch inputs: (56,56,3) and (28,28,3)
    batch_x_1 = np.zeros((current_batch_size,) + (56, 56, 3), dtype=K.floatx())
    batch_x_2 = np.zeros((current_batch_size,) + (28, 28, 3), dtype=K.floatx())

    grayscale = self.color_mode == 'grayscale'

    for i, j in enumerate(index_array):
        fname = self.filenames[j]

        img = load_img(os.path.join(self.directory, fname.split('\r')[0]),
                       grayscale=grayscale,
                       target_size=self.target_size)

        # Downscaled copies for the two shallow branches
        img_1 = img.resize((56, 56))
        img_2 = img.resize((28, 28))

        x = img_to_array(img, data_format=self.data_format)
        x_1 = img_to_array(img_1, data_format=self.data_format)
        x_2 = img_to_array(img_2, data_format=self.data_format)

        x = self.image_data_generator.random_transform(x)
        x_1 = self.image_data_generator.random_transform(x_1)
        x_2 = self.image_data_generator.random_transform(x_2)

        x = self.image_data_generator.standardize(x)
        x_1 = self.image_data_generator.standardize(x_1)
        x_2 = self.image_data_generator.standardize(x_2)

        batch_x[i] = x
        batch_x_1[i] = x_1
        batch_x_2[i] = x_2

    # optionally save augmented images to disk for debugging purposes
    if self.save_to_dir:
        for i in range(current_batch_size):
            img = array_to_img(batch_x[i], self.data_format, scale=True)
            fname = '{prefix}_{index}_{hash}.{format}'.format(prefix=self.save_prefix,
                                                              index=current_index + i,
                                                              hash=np.random.randint(1e4),
                                                              format=self.save_format)
            img.save(os.path.join(self.save_to_dir, fname))
    # build batch of labels
    if self.class_mode == 'input':
        batch_y = batch_x.copy()
    elif self.class_mode == 'sparse':
        batch_y = self.classes[index_array]
    elif self.class_mode == 'binary':
        batch_y = self.classes[index_array].astype(K.floatx())
    elif self.class_mode == 'categorical':
        batch_y = np.zeros((len(batch_x), self.num_class), dtype=K.floatx())
        # fill in the one-hot labels, as in stock Keras
        for i, label in enumerate(self.classes[index_array]):
            batch_y[i, label] = 1.
    else:
        # class_mode is None: still return all three inputs
        return [batch_x_1, batch_x_2, batch_x]

    return [batch_x_1, batch_x_2, batch_x], batch_y
longzeyilang commented 6 years ago

I have already tested the code above. In fact, I changed the triplet loss and used the center loss function instead.
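For context, center loss (Wen et al., 2016) pulls each embedding toward the mean embedding of its class; a minimal numpy sketch of the idea (not my actual training code):

import numpy as np

def center_loss(embeddings, labels, centers):
    # 0.5 * mean squared distance of each embedding to its class center;
    # in practice the centers are updated with a moving average during training
    diffs = embeddings - centers[labels]
    return 0.5 * np.mean(np.sum(diffs ** 2, axis=1))

emb = np.random.randn(6, 4096).astype('float32')
labels = np.array([0, 0, 1, 1, 2, 2])
centers = np.zeros((3, 4096), dtype='float32')
print(center_loss(emb, labels, centers))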

ha121ppy commented 6 years ago

@IAmAbdusKhan @longzeyilang Thanks! @longzeyilang what do you mean by 'the center loss function'? I have adjusted the loss function as follows, do you mean that?

def triplt_loss(y_true, y_pred):
    y_pred = K.clip(y_pred, _EPSILON, 1.0 - _EPSILON)
    loss = tf.convert_to_tensor(0, dtype=tf.float32)
    total_loss = tf.convert_to_tensor(0, dtype=tf.float32)
    g = tf.constant(1.0, shape=[1], dtype=tf.float32)
    zero = tf.constant(0.0, shape=[1], dtype=tf.float32)
    # batches are laid out as consecutive (query, positive, negative) triplets
    for i in range(0, batch_size, 3):
        try:
            q_embedding = y_pred[i]
            p_embedding = y_pred[i + 1]
            n_embedding = y_pred[i + 2]
            D_q_p = K.sqrt(K.sum((q_embedding - p_embedding) ** 2))
            D_q_n = K.sqrt(K.sum((q_embedding - n_embedding) ** 2))
            loss = tf.maximum(g + D_q_p - D_q_n, zero)
            total_loss = total_loss + loss
        except:
            continue
    total_loss = total_loss / (batch_size / 3)
    return total_loss
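The loop implements the paper's hinge loss max(g + D(q,p) - D(q,n), 0), averaged over the (query, positive, negative) triplets laid out consecutively in the batch; a quick numeric check of the hinge:

def hinge(d_qp, d_qn, g=1.0):
    # zero once the negative is at least a margin g farther
    # from the query than the positive
    return max(g + d_qp - d_qn, 0.0)

print(hinge(0.5, 2.0))  # 0.0  (triplet already satisfied)
print(hinge(1.5, 1.0))  # 1.5  (violates the margin)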

Besides, I've run into a problem: after data augmentation (I generate about 20 transformations from each image), the model barely improves. Do you have any idea why? I am trying to adjust the network structure, but I doubt that will help. My goal is for a transformed image to have the smallest distance to its raw image, but even after putting such pairs in the training data, the model still treats them as different images (large distance).

christophesmet commented 6 years ago

@ha121ppy Have you found anything to improve the accuracy?