YingZhangDUT / Cross-Modal-Projection-Learning

TensorFlow Implementation of Deep Cross-Modal Projection Learning
MIT License

About the inputs? Image-Text pairs. #7

Open ghost opened 5 years ago

ghost commented 5 years ago

I have a question about the inputs. In the training code,

```python
loss, cmpm_loss, cmpc_loss, i2t_loss, t2i_loss = \
    _tower_loss(network_fn, images_splits[k], labels_splits[k],
                input_seqs_splits[k], input_masks_splits[k])
```

As I understand it, the inputs are the image, the label (the id of the image), and the seqs. Reading the code, during training `labels_splits` is a list of length 32, i.e. the id of each image. Where is the id of the seqs?

In the CMPM loss function:

```python
batch_size = image_embeddings.get_shape().as_list()[0]
mylabels = tf.cast(tf.reshape(labels, [batch_size, 1]), tf.float32)
labelD = pairwise_distance(mylabels, mylabels)
label_mask = tf.cast(tf.less(labelD, 0.5), tf.float32)  # 1-match 0-unmatch
```

Is the input image-seq pair assumed to be matched here?
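
For anyone reading along, here is a minimal, self-contained sketch of the label-mask construction the question refers to. It is not the repository's exact code: the `pairwise_distance` helper below is a plain squared-Euclidean placeholder and the example labels are made up, just to illustrate how the mask marks every image-text pair that shares the same identity label as a match.

```python
import tensorflow as tf

def pairwise_distance(a, b):
    # Squared Euclidean distance between every row of `a` and every row of `b`.
    # For scalar labels reshaped to [batch, 1], the distance is 0 exactly when
    # two samples carry the same identity label.
    return tf.reduce_sum(tf.square(tf.expand_dims(a, 1) - tf.expand_dims(b, 0)), axis=-1)

# Hypothetical identity labels for a batch of 4 image-text pairs:
# positions 0 and 2 belong to the same person.
labels = tf.constant([3, 7, 3, 9])

batch_size = labels.shape[0]
mylabels = tf.cast(tf.reshape(labels, [batch_size, 1]), tf.float32)
labelD = pairwise_distance(mylabels, mylabels)
label_mask = tf.cast(tf.less(labelD, 0.5), tf.float32)  # 1 = same identity, 0 = different

print(label_mask.numpy())
# [[1. 0. 1. 0.]
#  [0. 1. 0. 0.]
#  [1. 0. 1. 0.]
#  [0. 0. 0. 1.]]
```

Since the image and its caption in a batch come from the same example, the diagonal of `label_mask` is always 1; off-diagonal 1s appear whenever two samples share the same identity label.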