FaceNet trains its network with triplet loss, which minimizes the distance between an anchor and a positive of the same identity and maximizes the distance between the anchor and a negative of a different identity. As the network trains, the embeddings of the anchor, positive, and negative samples change as well, so the triplets have to be regenerated. There are two ways to do this. The first generates triplets offline every n steps, using the most recent network checkpoint and computing the argmin and argmax on a subset of the data. The second generates triplets online, by selecting the hard positive/negative exemplars from within a mini-batch. FaceNet chooses the second: for every mini-batch it generates triplets from the current embeddings, calculates the triplet loss, and updates the network (and therefore the embeddings).
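
For reference, the objective can be written out as follows. This is a sketch in the notation of the FaceNet paper, where $f$ is the embedding network, $(x_i^a, x_i^p, x_i^n)$ is the $i$-th anchor/positive/negative triplet, and $\alpha$ is the margin (the `alpha` argument in the code further down). Training wants every triplet to satisfy

$$\|f(x_i^a) - f(x_i^p)\|_2^2 + \alpha < \|f(x_i^a) - f(x_i^n)\|_2^2$$

and the loss penalizes the triplets that violate this constraint:

$$L = \sum_i \Big[\, \|f(x_i^a) - f(x_i^p)\|_2^2 - \|f(x_i^a) - f(x_i^n)\|_2^2 + \alpha \,\Big]_+$$

Triplets that already satisfy the constraint contribute zero, which is why the choice of triplets matters so much.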

  1. At the start of each mini-batch, facenet samples a group of people and images from the dataset.


    import numpy as np

    def sample_people(dataset, people_per_batch, images_per_person):
        nrof_images = people_per_batch * images_per_person

        # Visit the classes (people) of the dataset in random order
        nrof_classes = len(dataset)
        class_indices = np.arange(nrof_classes)
        np.random.shuffle(class_indices)

        i = 0
        image_paths = []
        num_per_class = []
        sampled_class_indices = []
        # Sample images from these classes until we have enough
        while len(image_paths) < nrof_images:
            class_index = class_indices[i]
            nrof_images_in_class = len(dataset[class_index])
            image_indices = np.arange(nrof_images_in_class)
            # Pick a random subset of this person's images
            np.random.shuffle(image_indices)
            nrof_images_from_class = min(nrof_images_in_class, images_per_person, nrof_images - len(image_paths))
            idx = image_indices[0:nrof_images_from_class]
            image_paths_for_class = [dataset[class_index].image_paths[j] for j in idx]
            sampled_class_indices += [class_index] * nrof_images_from_class
            image_paths += image_paths_for_class
            # Record how many images this person contributed, then move on
            num_per_class.append(nrof_images_from_class)
            i += 1

        return image_paths, num_per_class
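
Because the images are appended person by person, the returned `num_per_class` list is enough to recover which rows of the embedding array belong to which person. A minimal sketch of that bookkeeping, assuming a toy batch; the helper name `class_slices` and the counts are made up for illustration and are not part of facenet:

```python
import numpy as np

def class_slices(num_per_class):
    """Return one (start, stop) row range per sampled person.

    Hypothetical helper: it only illustrates how per-person image counts
    index into a batch that is laid out person by person.
    """
    stops = np.cumsum(num_per_class)
    starts = stops - np.asarray(num_per_class)
    return [(int(s), int(e)) for s, e in zip(starts, stops)]

# Toy batch: three people contributing 3, 2 and 4 images.
slices = class_slices([3, 2, 4])
print(slices)  # [(0, 3), (3, 5), (5, 9)]
```

The triplet selection in the next step keeps the same running offset in its `emb_start_idx` variable.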

  2. Compute the embeddings of the sampled images, save them to emb_array, and select triplets based on that array.


    import numpy as np

    def select_triplets(embeddings, nrof_images_per_class, image_paths, people_per_batch, alpha):
        """ Select the triplets for training
        """
        trip_idx = 0
        emb_start_idx = 0
        num_trips = 0
        triplets = []

        # VGG Face: Choosing good triplets is crucial and should strike a balance between
        #  selecting informative (i.e. challenging) examples and swamping training with examples that
        #  are too hard. This is achieved by extending each pair (a, p) to a triplet (a, p, n) by sampling
        #  the image n at random, but only between the ones that violate the triplet loss margin. The
        #  latter is a form of hard-negative mining, but it is not as aggressive (and much cheaper) than
        #  choosing the maximally violating example, as often done in structured output learning.

        for i in range(people_per_batch):
            nrof_images = int(nrof_images_per_class[i])
            for j in range(1, nrof_images):
                # Position of the anchor image's embedding in emb_array
                a_idx = emb_start_idx + j - 1
                neg_dists_sqr = np.sum(np.square(embeddings[a_idx] - embeddings), 1)
                for pair in range(j, nrof_images):  # For every possible positive pair.
                    p_idx = emb_start_idx + pair
                    pos_dist_sqr = np.sum(np.square(embeddings[a_idx] - embeddings[p_idx]))
                    # Images of the same person can never be negatives
                    neg_dists_sqr[emb_start_idx:emb_start_idx + nrof_images] = np.nan
                    #all_neg = np.where(np.logical_and(neg_dists_sqr-pos_dist_sqr<alpha, pos_dist_sqr<neg_dists_sqr))[0]  # FaceNet selection
                    all_neg = np.where(neg_dists_sqr - pos_dist_sqr < alpha)[0]  # VGG Face selection
                    nrof_random_negs = all_neg.shape[0]
                    if nrof_random_negs > 0:
                        rnd_idx = np.random.randint(nrof_random_negs)
                        n_idx = all_neg[rnd_idx]
                        # Keep (a, p, n) as a triplet
                        triplets.append((image_paths[a_idx], image_paths[p_idx], image_paths[n_idx]))
                        #print('Triplet %d: (%d, %d, %d), pos_dist=%2.6f, neg_dist=%2.6f (%d, %d, %d, %d, %d)' %
                        #    (trip_idx, a_idx, p_idx, n_idx, pos_dist_sqr, neg_dists_sqr[n_idx], nrof_random_negs, rnd_idx, i, j, emb_start_idx))
                        trip_idx += 1

                    num_trips += 1

            emb_start_idx += nrof_images

        np.random.shuffle(triplets)
        return triplets, num_trips, len(triplets)
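
To see what the selection rule does for a single (anchor, positive) pair, here is a small self-contained sketch on hand-made 2-D embeddings. The helper `candidate_negatives` and the toy numbers are assumptions for illustration only; the rule itself is the "VGG Face selection" line above, where a negative qualifies when its squared distance to the anchor exceeds the positive's by less than `alpha`.

```python
import numpy as np

def candidate_negatives(embeddings, a_idx, p_idx, class_start, class_end, alpha):
    """Indices n with ||a - n||^2 - ||a - p||^2 < alpha, outside the anchor's class.

    Hypothetical helper that isolates the margin test for one (a, p) pair.
    """
    dists = np.sum(np.square(embeddings[a_idx] - embeddings), axis=1)
    pos_dist = dists[p_idx]
    # Rows of the anchor's own identity can never be negatives.
    dists[class_start:class_end] = np.nan
    # Comparisons against NaN are False, so those rows drop out.
    with np.errstate(invalid="ignore"):
        return np.where(dists - pos_dist < alpha)[0]

# Rows 0-1 are person A, rows 2-3 are other people.
emb = np.array([[0.0, 0.0], [1.0, 0.0], [1.2, 0.0], [5.0, 0.0]])
negs = candidate_negatives(emb, a_idx=0, p_idx=1, class_start=0, class_end=2, alpha=0.5)
print(negs)  # [2]
```

Row 2 violates the margin (1.44 - 1.0 < 0.5) and is kept as a candidate, while row 3 is already far enough away and would contribute nothing to the loss.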

  3. Calculate the triplet loss, update the network (and with it the embeddings), then repeat with the next mini-batch.
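
The loss in step 3 can be sketched in plain NumPy. This is an illustrative version under assumptions (a hinge on squared Euclidean distances, averaged over the batch), not the training code itself, which would compute the same quantity inside the deep learning framework so that gradients can flow back to the network:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, alpha):
    """Mean hinge loss over a batch of (anchor, positive, negative) embeddings.

    Each argument is an array of shape (batch, embedding_dim).
    """
    pos_dist = np.sum(np.square(anchor - positive), axis=1)
    neg_dist = np.sum(np.square(anchor - negative), axis=1)
    # Only triplets that violate the margin contribute to the loss.
    return np.mean(np.maximum(pos_dist - neg_dist + alpha, 0.0))

anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[1.0, 0.0], [1.0, 0.0]])
negative = np.array([[2.0, 0.0], [1.0, 0.0]])
loss = triplet_loss(anchor, positive, negative, alpha=0.2)
print(loss)
```

In this toy batch the first triplet already satisfies the margin and contributes zero; only the second one, whose negative is as close as its positive, produces a loss.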
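
Another advantage is that triplet loss needs less GPU memory. With a softmax loss, the classifier layer grows with the number of identities: for 8,000,000 people and a 1024-dimensional layer feeding the classifier, its weight matrix has 8,000,000 × 1024 ≈ 8.2 billion entries, roughly 32 GB assuming 32-bit floats. Triplet loss works on the embeddings directly and needs no such layer.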
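
The size of such a softmax classifier layer can be checked with a few lines of arithmetic. This is a sketch assuming 32-bit float weights and no bias term, neither of which is stated in the text; the function name is made up for illustration:

```python
def classifier_weight_bytes(num_identities, feature_dim, bytes_per_weight=4):
    """Memory for the weight matrix of a softmax classifier layer.

    Assumes one weight per (identity, feature) pair and no bias.
    """
    return num_identities * feature_dim * bytes_per_weight

size = classifier_weight_bytes(8_000_000, 1024)
print(size / 1e9, "GB")  # 32.768 GB
```

Gradients and optimizer state for the same matrix would come on top of this figure during training.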