yuleung opened this issue 1 year ago
@Ashespt, thanks for your reply! Yes, in the code of this repo, λ was set to 1.0 and γ was reduced progressively during training. However, on the face dataset MS1M-V3, the loss values at the start of training are approximately: classification_loss: 50, ADV_loss: 1, P2S_loss: 0.8; at the end of training they are approximately: classification_loss: 1.2, ADV_loss: 0.002, P2S_loss: 0.45. The final results show that self-search performance is OK, but cross-search performance is poor. Do you have any advice on setting the parameters λ and γ?
PS: ArcFace is used as the classification loss, and the threshold is set to 0.4. Moreover, I have tried setting λ and γ to 5 times their values, and the performance of AdvBCT was just like naive BCT (Towards Backward-Compatible Representation Learning, CVPR 2020).
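For reference, a minimal sketch of how the reported magnitudes interact when the three terms are combined with scalar weights; the variable names, and the assumption that λ and γ weight the two auxiliary terms, are illustrative rather than taken from the repo:

```python
# Hypothetical weighting sketch using the loss magnitudes reported above
# (start of training). Treating lambda_ and gamma_ as the weights of the p2s
# and adversarial terms is an assumption; check the repo config for the
# actual meaning of λ and γ.
loss_cls, loss_adv, loss_p2s = 50.0, 1.0, 0.8

lambda_, gamma_ = 1.0, 1.0          # λ reportedly fixed, γ decayed during training
total = loss_cls + lambda_ * loss_p2s + gamma_ * loss_adv
print(total)  # ~51.8: the classification term dominates early in training
```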
Thanks for your follow-up. Did you train the face dataset with the ReID trainer in this repo? We implemented the ReID part according to the TransReID repo (https://github.com/damo-cv/TransReID/) and didn't test much on this task. I have done some work on ReID in other scenarios and found that the fast-reid codebase achieves higher performance and is more robust than TransReID across different backbones, except for transformer-based backbones. So I think it's better to change the training code if you want to achieve compatibility on the ReID task with a ResNet-based backbone. By the way, in our paper we pointed out that the p2s loss is less affected by outliers. Naive BCT is based on the p2p loss and performs worse compared to the other losses in our experiments. In my opinion, the adversarial loss is hard to train; if you want to compare the p2s loss with the p2p loss, perhaps you can try the LCE loss first, which only has the p2s loss. The settings of the parameters λ and γ are not the key, I think.
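To make the p2p-vs-p2s distinction above concrete, here is a minimal sketch under assumed shapes and names (not the repo's actual implementation): p2p pulls each new embedding toward the old embedding of the same sample, while p2s only penalizes new embeddings that fall outside the old class boundary, which dampens the effect of outlier old embeddings.

```python
import torch
import torch.nn.functional as F

def p2p_loss(new_feat, old_feat):
    """Point-to-point (naive-BCT style): pull each new embedding toward the
    old embedding of the same image. Sensitive to outlier old embeddings."""
    new_feat = F.normalize(new_feat, dim=1)
    old_feat = F.normalize(old_feat, dim=1)
    return (new_feat - old_feat).norm(p=2, dim=1).mean()

def p2s_loss(new_feat, labels, centers, radii):
    """Point-to-set: only penalize a new embedding when it falls outside the
    old class boundary (center plus radius)."""
    new_feat = F.normalize(new_feat, dim=1)
    dist = (new_feat - centers[labels]).norm(p=2, dim=1)    # distance to old class center
    return torch.clamp(dist - radii[labels], min=0).mean()  # hinge on the class radius

# toy usage with assumed shapes: batch of 4, 8-dim embeddings, 3 classes
new_feat = torch.randn(4, 8)
old_feat = torch.randn(4, 8)
labels = torch.tensor([0, 1, 2, 1])
centers = F.normalize(torch.randn(3, 8), dim=1)
radii = torch.full((3,), 0.4)
print(p2p_loss(new_feat, old_feat), p2s_loss(new_feat, labels, centers, radii))
```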
I have just migrated AdvBCT to the face dataset MS1M-V3. I did not use the ReID trainer in your repo; in fact, I had to rewrite the AdvBCT code. The main code of train.py is:
```python
old_meta = load_old_meta(cfg)
model_criterion = torch.nn.NLLLoss()

for epoch in range(start_epoch, cfg.num_epoch):
    if isinstance(train_loader, DataLoader):
        train_loader.sampler.set_epoch(epoch)
    for batch_idx, (img, local_labels, index) in enumerate(train_loader):
        global_step += 1

        # get alpha for loss L_adv
        p = float(batch_idx + epoch * len(train_loader)) / cfg.num_epoch / len(train_loader)
        alpha = 2. / (1. + np.exp(-10 * p)) - 1

        # get radius_eb for loss L_p2s
        radius_eb = torch.zeros(cfg.num_classes)
        for j in range(len(local_labels)):
            if str(local_labels[j].item()) in old_meta:
                radius_max = old_meta[str(local_labels[j].item())]['radius'][-1]
                radius_eb[local_labels[j].item()] = abs(radius_max - cfg.threshold)
            else:
                radius_eb[local_labels[j].item()] = 0

        # get old embeddings
        index = index.detach().cpu()
        old_local_embeddings = old_feature[index]

        # new model forward
        local_embeddings, model_out_new, model_out_old, radius_eb = backbone(
            img, old_local_embeddings, alpha, radius_eb)

        # calculate the loss L_p2s
        Loss_p2s = 0.
        feat = F.normalize(local_embeddings)
        count = 0
        for j in range(len(feat)):
            if str(local_labels[j].item()) in old_meta:
                diff = feat[j] - torch.tensor(
                    old_meta[str(local_labels[j].item())]['center'])[None, :].to(feat.device)
                if old_meta[str(local_labels[j].item())]['radius'][-1] < cfg.threshold:
                    radius = old_meta[str(local_labels[j].item())]['radius'][-1] + radius_eb[local_labels[j].item()]
                    if batch_idx % cfg.frequent == 0 and j == 0:
                        print(f'original {old_meta[str(local_labels[j].item())]["radius"][-1]},+ {radius_eb[local_labels[j].item()].item()}, radius {radius.item()}')
                else:
                    radius = old_meta[str(local_labels[j].item())]['radius'][-1] - radius_eb[local_labels[j].item()]
                    if batch_idx % cfg.frequent == 0 and j == 0:
                        print(f'original {old_meta[str(local_labels[j].item())]["radius"][-1]},- {radius_eb[local_labels[j].item()].item()}, radius {radius.item()}')
                if len(old_meta[str(local_labels[j].item())]['radius']) <= 1:
                    continue
                tmp = max(torch.norm(diff, p=2) - radius, 0)
                if tmp > 0.:
                    Loss_p2s += tmp
                    count += 1
        if count:
            Loss_p2s /= count

        # calculate the loss L_adv
        model_label_new = torch.zeros(len(local_labels)).long().cuda()
        model_label_old = torch.ones(len(local_labels)).long().cuda()
        Loss_adv = model_criterion(model_out_new, model_label_new) + model_criterion(model_out_old, model_label_old)
        Loss_adv = Loss_adv * (cfg.num_epoch - epoch + 1) / cfg.num_epoch

        # calculate the loss L_cls (ArcFace loss)
        Loss_cls = module_partial_fc(local_embeddings, local_labels, opt)

        # total loss
        loss = Loss_cls + Loss_p2s + Loss_adv
```
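One thing worth double-checking in the snippet above: `torch.nn.NLLLoss` expects log-probabilities, so the adversarial term is only computed as intended if the discriminator ends in a `LogSoftmax` (or its outputs are passed through `F.log_softmax`); with raw logits, `CrossEntropyLoss` would be the usual choice. A minimal sketch of a discriminator head that matches `NLLLoss` (layer sizes are assumptions):

```python
import torch.nn as nn

# Hypothetical two-class discriminator whose output matches NLLLoss:
# it returns log-probabilities over {new model (label 0), old model (label 1)}.
discriminator = nn.Sequential(
    nn.Linear(512, 256),      # 512 = assumed embedding dimension
    nn.ReLU(inplace=True),
    nn.Linear(256, 2),
    nn.LogSoftmax(dim=1),     # required because the training loop uses NLLLoss
)
```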
The main network code is:
```python
def forward(self, x, x_o, alpha=0, radius=None):
    # x: input images, x_o: old-model embeddings for the same batch,
    # alpha: gradient-reversal coefficient, radius: per-class boundary for L_p2s
    with torch.cuda.amp.autocast(self.fp16):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.prelu(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        x = self.bn2(x)
        x = torch.flatten(x, 1)
        x = self.dropout(x)
    x = self.fc(x.float() if self.fp16 else x)
    x = self.features(x)
    xn = F.normalize(x)
    # adversarial branch: gradient reversal on the new embeddings, then the
    # shared discriminator scores new vs. old embeddings
    reverse_feature_new = ReverseLayerF.apply(xn, alpha)
    model_out_new = self.discriminator(reverse_feature_new)
    model_out_old = self.discriminator(x_o)
    if radius is not None:
        radius = self.eboundary(radius)
    return x, model_out_new, model_out_old, radius
```
Is there any mistake here?
Thanks again for your thoughtful reply!!
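As an aside on the forward pass above: `ReverseLayerF` is presumably a standard DANN-style gradient reversal layer. A minimal sketch of such a layer (not necessarily identical to the repo's implementation):

```python
import torch

class ReverseLayerF(torch.autograd.Function):
    """Gradient reversal: identity in the forward pass, multiplies the
    gradient by -alpha in the backward pass (DANN-style)."""

    @staticmethod
    def forward(ctx, x, alpha):
        ctx.alpha = alpha
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_output):
        # reverse (and scale) the gradient flowing back into the backbone
        return grad_output.neg() * ctx.alpha, None
```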
Sorry for replying late. How did you get the center of old features?
I used the Python file "extract_feature.py" you provide in this repo, and the function 'gen_class_meta' is executed as in your default training code.
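In case it helps debugging, here is a hypothetical sketch of how per-class centers and radii could be derived from the old model's embeddings; the actual `gen_class_meta` in `extract_feature.py` may compute them differently:

```python
import numpy as np

def gen_class_meta_sketch(old_features, labels):
    """Hypothetical reconstruction: for each class, store the mean of the old
    (L2-normalized) embeddings as the center and the sorted distances of the
    class samples to that center as the radius list (so radius[-1] is the
    largest). Keyed by the string label, matching how old_meta is indexed
    in the training snippet above."""
    old_features = old_features / np.linalg.norm(old_features, axis=1, keepdims=True)
    meta = {}
    for label in np.unique(labels):
        feats = old_features[labels == label]
        center = feats.mean(axis=0)
        radius = np.sort(np.linalg.norm(feats - center, axis=1))
        meta[str(int(label))] = {'center': center, 'radius': radius.tolist()}
    return meta
```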
λ was set to 1.0 and γ was reduced progressively during training. More details can be found in the code.