Fine-tuning pretrained model

csyuhao commented 4 years ago

Thanks @TreB1eN for the great work! I am trying to fine-tune on a small dataset using the pretrained IR-SE50 model. I do not train the embedding model; instead, I add an nn.Linear layer on top of it to do softmax classification.

Here is the relevant code. The input to nn.Linear is the embedding before L2 normalization.

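import torch
from torch.nn import Linear, Module

class Arcface(Module):
    # Only adds a fully connected classifier on top of the ArcFace embedding model
    def __init__(self, embedding_size=512, classnum=28, pretrained=None):
        super(Arcface, self).__init__()
        self.backbone = Backbone(50, 0.6, 'ir_se')  # IR-SE50, drop_ratio=0.6
        self.fc = Linear(embedding_size, classnum)
        if pretrained:
            # Load the pretrained backbone weights from the given checkpoint path
            self.backbone.load_state_dict(torch.load(pretrained))

    def forward(self, input):
        # The modified Backbone returns (L2-normalized embeddings, unnormalized features)
        embeddings, linear = self.backbone(input)
        logits = self.fc(linear)
        return embeddings, logits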

The forward part of Backbone is also changed as follows:

class Backbone(Module):
    def __init__(self, num_layers, drop_ratio, mode='ir'):
        super(Backbone, self).__init__()
        assert num_layers in [
            50, 100, 152], 'num_layers should be 50,100, or 152'
        assert mode in ['ir', 'ir_se'], 'mode should be ir or ir_se'
        blocks = get_blocks(num_layers)
        if mode == 'ir':
            unit_module = bottleneck_IR
        elif mode == 'ir_se':
            unit_module = bottleneck_IR_SE
        self.input_layer = Sequential(Conv2d(3, 64, (3, 3), 1, 1, bias=False),
                                      BatchNorm2d(64),
                                      PReLU(64))
        self.output_layer = Sequential(BatchNorm2d(512),
                                       Dropout(drop_ratio),
                                       Flatten(),
                                       Linear(512 * 7 * 7, 512),
                                       BatchNorm1d(512))
        modules = []
        for block in blocks:
            for bottleneck in block:
                modules.append(
                    unit_module(bottleneck.in_channel,
                                bottleneck.depth,
                                bottleneck.stride))
        self.body = Sequential(*modules)

    def forward(self, x):
        x = self.input_layer(x)
        x = self.body(x)
        x = self.output_layer(x)
        return l2_norm(x), x

However, during training the loss decreases very slowly. The learning rate is 1e-3 and the optimizer is Adam.

Epoch 1/20
----------
The train loss is 3.353309392929077, The accuracy is 0.0133928582072258

Epoch 2/20
----------
The train loss is 3.2458348274230957, The accuracy is 0.2031250149011612

Epoch 3/20
----------
The train loss is 3.152858257293701, The accuracy is 0.3125

Epoch 4/20
----------
The train loss is 3.0687711238861084, The accuracy is 0.3504464328289032

Epoch 5/20
----------
The train loss is 2.9923739433288574, The accuracy is 0.3638392984867096

Could you give me some advice, or point out where my mistakes are?
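A quick sanity check on the log above: with classnum=28, a classifier that predicts every class with equal probability has a cross-entropy of ln(28). The sketch below computes that baseline and compares it with the first-epoch loss. It assumes the reported loss is a mean cross-entropy in nats (the loss function itself is not shown in the post); the helper name `chance_level_cross_entropy` is made up for illustration.

```python
import math

NUM_CLASSES = 28  # classnum in the Arcface wrapper above


def chance_level_cross_entropy(num_classes: int) -> float:
    """Cross-entropy (in nats) of a uniform prediction over num_classes."""
    return math.log(num_classes)


baseline = chance_level_cross_entropy(NUM_CLASSES)
first_epoch_loss = 3.353309392929077  # copied from the epoch 1 line of the log

print(f"chance-level loss:           {baseline:.4f}")
print(f"epoch 1 loss minus baseline: {first_epoch_loss - baseline:+.4f}")
```

Under that assumption the first-epoch loss sits almost exactly at the uniform baseline (about 3.33), which is what a freshly initialized head would be expected to give. The open question is therefore only how quickly the loss falls from there.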

hphuongdhsp commented 4 years ago

Using the pretrained model's embedding directly as the input may work better. The model looks like the following:

import torch.nn as nn
import torch.nn.functional as F

class Arcface(nn.Module):
    # MLP head that takes the pretrained model's embedding as its input
    def __init__(self, embedding_size=512, classnum=28):
        super(Arcface, self).__init__()
        self.embedding_size = embedding_size
        self.relu = nn.ReLU(inplace=True)
        self.fc_1 = nn.Linear(embedding_size, 2048)
        self.fc = nn.Linear(2048, classnum)

    def forward(self, input):
        global_feat = self.fc_1(input)
        global_feat = self.relu(global_feat)
        # F.dropout defaults to training=True, so pass the module's mode
        # explicitly to switch dropout off under eval()
        global_feat = F.dropout(global_feat, p=0.2, training=self.training)
        logits = self.fc(global_feat)
        return logits
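Since this head takes embeddings as input, one way to use it is to run the frozen backbone once, cache the embeddings, and train the head on the cached tensors. The following is a minimal sketch of that idea, not the commenter's actual pipeline: `extract_embeddings` is a hypothetical helper, `toy_backbone` is a stand-in for the pretrained IR-SE50 model, and the image size and sample count are arbitrary toy values.

```python
from typing import Tuple

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset


@torch.no_grad()
def extract_embeddings(backbone: nn.Module, loader) -> Tuple[torch.Tensor, torch.Tensor]:
    """Run the frozen backbone once and collect (embedding, label) tensors."""
    backbone.eval()  # BatchNorm uses running statistics, Dropout is disabled
    feats, labels = [], []
    for images, targets in loader:
        feats.append(backbone(images))
        labels.append(targets)
    return torch.cat(feats), torch.cat(labels)


# Hypothetical stand-in for the pretrained backbone, used only so the sketch
# runs on its own; it maps toy 3x16x16 inputs to 512-d vectors.
toy_backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 16 * 16, 512))

# Hypothetical toy data: 8 random "images" with labels in [0, 28).
images = torch.randn(8, 3, 16, 16)
targets = torch.randint(0, 28, (8,))
image_loader = DataLoader(TensorDataset(images, targets), batch_size=4)

embeddings, labels = extract_embeddings(toy_backbone, image_loader)

# The cached tensors can feed the MLP head without touching the backbone again.
head_loader = DataLoader(TensorDataset(embeddings, labels), batch_size=4, shuffle=True)
```

Caching trades memory for speed: each epoch over `head_loader` only runs the small head, which makes it cheap to try different learning rates for it.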

youthM commented 3 years ago

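@yuhaoooo Hi, if I want to fine-tune the model, how should I go about it? I am not sure how to do it and would like to ask for your advice. Thank you very much!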

csyuhao commented 3 years ago

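Generally speaking, a face recognition model consists of a feature extractor and a classification head. The feature extractor is a model trained on a large dataset with a specialized loss function such as ArcFace or CosFace. The classification head is implemented with a multilayer perceptron, an SVM, or a clustering algorithm. When fine-tuning, we usually do not retrain the feature extractor. Unless you have a larger local dataset, we train only the classification head.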

youthM commented 3 years ago

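@yuhaoooo Right, that makes sense, but I am not familiar with PyTorch and do not know which parts of the code need to change or how to change them. Would you mind sharing that part of your code? Thanks!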

csyuhao commented 3 years ago

Do you mean the code in this repo? I have not looked at this repo closely, so I recommend looking at Face_Pytorch instead. In train_softmax.py, change #L109 to

    optimizer_classi = optim.SGD([
        {'params': margin.parameters(), 'weight_decay': 5e-4}
    ], lr=0.1, momentum=0.9, nesterov=True)

Then change the net.train() call below it to net.eval(), and that is all.
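The two changes described above (an optimizer that only sees the head's parameters, and a backbone kept in eval mode) can be sketched in isolation as follows. This is an illustration under assumptions, not the Face_Pytorch training script: `net` and `margin` reuse the names from the snippet, but here they are small hypothetical stand-ins, with a plain nn.Linear in place of the real margin layer, and the batch is random toy data.

```python
import torch
import torch.nn as nn
import torch.optim as optim

# Hypothetical stand-ins so the sketch runs on its own: `net` plays the role of
# the pretrained feature extractor, `margin` the trainable classification head.
net = nn.Sequential(nn.Linear(16, 512), nn.BatchNorm1d(512), nn.Dropout(0.5))
margin = nn.Linear(512, 28)

# Freeze the feature extractor: no gradients, and eval() so that BatchNorm uses
# its stored running statistics and Dropout is switched off.
for param in net.parameters():
    param.requires_grad_(False)
net.eval()

# Only the head's parameters are handed to the optimizer.
optimizer_classi = optim.SGD(
    [{'params': margin.parameters(), 'weight_decay': 5e-4}],
    lr=0.1, momentum=0.9, nesterov=True)

criterion = nn.CrossEntropyLoss()
x = torch.randn(4, 16)            # toy batch of inputs
y = torch.randint(0, 28, (4,))    # toy labels

running_mean_before = net[1].running_mean.clone()
with torch.no_grad():
    features = net(x)             # backbone forward without building a graph
loss = criterion(margin(features), y)
optimizer_classi.zero_grad()
loss.backward()
optimizer_classi.step()
```

The eval() call matters independently of the optimizer: a module left in train mode keeps Dropout active and lets BatchNorm update its running statistics on every forward pass, even when none of its weights are being optimized.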

youthM commented 3 years ago

@yuhaoooo Thank you very much!

ANDRESHZ commented 11 months ago

Hi @yuhaoooo, I hope you are getting good results. I am wondering whether you eventually found a way to fine-tune on your dataset.

Could you share your code with the community?

XavierYu404 commented 5 months ago

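"Generally speaking, a face recognition model consists of a feature extractor and a classification head. The feature extractor is a model trained on a large dataset with a specialized loss function such as ArcFace or CosFace. The classification head is implemented with a multilayer perceptron, an SVM, or a clustering algorithm. When fine-tuning, we usually do not retrain the feature extractor. Unless you have a larger local dataset, we train only the classification head."

So after training, can the model only recognize people in the training set, by classifying them? Does that mean people outside the dataset cannot be recognized?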

TreB1eN / InsightFace_Pytorch

Fine-tuning pretrained model #145