PaddlePaddle / PaddleX

All-in-One Development Tool based on PaddlePaddle (PaddlePaddle's low-code, full-workflow development tool)
Apache License 2.0

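Using a previous model as the pretrained model when the dataset has only one label but the previous model's labels must be kept: how do I train? #1633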

Open monkeycc opened 1 year ago

monkeycc commented 1 year ago

Environment

  1. PaddlePaddle and PaddleX versions: paddlepaddle-gpu 2.3.2.post116, paddlex 2.1.0

  2. Operating system (Linux/Windows/MacOS): Windows

  3. Python version: python3.7

  4. CUDA/cuDNN versions: CUDA 11.6, cuDNN 8.4


import paddle
from paddle.regularizer import L2Decay
import paddlex as pdx
from paddlex import transforms as T

# Define the transforms used for training and validation
# API reference: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/apis/transforms/transforms.md
train_transforms = T.Compose([
    # Every *_prob below is 0.0, so this distortion never actually fires.
    T.RandomDistort(
        brightness_range=0.5,
        brightness_prob=0.0,
        contrast_range=0.5,
        contrast_prob=0.0,
        saturation_range=0.5,
        saturation_prob=0.0,
        hue_range=18.0,
        hue_prob=0.0),
    T.RandomHorizontalFlip(prob=0.5),
    T.Normalize(
        mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    T.ResizeByShort(
        short_size=800, max_size=1333),
])
eval_transforms = T.Compose([
    T.Normalize(),
    T.ResizeByShort(
        short_size=800, max_size=1333),
])

# Define the datasets used for training and validation
# API reference: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/apis/datasets.md
train_dataset = pdx.datasets.VOCDetection(
    data_dir=r'D:\AI_workspace\datasets\D0055',
    file_list=r'D:\AI_workspace\datasets\D0055\train_list.txt',
    label_list=r'D:\AI_workspace\datasets\D0055\labels.txt',
    transforms=train_transforms,
    shuffle=True)
eval_dataset = pdx.datasets.VOCDetection(
    data_dir=r'D:\AI_workspace\datasets\D0055',
    file_list=r'D:\AI_workspace\datasets\D0055\val_list.txt',
    label_list=r'D:\AI_workspace\datasets\D0055\labels.txt',
    transforms=eval_transforms)

# Initialize the model and train it
# Training metrics can be viewed in VisualDL, see https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/train/visualdl.md
num_classes = len(train_dataset.labels)
model = pdx.det.FasterRCNN(
    num_classes=num_classes, backbone='ResNet50_vd_ssld')

# Define the optimizer: PiecewiseDecay learning rate with LinearWarmup
learning_rate = 0.0025
lr_decay_epochs = [8, 11]
warmup_steps = 2071
warmup_start_lr = 0.0
train_batch_size = 2
step_each_epoch = train_dataset.num_samples // train_batch_size

boundaries = [b * step_each_epoch for b in lr_decay_epochs]
values = [learning_rate * (0.1**i) for i in range(len(lr_decay_epochs) + 1)]
lr = paddle.optimizer.lr.PiecewiseDecay(
    boundaries=boundaries, values=values)
lr = paddle.optimizer.lr.LinearWarmup(
    learning_rate=lr,
    warmup_steps=warmup_steps,
    start_lr=warmup_start_lr,
    end_lr=learning_rate)
optimizer = paddle.optimizer.Momentum(
    learning_rate=lr,
    momentum=0.9,
    weight_decay=L2Decay(0.0001),
    parameters=model.net.parameters())

# API reference: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/apis/models/semantic_segmentation.md
# Parameter descriptions and tuning guide: https://github.com/PaddlePaddle/PaddleX/blob/develop/docs/parameters.md
model.train(
    num_epochs=12,
    train_dataset=train_dataset,
    train_batch_size=train_batch_size,
    eval_dataset=eval_dataset,
    save_interval_epochs=2,
    log_interval_steps=2,
    save_dir=r'D:\AI_workspace\projects\P0048\T0072\output',
    pretrain_weights=r'F:\2022\0622\model.pdparams',
    optimizer=optimizer,
    use_vdl=True,
    resume_checkpoint=None)
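To see what schedule the optimizer block above actually produces, here is a pure-Python sketch of PiecewiseDecay wrapped in LinearWarmup. It assumes, based on my reading of Paddle 2.x's `paddle.optimizer.lr.LinearWarmup`, that once warmup ends the wrapped scheduler is stepped with `step - warmup_steps`, so the decay boundaries are effectively shifted by `warmup_steps`; verify this against your Paddle version. The function names are hypothetical helpers, not Paddle APIs.

```python
# Pure-Python sketch of PiecewiseDecay wrapped in LinearWarmup.
# Assumption: after warmup, the wrapped PiecewiseDecay sees (step - warmup_steps),
# mirroring paddle.optimizer.lr.LinearWarmup in Paddle 2.x.


def piecewise_boundaries(lr_decay_epochs, step_each_epoch):
    """Global step indices at which the piecewise schedule moves to the next value."""
    return [b * step_each_epoch for b in lr_decay_epochs]


def piecewise_values(learning_rate, num_decays):
    """Base learning rate followed by successive 10x decays."""
    return [learning_rate * (0.1 ** i) for i in range(num_decays + 1)]


def lr_at_step(step, learning_rate, boundaries, values,
               warmup_steps, warmup_start_lr=0.0):
    """Learning rate at a 0-based global step."""
    if step < warmup_steps:
        # Linear ramp from warmup_start_lr up to learning_rate.
        return warmup_start_lr + (learning_rate - warmup_start_lr) * step / warmup_steps
    # Hypothesized offset: the inner schedule starts counting after warmup.
    inner_step = step - warmup_steps
    for boundary, value in zip(boundaries, values):
        if inner_step < boundary:
            return value
    # Past the last boundary: stay at the final decayed value.
    return values[-1]


if __name__ == "__main__":
    # Hypothetical dataset of 1000 images with batch size 2 -> 500 steps per epoch.
    bounds = piecewise_boundaries([8, 11], 1000 // 2)
    vals = piecewise_values(0.0025, 2)
    for s in (0, 1000, 2071, 2071 + 4000, 2071 + 5500):
        print(s, lr_at_step(s, 0.0025, bounds, vals, warmup_steps=2071))
```

With that hypothetical 1,000-image training set, the boundaries are `[4000, 5500]`, and 12 epochs give 6,000 steps in total. Under the offset assumption the decays would land at steps 6,071 and 7,571, so neither would be reached; plugging in your own `num_samples` shows whether that happens for your data.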

Error:

Warning: import ppdet from source directory without installing, run 'python setup.py install' to install ppdet firstly
2022-11-22 14:10:30 [INFO] Starting to read file list from dataset...
Traceback (most recent call last):
  File "d:/0SDXX/PaddleX/Xscript.py", line 37, in <module>
    shuffle=True)
  File "d:\0SDXX\PaddleX\paddlex\cv\datasets\voc.py", line 225, in __init__
    gt_class[i, 0] = cname2cid[cname]
KeyError: 'Quejiao_1'


Full label set of the previous model: Quejiao_1 Quejiao_2 Quejiao_3 Quejiao_4 Quejiao_5 Wu

Labels in the current dataset: Quejiao_2 Quejiao_3

Changing labels.txt to Quejiao_1 Quejiao_2 Quejiao_3 Quejiao_4 Quejiao_5 Wu still raises the error.
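The traceback fails at `cname2cid[cname]`, a dictionary lookup from class name to class id. Below is a minimal sketch of that idea, assuming (as the traceback suggests) that the mapping is built from labels.txt with the class id equal to the line index; the helper names are hypothetical and this is not PaddleX's actual code. Because ids follow line order, listing the old model's six classes in their original order keeps Quejiao_2 and Quejiao_3 on the same ids the pretrained model used, provided the old labels.txt had that same order.

```python
from typing import Dict, List


def build_cname2cid(labels: List[str]) -> Dict[str, int]:
    """Class name -> class id, where the id is the name's line index in labels.txt."""
    return {name: cid for cid, name in enumerate(labels)}


def class_id(cname2cid: Dict[str, int], cname: str) -> int:
    """Look up a class id, failing with a message that names the missing class."""
    try:
        return cname2cid[cname]
    except KeyError:
        raise KeyError(
            f"class {cname!r} appears in an annotation but not in labels.txt"
        ) from None


if __name__ == "__main__":
    old_labels = ["Quejiao_1", "Quejiao_2", "Quejiao_3", "Quejiao_4", "Quejiao_5", "Wu"]
    print(build_cname2cid(old_labels))
    # A two-class labels.txt cannot resolve an annotation that still says Quejiao_1.
    try:
        class_id(build_cname2cid(["Quejiao_2", "Quejiao_3"]), "Quejiao_1")
    except KeyError as err:
        print(err)
```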

lailuboy commented 1 year ago

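Do you mean that the error still occurs even after the new dataset's labels were changed to the same label set as the original dataset? Did you change the evaluation set as well?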

monkeycc commented 1 year ago

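The new dataset's labels have already been changed to the same label set as the original dataset.

The training and evaluation sets use the same labels.txt.

What should I change now?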

lailuboy commented 1 year ago

The entries in cname2cid are exactly what is read from labels.txt. Check whether Quejiao_1 is present in D:\AI_workspace\datasets\D0055\labels.txt.
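A quick way to do that check for every class at once is to scan the VOC XML annotations and report any `<object><name>` that labels.txt does not list. This is a minimal sketch using only the standard library; the function names are hypothetical, and it assumes the XML files sit together in one folder (for example an `Annotations` folder, the usual VOC layout; point it at wherever your train_list.txt and val_list.txt reference them).

```python
import os
import xml.etree.ElementTree as ET
from typing import Dict, List, Set


def read_labels(labels_path: str) -> List[str]:
    """Read labels.txt: one class name per line, blank lines ignored."""
    with open(labels_path, encoding="utf-8") as f:
        return [line.strip() for line in f if line.strip()]


def find_unknown_classes(ann_dir: str, labels: List[str]) -> Dict[str, Set[str]]:
    """Map each class name missing from labels to the XML files that use it."""
    known = set(labels)
    unknown: Dict[str, Set[str]] = {}
    for fname in sorted(os.listdir(ann_dir)):
        if not fname.lower().endswith(".xml"):
            continue
        root = ET.parse(os.path.join(ann_dir, fname)).getroot()
        # In VOC files each <object> is a direct child of <annotation>.
        for obj in root.findall("object"):
            name = (obj.findtext("name") or "").strip()
            if name not in known:
                unknown.setdefault(name, set()).add(fname)
    return unknown


if __name__ == "__main__":
    # Hypothetical paths: adjust to your dataset layout.
    labels = read_labels(r"D:\AI_workspace\datasets\D0055\labels.txt")
    print(find_unknown_classes(r"D:\AI_workspace\datasets\D0055\Annotations", labels))
```

An empty dict means every annotated class is covered by labels.txt; otherwise each key is a stray class together with the files to fix.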

monkeycc commented 1 year ago

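That was indeed the problem: labels.txt did not contain it.

I had carelessly annotated an extra label and then forgot about it.

Thanks a lot for the reply, much appreciated.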

monkeycc commented 1 year ago

Then it reports:

[ERROR] Invalid pretrain weights. Please specify a '.pdparams' file.

This is a model I exported. Can it not be used for training?
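As far as I can tell from the PaddleX 2.x source, `train()` emits this error when `pretrain_weights` points to an existing file whose extension is not `.pdparams`. Training checkpoints (e.g. `best_model`) are saved with `paddle.save` as `model.pdparams`, whereas an exported inference model is normally written as `model.pdmodel` plus `model.pdiparams`, which is a different format. The sketch below is a hypothetical helper, not a PaddleX API, for telling the two kinds of directory apart before passing a path to `pretrain_weights`.

```python
import os


def describe_model_dir(model_dir: str) -> str:
    """Guess what a PaddleX output directory holds from its weight file names."""
    files = set(os.listdir(model_dir))
    if "model.pdparams" in files:
        return "training checkpoint"
    if "model.pdmodel" in files and "model.pdiparams" in files:
        return "exported inference model"
    return "unknown"


def looks_like_training_weights(path: str) -> bool:
    """True for an existing file with the .pdparams extension."""
    return os.path.isfile(path) and os.path.splitext(path)[1] == ".pdparams"


if __name__ == "__main__":
    # Hypothetical directory: the folder that contains the weights you want to reuse.
    model_dir = r"F:\2022\0622"
    print(describe_model_dir(model_dir))
    print(looks_like_training_weights(os.path.join(model_dir, "model.pdparams")))
```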