PaddleClas and PaddlePaddle versions: paddleclas==2.5.1 && paddlepaddle-gpu==2.4.0.post117
Versions of other products involved: none
Training environment: a. Operating system: Linux b. Python version: Python 3.8.10 c. CUDA/cuDNN version: CUDA 11.7 / cuDNN 8.5.0
Complete code: code unmodified

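Hello, I trained on the person attribute recognition dataset pa100k using the downloaded project. During training the loss either increases or stays roughly flat, yet the evaluation metrics look normal. The command line is as follows:

    export CUDA_VISIBLE_DEVICES=0,1,2,3
    python3 -m paddle.distributed.launch \
        --gpus="0,1,2,3" \
        tools/train.py \
        -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml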

The configuration file PPLCNet_x1_0.yaml is as follows:

    # global configs
    Global:
      checkpoints: null
      pretrained_model: /mnt/AlgoTempData1/wangsijun/guoshouxiang/Projects/classification/paddlecla/pretrained_models/person_attribute_pretrained
      output_dir: "./output/11_PA_PA100k_PPLCNet/"
      device: "gpu"
      save_interval: 1
      eval_during_train: True
      eval_interval: 1
      epochs: 60
      print_batch_step: 10
      use_visualdl: False
      # used for static mode and model export
      image_shape: [3, 256, 192]
      save_inference_dir: "./inference"
      use_multilabel: True

    # model architecture
    Arch:
      name: "PPLCNet_x1_0"
      pretrained: True
      use_ssld: True
      class_num: 26

    # loss function config for training/eval process
    Loss:
      Train:
        - MultiLabelLoss:
            weight: 1.0
            weight_ratio: True
            size_sum: True
      Eval:
        - MultiLabelLoss:
            weight: 1.0
            weight_ratio: True
            size_sum: True

    Optimizer:
      name: Momentum
      momentum: 0.9
      lr:
        name: Cosine
        learning_rate: 0.01
        warmup_epoch: 5
      regularizer:
        name: 'L2'
        coeff: 0.0005

    # data loader for train and eval
    DataLoader:
      Train:
        dataset:
          name: MultiLabelDataset
          image_root: "/mnt/AlgoTempData1/wangsijun/guoshouxiang/Data/classification/05_Person_Attr/pa100k/"
          cls_label_path: "/mnt/AlgoTempData1/wangsijun/guoshouxiang/Data/classification/05_Person_Attr/pa100k/train_list.txt"
          label_ratio: True
          transform_ops:
            - DecodeImage:
                to_rgb: True
                channel_first: False
            - ResizeImage:
                size: [192, 256]
            - TimmAutoAugment:
                prob: 0.8
                config_str: rand-m9-mstd0.5-inc1
                interpolation: bicubic
                img_size: [192, 256]
            - Padv2:
                size: [212, 276]
                pad_mode: 1
                fill_value: 0
            - RandomCropImage:
                size: [192, 256]
            - RandFlipImage:
                flip_code: 1
            - NormalizeImage:
                scale: 1.0/255.0
                mean: [0.485, 0.456, 0.406]
                std: [0.229, 0.224, 0.225]
                order: ''
            - RandomErasing:
                EPSILON: 0.4
                sl: 0.02
                sh: 1.0/3.0
                r1: 0.3
                attempt: 10
                use_log_aspect: True
                mode: pixel
        sampler:
          name: DistributedBatchSampler
          batch_size: 64
          drop_last: True
          shuffle: True
        loader:
          num_workers: 4
          use_shared_memory: True
      Eval:
        dataset:
          name: MultiLabelDataset
          image_root: "/mnt/AlgoTempData1/wangsijun/guoshouxiang/Data/classification/05_Person_Attr/pa100k/"
          cls_label_path: "/mnt/AlgoTempData1/wangsijun/guoshouxiang/Data/classification/05_Person_Attr/pa100k/val_list.txt"
          label_ratio: True
          transform_ops:
            - DecodeImage:
                to_rgb: True
                channel_first: False
            - ResizeImage:
                size: [192, 256]
            - NormalizeImage:
                scale: 1.0/255.0
                mean: [0.485, 0.456, 0.406]
                std: [0.229, 0.224, 0.225]
                order: ''
        sampler:
          name: DistributedBatchSampler
          batch_size: 64
          drop_last: False
          shuffle: False
        loader:
          num_workers: 4
          use_shared_memory: True

    Infer:
      infer_imgs: deploy/images/PULC/person_attribute/090004.jpg
      batch_size: 10
      transforms:
        - DecodeImage:
            to_rgb: True
            channel_first: False
        - ResizeImage:
            size: [192, 256]
        - NormalizeImage:
            scale: 1.0/255.0
            mean: [0.485, 0.456, 0.406]
            std: [0.229, 0.224, 0.225]
            order: ''
        - ToCHWImage:
      PostProcess:
        name: PersonAttribute
        threshold: 0.5  # default threshold
        glasses_threshold: 0.3  # threshold only for glasses
        hold_threshold: 0.6  # threshold only for hold

    Metric:
      Eval:
        - ATTRMetric:
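For reference, the `Optimizer` section combines `Cosine` decay with `learning_rate: 0.01`, `warmup_epoch: 5` and `epochs: 60`. The following is a minimal sketch of what such a schedule typically looks like, assuming linear warmup from 0 followed by cosine decay to 0, evaluated once per epoch. PaddleClas may update the rate per iteration and its exact formula may differ; `lr_at_epoch` is a hypothetical helper, not a PaddleClas function.

```python
import math


def lr_at_epoch(epoch, base_lr=0.01, warmup_epochs=5, total_epochs=60):
    """Assumed schedule: linear warmup, then cosine decay to zero."""
    if epoch < warmup_epochs:
        # Linear ramp from 0 up to base_lr over the warmup epochs.
        return base_lr * epoch / warmup_epochs
    # Fraction of the post-warmup phase that has elapsed, in [0, 1).
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Learning rate for each of the 60 configured epochs.
schedule = [lr_at_epoch(e) for e in range(60)]
for epoch in (0, 1, 5, 30, 59):
    print(epoch, schedule[epoch])
```

Under these assumptions the learning rate rises during the first 5 epochs and peaks at 0.01, so a loss that grows early in training could partly reflect warmup rather than a bug. Comparing the logged learning rate with the loss at the same steps would help confirm or rule this out.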

The loss curve is shown in the figure:
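When reading the loss values, note that the config enables both `weight_ratio` and `size_sum` for `MultiLabelLoss`, so the reported number is probably not a plain per-element mean. The NumPy sketch below illustrates the idea under two assumptions that are not verified against the PaddleClas source here: that `weight_ratio` rescales each attribute's loss by an exponential function of its positive-label ratio, and that `size_sum` sums over the attributes before averaging over the batch. All function names are hypothetical.

```python
import numpy as np


def sigmoid_bce(logits, targets):
    """Element-wise binary cross-entropy on logits (numerically stable form)."""
    return (np.maximum(logits, 0) - logits * targets
            + np.log1p(np.exp(-np.abs(logits))))


def ratio_weights(targets, pos_ratio):
    """Assumed weighting: the rarer a label value is, the larger its weight."""
    pos = targets * (1.0 - pos_ratio)
    neg = (1.0 - targets) * pos_ratio
    return np.exp(pos + neg)


def multilabel_loss(logits, targets, pos_ratio=None, size_sum=True):
    cost = sigmoid_bce(logits, targets)
    if pos_ratio is not None:
        # pos_ratio has one entry per attribute and broadcasts over the batch.
        cost = cost * ratio_weights(targets, pos_ratio)
    # size_sum: sum over attributes, then average over the batch.
    return cost.sum(axis=1).mean() if size_sum else cost.mean()


# Toy batch: 4 samples, 26 attributes, all-zero logits (an untrained head).
logits = np.zeros((4, 26))
targets = np.zeros((4, 26))
print(multilabel_loss(logits, targets, size_sum=True))
print(multilabel_loss(logits, targets, size_sum=False))
```

With 26 attributes, all-zero logits give 26 × ln 2 (about 18.0) when summed over attributes, compared with ln 2 (about 0.69) for a per-element mean. If the real loss behaves like this sketch, its absolute scale is much larger than a typical classification loss, and the ratio weights add a further factor. That makes the curve harder to compare with the evaluation metrics.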

PaddlePaddle / PaddleClas

Person attribute recognition model PPLCNet: training loss does not decrease #2958
