gsx1378 opened this issue 1 year ago (status: Open)
Hello, I am training with the downloaded project on the pa100k person attribute recognition dataset. During training the loss either increases or stays essentially flat, yet the evaluation metrics look normal. The command line is as follows:

    export CUDA_VISIBLE_DEVICES=0,1,2,3
    python3 -m paddle.distributed.launch \
        --gpus="0,1,2,3" \
        tools/train.py \
        -c ./ppcls/configs/PULC/person_attribute/PPLCNet_x1_0.yaml

The configuration in PPLCNet_x1_0.yaml is as follows:

    # global configs
    Global:
      checkpoints: null
      pretrained_model: /mnt/AlgoTempData1/wangsijun/guoshouxiang/Projects/classification/paddlecla/pretrained_models/person_attribute_pretrained
      output_dir: "./output/11_PA_PA100k_PPLCNet/"
      device: "gpu"
      save_interval: 1
      eval_during_train: True
      eval_interval: 1
      epochs: 60
      print_batch_step: 10
      use_visualdl: False
      # used for static mode and model export
      image_shape: [3, 256, 192]
      save_inference_dir: "./inference"
      use_multilabel: True

    # model architecture
    Arch:
      name: "PPLCNet_x1_0"
      pretrained: True
      use_ssld: True
      class_num: 26

    # loss function config for training/eval process
    Loss:
      Train:

    Optimizer:
      name: Momentum
      momentum: 0.9
      lr:
        name: Cosine
        learning_rate: 0.01
        warmup_epoch: 5
      regularizer:
        name: 'L2'
        coeff: 0.0005

    # data loader for train and eval
    DataLoader:
      Train:
        dataset:
          name: MultiLabelDataset
          image_root: "/mnt/AlgoTempData1/wangsijun/guoshouxiang/Data/classification/05_Person_Attr/pa100k/"
          cls_label_path: "/mnt/AlgoTempData1/wangsijun/guoshouxiang/Data/classification/05_Person_Attr/pa100k/train_list.txt"
          label_ratio: True
          transform_ops:

    Infer:
      infer_imgs: deploy/images/PULC/person_attribute/090004.jpg
      batch_size: 10
      transforms:

    Metric:
      Eval:

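
For orientation, the Optimizer settings in this config (Cosine schedule, learning_rate 0.01, warmup_epoch 5, epochs 60) can be approximated with a short sketch. This is a simplified per-epoch illustration of linear warmup followed by cosine decay, assuming warmup starts from 0 and the decay ends at 0. The function name `warmup_cosine_lr` is hypothetical, and the scheduler PaddleClas actually builds works per training step and may differ in details.

```python
import math


def warmup_cosine_lr(epoch, base_lr=0.01, warmup_epochs=5, total_epochs=60):
    """Approximate learning rate at the start of a given epoch (0-based).

    Assumption: linear warmup from 0 to base_lr, then cosine decay to 0.
    This is an illustration only, not the PaddleClas implementation.
    """
    if epoch < warmup_epochs:
        # Linear ramp during the warmup phase.
        return base_lr * epoch / warmup_epochs
    # Fraction of the post-warmup phase that has elapsed, in [0, 1].
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * base_lr * (1.0 + math.cos(math.pi * progress))


# Learning rate for each of the 60 epochs in the config.
schedule = [warmup_cosine_lr(e) for e in range(60)]
for e in (0, 1, 5, 30, 59):
    print(f"epoch {e:2d}: lr = {schedule[e]:.6f}")
```

Under these assumptions the learning rate is still ramping up during the first five epochs, so it can help to note which phase of the schedule a given stretch of the loss curve falls in.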
The loss changes as shown in the figure:
It is best to look at this across multiple epochs; the trend may not be visible within a single epoch.
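
One way to follow this suggestion is to average the logged loss per epoch and compare the averages, rather than reading individual step values. The sketch below assumes the per-step losses have already been extracted from the training log as `(epoch, loss)` pairs; the helper name `mean_loss_per_epoch` and the sample values are hypothetical and are not taken from this issue's logs.

```python
from collections import defaultdict


def mean_loss_per_epoch(records):
    """Average the loss over all logged steps of each epoch.

    records: iterable of (epoch, loss) pairs, one per logged training step.
    Returns a dict mapping epoch -> mean loss, ordered by epoch.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for epoch, loss in records:
        sums[epoch] += loss
        counts[epoch] += 1
    return {epoch: sums[epoch] / counts[epoch] for epoch in sorted(sums)}


# Hypothetical values for illustration only.
example = [(1, 0.9), (1, 0.7), (2, 0.6), (2, 0.8), (3, 0.5), (3, 0.5)]
per_epoch = mean_loss_per_epoch(example)
for epoch, loss in per_epoch.items():
    print(f"epoch {epoch}: mean loss = {loss:.4f}")
```

Step-level values can fluctuate within an epoch (as in epoch 2 of the sample data), while the per-epoch means show whether the overall trend is downward.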