PaddlePaddle / PaddleDetection

Object Detection toolkit based on PaddlePaddle. It supports object detection, instance segmentation, multiple object tracking and real-time multi-person keypoint detection.
Apache License 2.0

PP-YOLO training is extremely slow #2680

Closed shuxsu closed 2 years ago

shuxsu commented 3 years ago

max_iters=13000, batch size=12. I have tried worker_num=1, 2, 4, 8, and 16 with no meaningful difference in time, and bufsize=1 and 2 also made little difference.

13000 iterations take 20 hours. Paddle version 1.8.4, PaddleDetection version 2.0.1, running the project on an AI Studio V100. The training set has 260 images, each around 1440x1440.

shuxsu commented 3 years ago

Why do two runs with identical configuration take such different times? With all settings the same, one run finishes in 10 hours on 2000 training images, while the other takes 2 days on 300 training images. What could cause the smaller dataset to train more slowly? Iteration count, batch size, and everything else are identical.

liuhuiCNN commented 3 years ago

Hi, could you confirm that both runs used the GPU? Based on your description, the gap in training speed sounds far too large to be reasonable.

shuxsu commented 3 years ago

Both were trained on AI Studio GPUs with identical configuration; only the datasets differ.

shuxsu commented 3 years ago

> Hi, could you confirm that both runs used the GPU? Based on your description, the gap in training speed sounds far too large to be reasonable.

Can you explain why it is this slow?

shuxsu commented 3 years ago
architecture: YOLOv3
use_gpu: true
max_iters: 80000
log_iter: 20
save_dir: output
snapshot_iter: 4000
metric: COCO
pretrain_weights: https://paddlemodels.bj.bcebos.com/object_detection/ppyolo.pdparams
weights: output/ppyolo/model_final
num_classes: 2
use_fine_grained_loss: true
use_ema: true
ema_decay: 0.9998

YOLOv3:
  backbone: ResNet
  yolo_head: YOLOv3Head
  use_fine_grained_loss: true

ResNet:
  norm_type: sync_bn
  freeze_at: 0
  freeze_norm: false
  norm_decay: 0.
  depth: 50
  feature_maps: [3, 4, 5]
  variant: d
  dcn_v2_stages: [5]

YOLOv3Head:
  anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
  anchors: [[10, 13], [16, 30], [33, 23],
            [30, 61], [62, 45], [59, 119],
            [116, 90], [156, 198], [373, 326]]
  norm_decay: 0.
  coord_conv: true
  iou_aware: true
  iou_aware_factor: 0.4
  scale_x_y: 1.05
  spp: true
  yolo_loss: YOLOv3Loss
  nms: MatrixNMS
  drop_block: true

YOLOv3Loss:
  ignore_thresh: 0.7
  scale_x_y: 1.05
  label_smooth: false
  use_fine_grained_loss: true
  iou_loss: DiouLossYolo
  iou_aware_loss: IouAwareLoss

DiouLossYolo:
  loss_weight: 1.0
  max_height: 608
  max_width: 608

IouAwareLoss:
  loss_weight: 1.0
  max_height: 608
  max_width: 608

MatrixNMS:
    background_label: -1
    keep_top_k: 100
    normalized: false
    score_threshold: 0.01
    post_threshold: 0.01

LearningRate:
  base_lr: 0.000125
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones:
    - 60000
    - 70000
  - !LinearWarmup
    start_factor: 0.
    steps: 1000

OptimizerBuilder:
  optimizer:
    momentum: 0.9
    type: Momentum
  regularizer:
    factor: 0.0005
    type: L2

_READER_: 'ppyolo_g_reader.yml'
shuxsu commented 3 years ago
TrainReader:
  inputs_def:
    fields: ['image', 'gt_bbox', 'gt_class', 'gt_score']
    num_max_boxes: 50
  dataset:
    !COCODataSet
      image_dir: train
      anno_path: annotations/instance_train.json
      dataset_dir: /home/aistudio/work/green
      with_background: false
  sample_transforms:
    - !DecodeImage
      to_rgb: True
      with_mixup: True
    - !MixupImage
      alpha: 1.5
      beta: 1.5
    - !RandomFlipImage
      is_normalized: false
    - !NormalizeBox {}
    - !PadBox
      num_max_boxes: 50
    - !BboxXYXY2XYWH {}
  batch_transforms:
  - !RandomShape
    sizes: [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
    random_inter: True
  - !NormalizeImage
    mean: [0.485, 0.456, 0.406]
    std: [0.229, 0.224, 0.225]
    is_scale: True
    is_channel_first: false
  - !Permute
    to_bgr: false
    channel_first: True
  # Gt2YoloTarget is only used when use_fine_grained_loss set as true,
  # this operator will be deleted automatically if use_fine_grained_loss
  # is set as false
  - !Gt2YoloTarget
    anchor_masks: [[6, 7, 8], [3, 4, 5], [0, 1, 2]]
    anchors: [[10, 13], [16, 30], [33, 23],
              [30, 61], [62, 45], [59, 119],
              [116, 90], [156, 198], [373, 326]]
    downsample_ratios: [32, 16, 8]
  batch_size: 4
  shuffle: true
  mixup_epoch: 150
  drop_last: true
  worker_num: 1
  bufsize: 1
  use_process: true

EvalReader:
  inputs_def:
    fields: ['image', 'im_size', 'im_id']
    num_max_boxes: 50
  dataset:
    !COCODataSet
      image_dir: val
      anno_path: annotations/instance_val.json
      dataset_dir: /home/aistudio/work/green
      with_background: false
  sample_transforms:
    - !DecodeImage
      to_rgb: True
    - !ResizeImage
      target_size: 608
      interp: 2
    - !NormalizeImage
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
      is_scale: True
      is_channel_first: false
    - !PadBox
      num_max_boxes: 50
    - !Permute
      to_bgr: false
      channel_first: True
  batch_size: 2
  drop_empty: false
  worker_num: 1
  bufsize: 1

TestReader:
  inputs_def:
    image_shape: [3, 608, 608]
    fields: ['image', 'im_size', 'im_id']
  dataset:
    !ImageFolder
      anno_path: /home/aistudio/work/green/annotations/instance_test.json
      with_background: false
  sample_transforms:
    - !DecodeImage
      to_rgb: True
    - !ResizeImage
      target_size: 608
      interp: 2
    - !NormalizeImage
      mean: [0.485, 0.456, 0.406]
      std: [0.229, 0.224, 0.225]
      is_scale: True
      is_channel_first: false
    - !Permute
      to_bgr: false
      channel_first: True
  batch_size: 1
shuxsu commented 3 years ago

batch size=1 is a bit faster, but the results are much worse; with batch size=4:

[screenshot: training log]

How can I speed this up?

yghstill commented 3 years ago

@shuxsu You describe the datasets as identical, yet one run is slow and one is fast. Did the image sizes change between the two runs? The larger the images, the more time data preprocessing takes. To judge training speed, look at the ips metric: the higher the ips, the faster the training. Looking at your config file, try setting worker_num in TrainReader to 8 or 16 and bufsize to 8 or 16, and see whether that speeds things up.

shuxsu commented 3 years ago

ips is 6.03316 images/sec. The thing is, when I train on Cityscapes, whose images are large (1024x2048), the time is reasonable, but when I train on my custom dataset, whose images are all smaller than Cityscapes', it takes even longer. Also, isn't the total batch size equal to bufsize x batch size? If my bufsize is larger, then the total batch size is larger, so wouldn't that be more time-consuming?


yghstill commented 3 years ago

bufsize is only the buffer size for data preprocessing; it is related to the batch size and can be set equal to the total batch size. The total batch size = batch_size * number of GPU cards. You can run top -c to check CPU utilization while training; PP-YOLO training involves heavy data preprocessing and is inherently time-consuming.
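The arithmetic above also explains the original report: with the numbers posted in this thread (260 images, batch size 12, 13000 iterations), the dataset is looped hundreds of times. A quick sketch (function names are illustrative, not PaddleDetection APIs):

```python
# Sketch: effective batch size and the number of epochs implied by an
# iteration-based schedule like PP-YOLO's. Helper names are hypothetical.

def total_batch_size(per_card_batch: int, num_gpus: int) -> int:
    # Per the maintainer's reply: total batch = batch_size per card * GPU cards
    return per_card_batch * num_gpus

def epochs_from_iters(max_iters: int, dataset_size: int,
                      per_card_batch: int, num_gpus: int = 1) -> float:
    # Each iteration consumes one total batch of images.
    images_seen = max_iters * total_batch_size(per_card_batch, num_gpus)
    return images_seen / dataset_size

# Numbers from this thread: 260 training images, batch_size=12, max_iters=13000
print(total_batch_size(12, 1))            # 12
print(epochs_from_iters(13000, 260, 12))  # 600.0 -> the small set is looped 600 times
```

Note that bufsize does not appear anywhere in this calculation: it only controls how many preprocessed batches are buffered ahead of the GPU, not how many images each step consumes.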

qingqing01 commented 3 years ago

@shuxsu Are the iteration counts the same for the two runs? The default PP-YOLO config uses a very large number of iterations for COCO; for your own data you can reduce it based on how many epochs you actually need. Also, if the data can be made public, post it and we will test it ourselves.
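Following that suggestion, one way to shrink the schedule is to scale max_iters down and keep the PiecewiseDecay milestones at the same relative positions (~75% and ~88% of max_iters, as in the posted config). The numbers below are illustrative, not a tuned recipe:

```yaml
# Hypothetical adjustment for a ~260-image dataset (values are examples only)
max_iters: 8000        # down from the COCO default of 80000
LearningRate:
  base_lr: 0.000125
  schedulers:
  - !PiecewiseDecay
    gamma: 0.1
    milestones: [6000, 7000]   # same ~75% / ~88% positions as 60000/70000
  - !LinearWarmup
    start_factor: 0.
    steps: 500
```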

shuxsu commented 3 years ago

All configuration is exactly the same; only the datasets differ. Links to the custom dataset and the Cityscapes dataset: cityscape: https://aistudio.baidu.com/aistudio/datasetdetail/79240 green: https://aistudio.baidu.com/aistudio/datasetdetail/80823


heavengate commented 3 years ago

I see that both TrainReader.worker_num and TrainReader.bufsize are set to 1. YOLO-family models have fairly complex preprocessing and depend heavily on multi-process concurrency to keep up. Try increasing worker_num and bufsize: worker_num can be raised as far as the number of CPU cores on your machine, and bufsize can be set to the number of training cards or twice that. Also, for a small dataset, we suggest disabling image mixup by setting TrainReader.mixup_epoch to 0.
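Applied to the reader config posted earlier in this thread, that advice amounts to overrides along these lines (the exact values depend on your CPU core count and number of GPUs; these are examples):

```yaml
# Illustrative TrainReader overrides based on the advice above
TrainReader:
  worker_num: 8      # up to the number of CPU cores on the machine
  bufsize: 8         # the number of training cards, or twice that
  use_process: true  # keep multi-process preprocessing enabled
  mixup_epoch: 0     # disable mixup for small datasets (was 150)
```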

617475jordan commented 3 years ago

I noticed this too: training with ppyolo_r50vd_dcn_voc.yml takes only three or four hours, while ppyolov2_r50vd_dcn_voc.yml takes more than a day, and I do not know why. My machine is an i9-10900K with an RTX 3090 and 128 GB of RAM.

ShawnXsw commented 3 years ago

I hit this on the v2.3 branch with ppyolo-tiny as well: preprocessing is inefficient and the GPU sits idle for long stretches. The disks are behind a RAID card, so it should not be an I/O stall; the CPU is a Xeon Gold and its utilization is also very low, with worker_num: 8. Occasionally both CPU and GPU utilization stay at 0% for long periods.

paddle-bot-old[bot] commented 2 years ago

Since this issue has not been updated for more than three months, it will be closed. If it is not solved or there is a follow-up, please reopen it at any time and we will continue to follow up. It is recommended to pull and try the latest code first.

zhoushuang66 commented 1 year ago

I am using the PP-YOLO on the release/2.0-rc branch. In configs/ppyolo/ppyolo_reader.yml, when I set batch_size=4 the estimated training time is 40h, while with the default batch_size=24 it is 11x24h. Why is the gap in training time so large?

bnbncch commented 4 months ago

I initially used PaddleDetection release 2.5.2 with paddle 2.2.2-gpu, CUDA 11.3, and Ubuntu 20.04, and ran into very slow training with the GPU mostly idle. While reading other issues later, someone pointed out that it was a CUDA version problem. With the same configuration on CUDA 10.2 and Ubuntu 18.04, training indeed became much faster.