Ma-Liang-hub commented 1 year ago

Traceback (most recent call last):
  File "train.py", line 84, in <module>
    main(opt)
  File "train.py", line 76, in main
    trainer.train(callbacks, val)
  File "/share/disk1/ml/code/efficientteacher-main/trainer/trainer.py", line 532, in train
    self.train_in_epoch(callbacks)
  File "/share/disk1/ml/code/efficientteacher-main/trainer/ssod_trainer.py", line 285, in train_in_epoch
    self.train_without_unlabeled(callbacks)
  File "/share/disk1/ml/code/efficientteacher-main/trainer/ssod_trainer.py", line 402, in train_without_unlabeled
    self.update_optimizer(loss, ni)
  File "/share/disk1/ml/code/efficientteacher-main/trainer/ssod_trainer.py", line 445, in update_optimizer
    self.ema.update(self.model)
  File "/share/disk1/ml/code/efficientteacher-main/utils/torch_utils.py", line 331, in update
    self.updates += 1
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'
Killing subprocess 60073
Traceback (most recent call last):
  File "/root/anaconda3/envs/effictteacher/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/anaconda3/envs/effictteacher/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/anaconda3/envs/effictteacher/lib/python3.6/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/root/anaconda3/envs/effictteacher/lib/python3.6/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/root/anaconda3/envs/effictteacher/lib/python3.6/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/anaconda3/envs/effictteacher/bin/python', '-u', 'train.py', '--local_rank=0', '--cfg', 'configs/ssod/custom/yolov5l_custom_ssod.yaml']' returned non-zero exit status 1.

BowieHsu commented 1 year ago

Hello, could you paste your YAML file so I can take a look?

Ma-Liang-hub commented 1 year ago


# EfficientTeacher by Alibaba Cloud

project: 'yolov5_ssod'
adam: False
epochs: 20
weights: '/share/disk1/ml/code/efficientteacher-main/pretrain_model/efficient-yolov5x.pt'
prune_finetune: False
linear_lr: True

hyp:
  lr0: 0.01
  hsv_h: 0.015
  hsv_s: 0.7
  hsv_v: 0.4
  lrf: 1.0
  scale: 0.9
  burn_epochs: 10
  no_aug_epochs: 0
  # mixup: 0.1
  warmup_epochs: 3

Model:
  depth_multiple: 1.33  # model depth multiple
  width_multiple: 1.25  # layer channel multiple
  Backbone:
    name: 'YoloV5'
    activation: 'SiLU'
  Neck:
    name: 'YoloV5'
    in_channels: [256, 512, 1024]
    out_channels: [256, 512, 1024]
    activation: 'SiLU'
  Head:
    name: 'YoloV5'
    activation: 'SiLU'
  anchors: [[10,13, 16,30, 33,23],[30,61, 62,45, 59,119],[116,90, 156,198, 373,326]]  # P5/32]
Loss:
  type: 'ComputeLoss'
  cls: 0.3
  obj: 0.7
  anchor_t: 4.0

Dataset:
  data_name: 'coco'
  train: /share/disk1/ml/code/efficientteacher-main/label_data/coco128/train/  # 118287 images
  val: /share/disk1/ml/code/yolov5-master/dataset/coco128/test/labels/  # 5000 images
  test: data/custom_val.txt  # 20288 of 40670 images, submit to https://competitions.codalab.org/competitions/20794^
  target: /share/disk1/ml/code/efficientteacher-main/unlabel.txt
  nc: 3  # number of classes
  np: 0  #number of keypoints
  names: [ 'plastic', 'other', 'plant']
  img_size: 640
  batch_size: 8

SSOD:
  train_domain: True
  nms_conf_thres: 0.1
  nms_iou_thres: 0.65
  teacher_loss_weight: 3.0
  cls_loss_weight: 0.3
  box_loss_weight: 0.05
  obj_loss_weight: 0.7
  loss_type: 'ComputeStudentMatchLoss'
  ignore_thres_low: 0.1
  ignore_thres_high: 0.6
  uncertain_aug: True
  use_ota: False
  multi_label: False
  ignore_obj: False
  pseudo_label_with_obj: True
  pseudo_label_with_bbox: True
  pseudo_label_with_cls: False
  with_da_loss: False
  da_loss_weights: 0.01
  epoch_adaptor: True
  resample_high_percent: 0.25
  resample_low_percent: 0.99
  ema_rate: 0.999
  cosine_ema: True
  imitate_teacher: False
  # dynamic_thres: True
  ssod_hyp:
    with_gt: False
    mosaic: 1.0
    cutout: 0.5
    autoaugment: 0.5
    scale: 0.8
    degrees: 0.0
    shear: 0.0

BowieHsu commented 1 year ago

@Ma-Liang-hub Could you try a git pull? We switched to a safe load interface.

Ma-Liang-hub commented 1 year ago


def __init__(self, model, decay=0.9999, updates=0):
    # Create EMA
    self.ema = deepcopy(model.module if is_parallel(model) else model).eval()  # FP32 EMA
    # if next(model.parameters()).device.type != 'cpu':
    #     self.ema.half()  # FP16 EMA
    self.updates = updates  # number of EMA updates
    self.decay = lambda x: decay * (1 - math.exp(-x / 2000))  # decay exponential ramp (to help early epochs)
    for p in self.ema.parameters():
        p.requires_grad_(False)

def update(self, model):
    # Update EMA parameters
    with torch.no_grad():
        self.updates += 1
        d = self.decay(self.updates)

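The error is raised on the line self.updates += 1 and says that self.updates is None. But the initializer above sets it to 0, so as far as I can tell this code should be correct, and I have not touched this part.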
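One way the counter can be None even though the signature defaults to 0: a default value only applies when the argument is omitted, so a caller that passes updates=None explicitly (for example, a value looked up in a checkpoint that lacks the field) overrides it. The sketch below is a stripped-down stand-in, not the repository code; `TinyEMA`, `safe` and `try_update` are hypothetical names used only for illustration.

```python
import math


class TinyEMA:
    """Hypothetical stand-in that keeps only the update counter and decay ramp."""

    def __init__(self, decay=0.9999, updates=0, safe=False):
        # `safe` is an illustrative flag: coerce a missing counter to 0.
        if safe and updates is None:
            updates = 0
        self.updates = updates
        # Same ramp as in the snippet above: grows from 0 toward `decay`.
        self.decay = lambda x: decay * (1 - math.exp(-x / 2000))

    def update(self):
        self.updates += 1  # raises TypeError when self.updates is None
        return self.decay(self.updates)


def try_update(ema):
    """Return the decay after one update, or the error message if it fails."""
    try:
        return ema.update()
    except TypeError as exc:
        return str(exc)


ckpt = {"model": "..."}         # a checkpoint dict without an 'updates' entry
restored = ckpt.get("updates")  # None, which bypasses the default of 0

print(try_update(TinyEMA(updates=restored)))             # the TypeError message
print(try_update(TinyEMA(updates=restored, safe=True)))  # a small positive decay
```

Whether the trainer actually restores the counter this way is an assumption; the point is only that an explicit None defeats the default.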

BowieHsu commented 1 year ago

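@Ma-Liang-hub I think I know the cause. Was your .pt file converted from a standard YOLO checkpoint? If so, the conversion may not have carried over the 'updates' variable stored in the .pt.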
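A minimal sketch of how one might check a converted checkpoint for this field and fill it in. It assumes the .pt file unpickles to a dict with a top-level 'updates' key, as in standard YOLOv5 checkpoints; that layout is not confirmed here for the EfficientTeacher converter, and `ensure_updates` and `patch_checkpoint_file` are hypothetical helpers, not part of the repository.

```python
def ensure_updates(ckpt, default=0):
    """Return a copy of a checkpoint dict whose 'updates' entry is not None."""
    patched = dict(ckpt)  # shallow copy; the weights themselves are not duplicated
    if patched.get("updates") is None:
        patched["updates"] = default
    return patched


def patch_checkpoint_file(src, dst):
    """Load a .pt file, fill in 'updates' if needed, and save the result to dst."""
    import torch  # imported lazily so ensure_updates works without PyTorch

    # Newer PyTorch releases may additionally need weights_only=False here.
    ckpt = torch.load(src, map_location="cpu")
    torch.save(ensure_updates(ckpt), dst)


print(ensure_updates({"model": "..."}))
print(ensure_updates({"model": "...", "updates": 1200}))
```

This is only a diagnostic aid under the stated assumption; pulling the updated code or re-converting the weights remains the fix suggested in this thread.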

nanfei666 commented 1 year ago

@Ma-Liang-hub I ran into this problem too. My .pt was converted from standard YOLO, and it worked after a git pull of the new code.

Ma-Liang-hub commented 1 year ago


Solved. Thumbs up, that was a really quick fix!!!

Ma-Liang-hub commented 1 year ago

Yes, it also works for me after updating.

yjcreation commented 1 year ago

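Hello, after a git pull to the latest version, I followed the scheme that transitions from supervised to semi-supervised training and hit the following problem: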

Traceback (most recent call last):
  File "train.py", line 84, in <module>
    main(opt)
  File "train.py", line 76, in main
    trainer.train(callbacks, val)
  File "/home/cv/xxx/efficientteacher-318/trainer/trainer.py", line 535, in train
    self.train_in_epoch(callbacks)
  File "/home/cv/xxx/efficientteacher-318/trainer/ssod_trainer.py", line 300, in train_in_epoch
    self.train_without_unlabeled(callbacks)
  File "/home/xxx/efficientteacher-318/trainer/ssod_trainer.py", line 443, in train_without_unlabeled
    self.update_optimizer(loss, ni) 
  File "/home/xxx/efficientteacher-318/trainer/ssod_trainer.py", line 485, in update_optimizer
    self.ema.update(self.model)
  File "/home/xxx/efficientteacher-318/utils/torch_utils.py", line 331, in update
    self.updates += 1
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'

The config file is as follows:

# EfficientTeacher by Alibaba Cloud 

project: 'runs/train/yolov5_ssod'
adam: False
epochs: 50  # train for 20 epochs in total
weights: 'efficient-yolov5s.pt'  # the converted model from my own YOLOv5 training is specified here
prune_finetune: False
linear_lr: True
find_unused_parameters: True

hyp:
  lr0: 0.001  # adjusted learning rate
  hsv_h: 0.015
  hsv_s: 0.7
  hsv_v: 0.4
  lrf: 1.0
  scale: 0.9
  burn_epochs: 10  # number of supervised epochs; semi-supervised epochs = epochs - burn_epochs
  no_aug_epochs: 0
  # mixup: 0.1
  warmup_epochs: 3

Model:
  depth_multiple: 0.33 # 1.00  # model depth multiple  # I also set the depth and width here to the 's' structure
  width_multiple: 0.50 # 1.00  # layer channel multiple
  Backbone: 
    name: 'YoloV5'
    activation: 'SiLU'
  Neck: 
    name: 'YoloV5' 
    in_channels: [256, 512, 1024]
    out_channels: [256, 512, 1024]
    activation: 'SiLU'
  Head: 
    name: 'YoloV5'
    activation: 'SiLU'
  anchors: [[10,13, 16,30, 33,23],[30,61, 62,45, 59,119],[116,90, 156,198, 373,326]]  # P5/32]
Loss:
  type: 'ComputeLoss'
  cls: 0.3
  obj: 0.7
  anchor_t: 4.0

Dataset:
  data_name: 'coco'
  train: datasets/coco/train2017.txt # data/custom_train.txt  # 118287 images
  val: datasets/coco/val2017.txt # data/custom_val.txt  # 5000 images
  test: datasets/coco/val2017.txt # data/custom_val.txt # 20288 of 40670 images, submit to https://competitions.codalab.org/competitions/20794^
  target: JPEGImages/unlabel.txt
  nc: 7 # 2  # number of classes  # I changed the number of classes
  np: 0 #number of keypoints
  names: ['gram', 'pseudogram', 'mon', 'gly', 'gloeo', 'clavi', 'anth']  # I changed the class names
  img_size: 640
  batch_size: 16  

SSOD:
  train_domain: True
  nms_conf_thres: 0.1
  nms_iou_thres: 0.3 # 0.65        
  teacher_loss_weight: 1.0
  cls_loss_weight: 0.3
  box_loss_weight: 0.05
  obj_loss_weight: 0.7
  loss_type: 'ComputeStudentMatchLoss'
  ignore_thres_low: 0.1
  ignore_thres_high: 0.6            
  uncertain_aug: True
  use_ota: False
  multi_label: False
  ignore_obj: False
  pseudo_label_with_obj: True
  pseudo_label_with_bbox: True
  pseudo_label_with_cls: False
  with_da_loss: False
  da_loss_weights: 0.01
  epoch_adaptor: True  # whether to enable epoch_adaptor
  resample_high_percent: 0.25
  resample_low_percent: 0.99
  ema_rate: 0.999
  cosine_ema: True
  imitate_teacher: False  # whether to enable the imitate scheme
  # dynamic_thres: True
  ssod_hyp:
    with_gt: False
    mosaic: 1.0
    cutout: 0.5
    autoaugment: 0.5
    scale: 0.8
    degrees: 0.0
    shear: 0.0

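In the config file, weights points to weights I trained myself with YOLOv5 and then converted. With the previous code version, training ran once burn_epochs and warmup_epochs were both set to 0. Is it required to set burn_epochs and warmup_epochs to 0? With the latest version, I still hit the error above even after setting both to 0. How can I solve this? @BowieHsu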
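For reference, the rule stated in the config comment (semi-supervised epochs = epochs - burn_epochs) can be sketched as below. `split_epochs` is a hypothetical helper that only restates that arithmetic; it does not reflect how the trainer schedules phases internally.

```python
def split_epochs(epochs, burn_epochs):
    """Return (supervised_epochs, semi_supervised_epochs) for a run."""
    # Clamp so that the supervised phase never exceeds the total.
    supervised = min(max(burn_epochs, 0), epochs)
    return supervised, epochs - supervised


print(split_epochs(50, 10))  # config as posted -> (10, 40)
print(split_epochs(50, 0))   # burn_epochs set to 0 -> (0, 50)
```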

BowieHsu commented 1 year ago

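@yjcreation Hello, it looks like the converted model is missing the 'updates' parameter. Try re-converting the model with our latest conversion script; that should solve the problem.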

AlibabaResearch / efficientteacher

unsupported operand type(s) for +=: 'NoneType' and 'int'; what is the cause of this? #17
