Open Ma-Liang-hub opened 1 year ago
Hello, could you paste your yaml file so I can take a look?
project: 'yolov5_ssod'
adam: False
epochs: 20
weights: '/share/disk1/ml/code/efficientteacher-main/pretrain_model/efficient-yolov5x.pt'
prune_finetune: False
linear_lr: True
hyp:
  lr0: 0.01
  hsv_h: 0.015
  hsv_s: 0.7
  hsv_v: 0.4
  lrf: 1.0
  scale: 0.9
  burn_epochs: 10
  no_aug_epochs: 0
  warmup_epochs: 3
Model:
  depth_multiple: 1.33  # model depth multiple
  width_multiple: 1.25  # layer channel multiple
  Backbone:
    name: 'YoloV5'
    activation: 'SiLU'
  Neck:
    name: 'YoloV5'
    in_channels: [256, 512, 1024]
    out_channels: [256, 512, 1024]
    activation: 'SiLU'
  Head:
    name: 'YoloV5'
    activation: 'SiLU'
  anchors: [[10,13, 16,30, 33,23],[30,61, 62,45, 59,119],[116,90, 156,198, 373,326]]  # P5/32]
Loss:
  type: 'ComputeLoss'
  cls: 0.3
  obj: 0.7
  anchor_t: 4.0
Dataset:
  data_name: 'coco'
  train: /share/disk1/ml/code/efficientteacher-main/label_data/coco128/train/  # 118287 images
  val: /share/disk1/ml/code/yolov5-master/dataset/coco128/test/labels/  # 5000 images
  test: data/custom_val.txt  # 20288 of 40670 images, submit to https://competitions.codalab.org/competitions/20794^
  target: /share/disk1/ml/code/efficientteacher-main/unlabel.txt
  nc: 3  # number of classes
  np: 0  # number of keypoints
  names: [ 'plastic', 'other', 'plant']
  img_size: 640
  batch_size: 8
SSOD:
  train_domain: True
  nms_conf_thres: 0.1
  nms_iou_thres: 0.65
  teacher_loss_weight: 3.0
  cls_loss_weight: 0.3
  box_loss_weight: 0.05
  obj_loss_weight: 0.7
  loss_type: 'ComputeStudentMatchLoss'
  ignore_thres_low: 0.1
  ignore_thres_high: 0.6
  uncertain_aug: True
  use_ota: False
  multi_label: False
  ignore_obj: False
  pseudo_label_with_obj: True
  pseudo_label_with_bbox: True
  pseudo_label_with_cls: False
  with_da_loss: False
  da_loss_weights: 0.01
  epoch_adaptor: True
  resample_high_percent: 0.25
  resample_low_percent: 0.99
  ema_rate: 0.999
  cosine_ema: True
  imitate_teacher: False
  ssod_hyp:
    with_gt: False
    mosaic: 1.0
    cutout: 0.5
    autoaugment: 0.5
    scale: 0.8
    degrees: 0.0
    shear: 0.0
@Ma-Liang-hub Could you try a git pull? We have switched to a safe load interface.
def __init__(self, model, decay=0.9999, updates=0):
    self.ema = deepcopy(model.module if is_parallel(model) else model).eval()  # FP32 EMA
    # if next(model.parameters()).device.type != 'cpu':
    #     self.ema.half()  # FP16 EMA
    self.updates = updates  # number of EMA updates
    self.decay = lambda x: decay * (1 - math.exp(-x / 2000))  # decay exponential ramp (to help early epochs)
    for p in self.ema.parameters():
        p.requires_grad_(False)

def update(self, model):
    # Update EMA parameters
    with torch.no_grad():
        self.updates += 1
        d = self.decay(self.updates)
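The error is raised on the line self.updates += 1 and says that self.updates is None. But the initializer above sets it to 0, so in principle this code should be correct, and I have not touched this part at all.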
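A possible explanation for the confusion above: a default of 0 in the signature only applies when nothing else assigns the attribute. The sketch below is a simplified stand-in, not the repository's code. TinyEMA and the ckpt dict are hypothetical, and it assumes the counter is overwritten with a value restored from a checkpoint after the object has been built.

```python
import math


class TinyEMA:
    """Hypothetical stand-in that keeps only the update counter and decay ramp."""

    def __init__(self, decay=0.9999, updates=0):
        self.updates = updates  # the default applies only if the caller omits it
        self.decay = lambda x: decay * (1 - math.exp(-x / 2000))

    def update(self):
        self.updates += 1  # raises TypeError when self.updates is None
        return self.decay(self.updates)


# Assumed shape of a converted checkpoint whose 'updates' entry is empty.
ckpt = {"updates": None}

ema = TinyEMA()                # updates == 0 at this point
ema.updates = ckpt["updates"]  # counter overwritten after construction

try:
    ema.update()
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
```

Under that assumption the initializer is never at fault: the None arrives later, from the checkpoint, which would match a failure that shows up only with converted weights.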
@Ma-Liang-hub I think I know the cause. Was your pt converted from standard YOLO? If so, the conversion may not have carried over the 'updates' variable from the pt.
@Ma-Liang-hub I ran into this problem too. My pt was converted from standard YOLO, and after a git pull of the new code it works.
Solved. Thumbs up, that was a really quick fix!!!
Yes, it also works for me after updating.
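Hello, after a git pull of the latest version, following the scheme for transitioning from supervised to semi-supervised training, I get the following error: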
Traceback (most recent call last):
  File "train.py", line 84, in <module>
    main(opt)
  File "train.py", line 76, in main
    trainer.train(callbacks, val)
  File "/home/cv/xxx/efficientteacher-318/trainer/trainer.py", line 535, in train
    self.train_in_epoch(callbacks)
  File "/home/cv/xxx/efficientteacher-318/trainer/ssod_trainer.py", line 300, in train_in_epoch
    self.train_without_unlabeled(callbacks)
  File "/home/xxx/efficientteacher-318/trainer/ssod_trainer.py", line 443, in train_without_unlabeled
    self.update_optimizer(loss, ni)
  File "/home/xxx/efficientteacher-318/trainer/ssod_trainer.py", line 485, in update_optimizer
    self.ema.update(self.model)
  File "/home/xxx/efficientteacher-318/utils/torch_utils.py", line 331, in update
    self.updates += 1
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'
The config file is as follows:
# EfficientTeacher by Alibaba Cloud
project: 'runs/train/yolov5_ssod'
adam: False
epochs: 50  # train for 20 epochs in total
weights: 'efficient-yolov5s.pt'  # the converted model from my own YOLOv5 training is specified here
prune_finetune: False
linear_lr: True
find_unused_parameters: True
hyp:
  lr0: 0.001  # adjusted learning rate
  hsv_h: 0.015
  hsv_s: 0.7
  hsv_v: 0.4
  lrf: 1.0
  scale: 0.9
  burn_epochs: 10  # number of supervised epochs; semi-supervised epochs = epochs - burn_epochs
  no_aug_epochs: 0
  # mixup: 0.1
  warmup_epochs: 3
Model:
  depth_multiple: 0.33  # 1.00  # model depth multiple  # I also set depth and width here to the 's' structure
  width_multiple: 0.50  # 1.00  # layer channel multiple
  Backbone:
    name: 'YoloV5'
    activation: 'SiLU'
  Neck:
    name: 'YoloV5'
    in_channels: [256, 512, 1024]
    out_channels: [256, 512, 1024]
    activation: 'SiLU'
  Head:
    name: 'YoloV5'
    activation: 'SiLU'
  anchors: [[10,13, 16,30, 33,23],[30,61, 62,45, 59,119],[116,90, 156,198, 373,326]]  # P5/32]
Loss:
  type: 'ComputeLoss'
  cls: 0.3
  obj: 0.7
  anchor_t: 4.0
Dataset:
  data_name: 'coco'
  train: datasets/coco/train2017.txt  # data/custom_train.txt  # 118287 images
  val: datasets/coco/val2017.txt  # data/custom_val.txt  # 5000 images
  test: datasets/coco/val2017.txt  # data/custom_val.txt  # 20288 of 40670 images, submit to https://competitions.codalab.org/competitions/20794^
  target: JPEGImages/unlabel.txt
  nc: 7  # 2  # number of classes  # I changed the number of classes
  np: 0  # number of keypoints
  names: ['gram', 'pseudogram', 'mon', 'gly', 'gloeo', 'clavi', 'anth']  # I changed the class names
  img_size: 640
  batch_size: 16
SSOD:
  train_domain: True
  nms_conf_thres: 0.1
  nms_iou_thres: 0.3  # 0.65
  teacher_loss_weight: 1.0
  cls_loss_weight: 0.3
  box_loss_weight: 0.05
  obj_loss_weight: 0.7
  loss_type: 'ComputeStudentMatchLoss'
  ignore_thres_low: 0.1
  ignore_thres_high: 0.6
  uncertain_aug: True
  use_ota: False
  multi_label: False
  ignore_obj: False
  pseudo_label_with_obj: True
  pseudo_label_with_bbox: True
  pseudo_label_with_cls: False
  with_da_loss: False
  da_loss_weights: 0.01
  epoch_adaptor: True  # whether to enable epoch_adaptor
  resample_high_percent: 0.25
  resample_low_percent: 0.99
  ema_rate: 0.999
  cosine_ema: True
  imitate_teacher: False  # whether to enable the imitate scheme
  # dynamic_thres: True
  ssod_hyp:
    with_gt: False
    mosaic: 1.0
    cutout: 0.5
    autoaugment: 0.5
    scale: 0.8
    degrees: 0.0
    shear: 0.0
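In the config file, weights points to weights that I trained myself with YOLOv5 and then converted. With an earlier version of the code, training worked once burn_epochs and warmup_epochs were both set to 0. Is it mandatory to set burn_epochs and warmup_epochs to 0? With the latest version I still hit the error above even after setting both to 0. How can I solve this? @BowieHsu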
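@yjcreation Hello, it looks like the converted model is missing the 'updates' entry. Please try re-converting the model with our latest conversion script; that should resolve the problem.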
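To make the missing entry concrete, here is a minimal sketch of checking and filling 'updates' in a checkpoint dict. It is not the project's conversion script: ensure_updates is a hypothetical helper, plain dicts stand in for loaded checkpoints, and it assumes the checkpoint is a dict with a top-level 'updates' key. Re-converting with the official script, as suggested above, remains the supported route.

```python
def ensure_updates(ckpt, default=0):
    """Return a shallow copy of ckpt whose 'updates' entry is never None.

    Hypothetical helper for illustration only.
    """
    patched = dict(ckpt)
    if patched.get("updates") is None:
        patched["updates"] = default
    return patched


# Plain dicts stand in for checkpoints. With PyTorch, the dict would typically
# come from torch.load(path, map_location="cpu") and be written back with
# torch.save(patched, new_path).
converted = {"model": "<weights>", "updates": None}  # entry present but empty
missing = {"model": "<weights>"}                      # entry absent
native = {"model": "<weights>", "updates": 1200}      # entry already valid

print(ensure_updates(converted)["updates"])  # 0
print(ensure_updates(missing)["updates"])    # 0
print(ensure_updates(native)["updates"])     # 1200
```

The helper copies the dict instead of mutating it, so the original checkpoint stays available for comparison.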
Traceback (most recent call last):
  File "train.py", line 84, in <module>
    main(opt)
  File "train.py", line 76, in main
    trainer.train(callbacks, val)
  File "/share/disk1/ml/code/efficientteacher-main/trainer/trainer.py", line 532, in train
    self.train_in_epoch(callbacks)
  File "/share/disk1/ml/code/efficientteacher-main/trainer/ssod_trainer.py", line 285, in train_in_epoch
    self.train_without_unlabeled(callbacks)
  File "/share/disk1/ml/code/efficientteacher-main/trainer/ssod_trainer.py", line 402, in train_without_unlabeled
    self.update_optimizer(loss, ni)
  File "/share/disk1/ml/code/efficientteacher-main/trainer/ssod_trainer.py", line 445, in update_optimizer
    self.ema.update(self.model)
  File "/share/disk1/ml/code/efficientteacher-main/utils/torch_utils.py", line 331, in update
    self.updates += 1
TypeError: unsupported operand type(s) for +=: 'NoneType' and 'int'
Killing subprocess 60073
Traceback (most recent call last):
  File "/root/anaconda3/envs/effictteacher/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/root/anaconda3/envs/effictteacher/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/root/anaconda3/envs/effictteacher/lib/python3.6/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/root/anaconda3/envs/effictteacher/lib/python3.6/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None) # not coming back
  File "/root/anaconda3/envs/effictteacher/lib/python3.6/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/root/anaconda3/envs/effictteacher/bin/python', '-u', 'train.py', '--local_rank=0', '--cfg', 'configs/ssod/custom/yolov5l_custom_ssod.yaml']' returned non-zero exit status 1.
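The second traceback above comes from the launcher, which only reports that the worker process exited with a non-zero status; the root cause is the TypeError in the first traceback. The sketch below shows the same pattern with the standard library's subprocess module. It is an analogy under that assumption, not the launcher's actual code, and child_code is a made-up stand-in for the training script.

```python
import subprocess
import sys

# Child process that fails the same way the training script does.
child_code = "updates = None\nupdates += 1\n"

try:
    subprocess.run(
        [sys.executable, "-c", child_code],
        check=True,              # raise if the child exits with non-zero status
        stderr=subprocess.PIPE,  # capture the child's own traceback
        text=True,
    )
    launcher_error = None
except subprocess.CalledProcessError as exc:
    launcher_error = exc

print(launcher_error.returncode)               # 1
print(launcher_error.stderr.splitlines()[-1])  # the child's TypeError line
```

When reading logs like the one above, the parent's CalledProcessError can usually be ignored in favor of the last exception printed by the child.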