Stinky-Tofu / Stronger-yolo

🔥Improve yolo with latest paper
MIT License

What causes the gradient explosion in the v1 version? #81

Open 123lifei opened 5 years ago

123lifei commented 5 years ago

@Stinky-Tofu I get this error when running train.py:

    Traceback (most recent call last):
      File "train.py", line 207, in <module>
        YoloTrain().train()
      File "train.py", line 143, in train
        raise ArithmeticError('The gradient is exploded')
    ArithmeticError: The gradient is exploded
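The traceback shows that the exception is raised explicitly by train.py rather than by the framework. The exact condition on line 143 is not shown in this thread; a minimal sketch of such a guard, assuming it tests the loss value for NaN or infinity (the helper name `check_loss` is hypothetical), might look like this:

```python
import math


def check_loss(loss_val: float) -> float:
    """Return the loss unchanged, or raise if it is NaN or infinite.

    Assumption: the real check in train.py may differ; this only
    illustrates the general pattern of failing fast on a bad loss.
    """
    if math.isnan(loss_val) or math.isinf(loss_val):
        raise ArithmeticError('The gradient is exploded')
    return loss_val
```

Failing fast like this keeps a diverged run from continuing to write useless checkpoints, but it does not say which input or layer produced the bad value.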

Stinky-Tofu commented 5 years ago

It is probably a hyperparameter issue. You could tune the learning rate or add warm-up. @123lifei
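As a rough illustration of the warm-up suggestion, the sketch below ramps the learning rate linearly during the first periods and then decays it with a cosine curve. The function name, its parameters, and the choice of cosine decay are assumptions for illustration; the schedule actually used in train.py is not shown in this thread.

```python
import math


def learning_rate(step, steps_per_period, warmup_periods=2, max_periods=30,
                  lr_init=1e-4, lr_end=1e-6):
    """Linear warm-up followed by cosine decay (hypothetical schedule)."""
    warmup_steps = warmup_periods * steps_per_period
    total_steps = max_periods * steps_per_period
    if step < warmup_steps:
        # Ramp from lr_init / warmup_steps up to lr_init.
        return lr_init * (step + 1) / warmup_steps
    # Fraction of the post-warm-up schedule completed, clamped to [0, 1].
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    return lr_end + 0.5 * (lr_init - lr_end) * (1 + math.cos(math.pi * progress))
```

Starting from a very small rate gives freshly initialized layers a chance to settle before full-size updates are applied, which is the usual motivation for warm-up.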

123lifei commented 5 years ago

    TRAIN_INPUT_SIZES = [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
    TEST_INPUT_SIZE = 544
    STRIDES = [8, 16, 32]
    IOU_LOSS_THRESH = 0.5

    # train
    BATCH_SIZE = 6
    LEARN_RATE_INIT = 1e-5
    LEARN_RATE_END = 1e-6
    WARMUP_PERIODS = 2
    PERIODS_FOR_STEP0 = 20
    MAX_PERIODS = 30
    ANCHORS = [[(2.0, 6.25), (3.375, 11.25), (5.25, 8.5)],            # Anchors for small obj
               [(2.625, 7.3125), (4.3125, 7.0625), (3.3125, 10.3125)],  # Anchors for medium obj
               [(2.46875, 4.8125), (3.4375, 4.96875), (2.84375, 8.0)]]  # Anchors for big obj
    ANCHOR_PER_SCALE = 3
    MOVING_AVE_DECAY = 0.9995
    MAX_BBOX_PER_SCALE = 150

    # test
    MULTI_TEST = False
    FLIP_TEST = False
    SCORE_THRESHOLD = 0.01  # The threshold of the probability of the classes
    IOU_THRESHOLD = 0.45    # The threshold of the IOU when implementing NMS

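These are my parameters. The gradient explosion is reported on the very first training iteration, which is really frustrating! I lowered the learning rate from 1e-4 to 1e-5 and the gradient still explodes. Again, it blows up on the first iteration.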

123lifei commented 5 years ago

Please help and give me some guidance. Many thanks.

123lifei commented 5 years ago

I am training on my own dataset.

Stinky-Tofu commented 5 years ago

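There are many possible causes: data problems, hyperparameter problems, or training-strategy problems. You will have to track it down yourself.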
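On the data side, one thing worth ruling out with a custom dataset is malformed annotations, since a zero-width or out-of-range box can lead to infinities or NaNs in a box-regression loss. The sketch below assumes each box is a tuple `(xmin, ymin, xmax, ymax, class_id)` in pixel coordinates; that format and the function name `find_bad_boxes` are hypothetical and may not match this repository's annotation files.

```python
def find_bad_boxes(boxes, img_w, img_h, num_classes):
    """Return (index, reason) pairs for boxes that look malformed.

    Assumed box format: (xmin, ymin, xmax, ymax, class_id), in pixels.
    """
    bad = []
    for i, (xmin, ymin, xmax, ymax, cls) in enumerate(boxes):
        # Boxes must have positive width and height and lie inside the image.
        # NaN coordinates fail these comparisons and are flagged as well.
        if not (0 <= xmin < xmax <= img_w and 0 <= ymin < ymax <= img_h):
            bad.append((i, 'invalid coordinates'))
        elif not (0 <= cls < num_classes):
            bad.append((i, 'class id out of range'))
    return bad
```

Running a check like this over every annotation before training is cheap compared with debugging a diverged run.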

nobody-cheng commented 5 years ago

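@123lifei Earlier I had changed the class list to a single class before converting the weights. Just now I restored coco.names and converted the weights again, and the gradient-explosion exception is no longer thrown at the start of training.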
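Why the class count matters for weight conversion can be seen from the size of a YOLOv3 detection head, whose output channel count is the number of anchors per scale times (4 box coordinates + 1 objectness score + the number of classes). The helper below is a hypothetical illustration; whether a mismatch of this kind was the actual cause here is not confirmed in the thread.

```python
def head_channels(num_classes, anchors_per_scale=3):
    """Output channels of one YOLOv3 detection head."""
    # 4 box coordinates + 1 objectness score + one score per class.
    return anchors_per_scale * (5 + num_classes)


coco_channels = head_channels(80)    # 80 classes in coco.names
single_channels = head_channels(1)   # a single custom class
```

With three anchors per scale, the 80-class COCO head has 255 channels while a single-class head has 18, so weights converted under one class list do not fit a head built for the other.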

123lifei commented 5 years ago

It is a problem in the code. On line 138 of train.py, change False to True. Could you check whether that is right?

    for batch_image, batch_label_sbbox, batch_label_mbbox, batch_label_lbbox, \
            batch_sbboxes, batch_mbboxes, batch_lbboxes \
            in self.__train_data:
        _, loss_val, global_step_val = self.__sess.run(
            [self.__train_op, self.__loss, self.__global_step],
            feed_dict={
                self.__input_data: batch_image,
                self.__label_sbbox: batch_label_sbbox,
                self.__label_mbbox: batch_label_mbbox,
                self.__label_lbbox: batch_label_lbbox,
                self.__sbboxes: batch_sbboxes,
                self.__mbboxes: batch_mbboxes,
                self.__lbboxes: batch_lbboxes,
                self.__training: False  # line 138: proposed change is False -> True
            }
        )
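The thread does not show how the training flag is consumed. Assuming it switches batch normalization between batch statistics and stored moving averages, the pure-Python sketch below (the function `batch_norm` and the sample values are hypothetical) shows why feeding False during training could matter:

```python
import math


def batch_norm(xs, moving_mean, moving_var, training, eps=1e-5):
    """Normalize a list of activations, without learned scale or shift."""
    if training:
        # Training mode: use the statistics of the current batch.
        mean = sum(xs) / len(xs)
        var = sum((x - mean) ** 2 for x in xs) / len(xs)
    else:
        # Inference mode: use the stored moving averages.
        mean, var = moving_mean, moving_var
    return [(x - mean) / math.sqrt(var + eps) for x in xs]


activations = [50.0, 60.0, 70.0, 80.0]
# Freshly initialized moving statistics: mean 0, variance 1.
train_out = batch_norm(activations, 0.0, 1.0, training=True)
infer_out = batch_norm(activations, 0.0, 1.0, training=False)
```

In training mode the outputs are centered with unit variance, while in inference mode with untrained moving statistics the large activations pass through almost unscaled. That would be consistent with a loss blowing up on the first iteration, although the effect in a real run depends on the moving averages restored from the checkpoint.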

123lifei commented 5 years ago

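@Stinky-Tofu Is this a small mistake in the code? After making the change I can train normally. Please take a look, and I will share the results with you.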