+1. Same issue here
I can't reproduce this problem. Which dataset are you using?
Maybe try to change

```python
grads = tape.gradient(loss_value, model.variables)
optimizer.apply_gradients(zip(grads, model.variables),
                          global_step=tf.train.get_or_create_global_step())
```

to

```python
grads = tape.gradient(loss_value, model.trainable_variables)
optimizer.apply_gradients(zip(grads, model.trainable_variables),
                          global_step=tf.train.get_or_create_global_step())
```

?
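For context, here is a toy standalone example (not code from this repo, and assuming TF 2.x eager execution): `model.variables` also contains non-trainable variables such as the batch-norm moving statistics, for which `tape.gradient` returns `None`, while `model.trainable_variables` only contains variables that actually receive gradients.

```python
# Toy sketch (standalone Keras model, not from this repo): batch-norm moving
# statistics live in model.variables but receive no gradient, so tape.gradient
# returns None for them; model.trainable_variables avoids those None entries.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(4, input_shape=(8,)),
    tf.keras.layers.BatchNormalization(),  # adds non-trainable moving_mean / moving_variance
    tf.keras.layers.Dense(1),
])

x = tf.random.normal((2, 8))
with tf.GradientTape(persistent=True) as tape:
    loss = tf.reduce_mean(tf.square(model(x, training=True)))

# The moving statistics get no gradient, so the first count is > 0, the second is 0.
print(sum(g is None for g in tape.gradient(loss, model.variables)))
print(sum(g is None for g in tape.gradient(loss, model.trainable_variables)))
```

If I remember correctly, the TF 1.x optimizer silently skips `None` gradients, while `optimizer_v2` in TF 2.x logs the "Gradients do not exist for variables" warning that shows up later in this thread.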
I ran this code and it works successfully:
```python
import os
import tensorflow as tf
import numpy as np
import visualize

tf.enable_eager_execution()
tf.executing_eagerly()

os.environ['CUDA_VISIBLE_DEVICES'] = '1'

from detection.datasets import coco, data_generator

# Build the dataset and data generator.
img_mean = (123.675, 116.28, 103.53)
img_std = (1., 1., 1.)
train_dataset = coco.CocoDataSet('./COCO2017/', 'val',
                                 flip_ratio=0.5,
                                 pad_mode='fixed',
                                 mean=img_mean,
                                 std=img_std,
                                 scale=(640, 896))
train_generator = data_generator.DataGenerator(train_dataset)

from detection.models.detectors import faster_rcnn

# Create the model and run one forward pass so the weights can be loaded by name.
model = faster_rcnn.FasterRCNN(
    num_classes=len(train_dataset.get_categories()))

img, img_meta, bboxes, labels = train_dataset[0]
batch_imgs = tf.Variable(np.expand_dims(img, 0))
batch_metas = tf.Variable(np.expand_dims(img_meta, 0))
_ = model((batch_imgs, batch_metas), training=False)
model.load_weights('weights/faster_rcnn.h5', by_name=True)

batch_size = 1
train_tf_dataset = tf.data.Dataset.from_generator(
    train_generator, (tf.float32, tf.float32, tf.float32, tf.int32))
train_tf_dataset = train_tf_dataset.padded_batch(
    batch_size, padded_shapes=([None, None, None], [None], [None, None], [None]))

optimizer = tf.train.MomentumOptimizer(1e-3, 0.9, use_nesterov=True)

# Eager training loop.
epochs = 12
for epoch in range(epochs):
    iterator = train_tf_dataset.make_one_shot_iterator()
    loss_history = []
    for (batch, inputs) in enumerate(iterator):
        batch_imgs, batch_metas, batch_bboxes, batch_labels = inputs
        with tf.GradientTape() as tape:
            rpn_class_loss, rpn_bbox_loss, rcnn_class_loss, rcnn_bbox_loss = \
                model((batch_imgs, batch_metas, batch_bboxes, batch_labels), training=True)
            loss_value = rpn_class_loss + rpn_bbox_loss + rcnn_class_loss + rcnn_bbox_loss
        grads = tape.gradient(loss_value, model.variables)
        optimizer.apply_gradients(zip(grads, model.variables),
                                  global_step=tf.train.get_or_create_global_step())
        loss_history.append(loss_value.numpy())
    print('epoch', epoch, '-', np.mean(loss_history))
```
The log:

```
epoch 0 - 1.4815336
epoch 1 - 1.1633286
epoch 2 - 1.0060173
epoch 3 - 0.8848684
epoch 4 - 0.78657615
epoch 5 - 0.69864273
epoch 6 - 0.62510866
epoch 7 - 0.5631116
epoch 8 - 0.51153713
epoch 9 - 0.4704786
epoch 10 - 0.4405557
epoch 11 - 0.40453458
```
I haven't installed or updated TensorFlow 2.0 here, so if you are using the 2.0 version adapted from TensorFlow-2.x-Tutorials, I can't say for sure.
If the problem occurs under TensorFlow 2.0, you can close this issue for now; I'll update this to TF 2.0 once the epidemic is over and I'm back at school.
@Viredery Thanks for the reply~ I tried your code, and I still get the same error with both COCO and my own dataset. I'm using TensorFlow 2.1, so it's very likely a TensorFlow version issue :}
@CanshangD No problem. After upgrading to 2.0 or above the model needs some changes, and some of the APIs used in this code are no longer supported in 2.x. I later moved on to MXNet and PyTorch, so this code was never updated to 2.0. Sorry for the trouble.
Could you please comment in English, so that your contribution can be useful to everyone? I am trying to use this code too. Thanks! 😊
@loripino21 This code is based on TensorFlow 1.11 and it works fine.
However, if you want to switch to TensorFlow 2.0 and modify the code as in https://github.com/dragen1860/TensorFlow-2.x-Tutorials/tree/master/16-fasterRCNN, you may run into the problem described in this issue.
I will upgrade from TensorFlow 1.11 to TensorFlow 2.0 when I have time~
@CanshangD @loripino21 I have upgraded my code to support TensorFlow 2.0.0 and am closing this issue. If there are any problems during training, feel free to open a new issue.
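For anyone doing the port themselves before pulling the update, here is a rough sketch of how the TF 1.x eager loop above maps onto TF 2.x APIs. This is my own approximation, not the repository's actual updated code; `model` and `train_tf_dataset` are assumed to be built exactly as in the earlier snippet.

```python
# Rough TF 2.x version of the training loop above (a sketch under the
# assumption that `model` and `train_tf_dataset` exist as in the TF 1.x snippet).
import numpy as np
import tensorflow as tf

# tf.enable_eager_execution() is gone: TF 2.x is eager by default.
# tf.train.MomentumOptimizer -> tf.keras.optimizers.SGD with momentum/nesterov.
optimizer = tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9, nesterov=True)

epochs = 12
for epoch in range(epochs):
    loss_history = []
    # make_one_shot_iterator() is gone: iterate the tf.data.Dataset directly.
    for batch, inputs in enumerate(train_tf_dataset):
        batch_imgs, batch_metas, batch_bboxes, batch_labels = inputs
        with tf.GradientTape() as tape:
            rpn_class_loss, rpn_bbox_loss, rcnn_class_loss, rcnn_bbox_loss = \
                model((batch_imgs, batch_metas, batch_bboxes, batch_labels), training=True)
            loss_value = rpn_class_loss + rpn_bbox_loss + rcnn_class_loss + rcnn_bbox_loss
        grads = tape.gradient(loss_value, model.trainable_variables)
        # apply_gradients no longer takes a global_step argument in optimizer_v2.
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
        loss_history.append(loss_value.numpy())
    print('epoch', epoch, '-', np.mean(loss_history))
```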
Hi, I ran into a problem during training: the loss is NaN. I've been debugging but can't tell where the problem is; the logic looks fine and the dataset input also seems correct.
Log:

```
W1107 09:59:16.315858 140495756453632 optimizer_v2.py:1029] Gradients do not exist for variables ['rcnn_bbox_fc/kernel:0', 'rcnn_bbox_fc/bias:0'] when minimizing the loss.
epoch 0 0 nan
W1107 09:59:39.785169 140495756453632 optimizer_v2.py:1029] Gradients do not exist for variables ['rcnn_bbox_fc/kernel:0', 'rcnn_bbox_fc/bias:0'] when minimizing the loss.
epoch 0 1 nan
W1107 10:00:02.858589 140495756453632 optimizer_v2.py:1029] Gradients do not exist for variables ['rcnn_bbox_fc/kernel:0', 'rcnn_bbox_fc/bias:0'] when minimizing the loss.
epoch 0 2 nan
W1107 10:00:25.615397 140495756453632 optimizer_v2.py:1029] Gradients do not exist for variables ['rcnn_bbox_fc/kernel:0', 'rcnn_bbox_fc/bias:0'] when minimizing the loss.
epoch 0 3 nan
```
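Not a fix, but a generic debugging sketch that may help narrow this down (it reuses the loop variables from the snippets above and is not code from this repository): check each of the four loss terms for NaN before applying gradients, and print which variables actually receive no gradient.

```python
# Generic debugging sketch for the NaN loss / "Gradients do not exist" warning
# above. Assumes model, optimizer, batch_imgs, batch_metas, batch_bboxes and
# batch_labels exist as in the earlier training loop.
import tensorflow as tf

with tf.GradientTape() as tape:
    rpn_class_loss, rpn_bbox_loss, rcnn_class_loss, rcnn_bbox_loss = \
        model((batch_imgs, batch_metas, batch_bboxes, batch_labels), training=True)
    losses = {'rpn_class': rpn_class_loss, 'rpn_bbox': rpn_bbox_loss,
              'rcnn_class': rcnn_class_loss, 'rcnn_bbox': rcnn_bbox_loss}
    loss_value = tf.add_n(list(losses.values()))

# Report which individual loss term goes non-finite first.
for name, l in losses.items():
    if not tf.math.is_finite(l):
        print('non-finite loss term:', name, float(l))

grads = tape.gradient(loss_value, model.trainable_variables)

# Report which variables receive no gradient (the source of the warning),
# e.g. a head that was never reached on this batch.
for g, v in zip(grads, model.trainable_variables):
    if g is None:
        print('no gradient for', v.name)

# Optionally skip the step instead of corrupting the weights with NaNs.
if tf.math.is_finite(loss_value):
    optimizer.apply_gradients((g, v) for g, v in zip(grads, model.trainable_variables)
                              if g is not None)
```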