Viredery / tf-eager-fasterrcnn

A Faster R-CNN R-101-FPN model implemented with TensorFlow 2.0 eager execution.
MIT License

Training losses all nan values #14

Open LMD93 opened 4 years ago

LMD93 commented 4 years ago

Hello, I tried running the Jupyter notebook training script as is. The only change I made was to the scale argument of train_dataset.

train_dataset = coco.CocoDataSet(
    "./COCO2017/",
    "val",
    flip_ratio=0.5,
    pad_mode="fixed",
    mean=img_mean,
    std=img_std,
    scale=(256, 512),
)

I printed out the individual losses and this is what I see.

rpn_class_loss  tf.Tensor(nan, shape=(), dtype=float32)
rpn_bbox_loss  tf.Tensor(nan, shape=(), dtype=float32)
rcnn_class_loss  tf.Tensor(0.0, shape=(), dtype=float32)
rcnn_bbox_loss  tf.Tensor(nan, shape=(), dtype=float32)

There is no error thrown, and I did not make any changes to any of the scripts. Any idea why this is happening? Thanks!

Viredery commented 4 years ago

I am not sure what the problem is. I tried it and it ran successfully. Could you please send your program logs or screenshots?
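A log that names exactly which losses went non-finite at each step would make a report like this easier to act on. Below is a minimal sketch of such a check, not code from this repository: the helper name find_nonfinite_losses and the step_losses dictionary are hypothetical, and the values are simply the ones printed in the report above.

```python
import math


def find_nonfinite_losses(losses):
    """Return the names of losses whose value is NaN or infinite.

    `losses` maps a loss name to a scalar. Plain Python floats and NumPy
    scalars work directly; for an eager tensor, pass `tensor.numpy()`.
    (Hypothetical helper, not part of tf-eager-fasterrcnn.)
    """
    return [
        name
        for name, value in losses.items()
        if not math.isfinite(float(value))
    ]


# Values copied from the loss printout in the report above.
step_losses = {
    "rpn_class_loss": float("nan"),
    "rpn_bbox_loss": float("nan"),
    "rcnn_class_loss": 0.0,
    "rcnn_bbox_loss": float("nan"),
}

bad = find_nonfinite_losses(step_losses)
if bad:
    # Report the offending losses so the failing step can be inspected.
    print("non-finite losses:", ", ".join(bad))
```

Calling a check like this after every training step, and stopping at the first step that reports anything, would show whether the losses are NaN from the very first batch or only diverge later, which is useful detail to include alongside the logs.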