Open juanmanuelrq opened 4 years ago
Maybe try reducing the learning rate first. If that doesn't help, check your code and dataset for errors.
@juanmanuelrq Have you solved this problem? I'm training on the VOC dataset and I get test loss = NaN, while the train loss looks reasonable.
This indicates a problem with your train txt file. What format are you using? It should be: Filepath x1,y1,x2,y2 with no header line. @llmpass @juanmanuelrq
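One way to double-check the annotation file against the format described above is a small validator. This is only a sketch: the function names `check_annotation_line` and `check_annotation_file` are hypothetical, it assumes image paths contain no spaces, and it tolerates extra comma-separated fields after the four coordinates (for example a class id), which the comment above does not mention.

```python
import math
import os


def check_annotation_line(line, check_file=False):
    """Return a list of problems found in one 'Filepath x1,y1,x2,y2' line."""
    parts = line.strip().split()
    if not parts:
        return ["empty line"]
    path, boxes = parts[0], parts[1:]
    problems = []
    if check_file and not os.path.isfile(path):
        problems.append(f"image not found: {path}")
    if not boxes:
        problems.append("no boxes on this line")
    for box in boxes:
        try:
            values = [float(v) for v in box.split(",")]
        except ValueError:
            # A header such as 'Filepath x1,y1,x2,y2' ends up here.
            problems.append(f"non-numeric box: {box}")
            continue
        if len(values) < 4:
            problems.append(f"fewer than 4 coordinates: {box}")
            continue
        if not all(math.isfinite(v) for v in values):
            problems.append(f"non-finite value in box: {box}")
            continue
        x1, y1, x2, y2 = values[:4]
        if x2 <= x1 or y2 <= y1:
            problems.append(f"zero or negative box size: {box}")
        if min(x1, y1) < 0:
            problems.append(f"negative coordinate: {box}")
    return problems


def check_annotation_file(annot_path, check_file=True):
    """Map line number -> list of problems, for every bad line in the file."""
    report = {}
    with open(annot_path) as f:
        for lineno, line in enumerate(f, start=1):
            problems = check_annotation_line(line, check_file)
            if problems:
                report[lineno] = problems
    return report
```

Running `check_annotation_file` on the train and test txt files and printing the returned dictionary should point at the offending lines, if there are any. An empty dictionary only means that none of these particular checks failed.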
Train loss: nan, Test loss: nan. This happened to me at the very beginning of training, but my train.txt already uses the format you described.
@juanmanuelrq Have you resolved the issue?
I have the same problem.
all_model_checkpoint_paths: "Pedestrian_yolov3_loss=6.2686-nan.ckpt-1"
all_model_checkpoint_paths: "Pedestrian_yolov3_loss=6.2071-nan.ckpt-2"
all_model_checkpoint_paths: "Pedestrian_yolov3_loss=6.1809-nan.ckpt-3"
all_model_checkpoint_paths: "Pedestrian_yolov3_loss=6.1537-nan.ckpt-4"
all_model_checkpoint_paths: "Pedestrian_yolov3_loss=6.1885-nan.ckpt-5"
all_model_checkpoint_paths: "Pedestrian_yolov3_loss=6.1779-nan.ckpt-6"
all_model_checkpoint_paths: "Pedestrian_yolov3_loss=nan-nan.ckpt-7"
all_model_checkpoint_paths: "Pedestrian_yolov3_loss=nan-nan.ckpt-8"
all_model_checkpoint_paths: "Pedestrian_yolov3_loss=nan-nan.ckpt-9"
all_model_checkpoint_paths: "Pedestrian_yolov3_loss=nan-nan.ckpt-10"
all_model_checkpoint_paths: "Pedestrian_yolov3_loss=nan-nan.ckpt-11"
all_model_checkpoint_paths: "Pedestrian_yolov3_loss=nan-nan.ckpt-12"
all_model_checkpoint_paths: "Pedestrian_yolov3_loss=nan-nan.ckpt-13"
all_model_checkpoint_paths: "Pedestrian_yolov3_loss=nan-nan.ckpt-14"
all_model_checkpoint_paths: "Pedestrian_yolov3_loss=nan-nan.ckpt-15"
all_model_checkpoint_paths: "Pedestrian_yolov3_loss=nan-nan.ckpt-16"
all_model_checkpoint_paths: "Pedestrian_yolov3_loss=nan-nan.ckpt-17"
I am training a single class, and the dataset has about 7000 images.
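To resume from the last usable checkpoint in a listing like the one above, the names can be parsed to find where the first loss value stops being finite. A minimal sketch follows, assuming the names follow the pattern `<prefix>_loss=<first>-<second>.ckpt-<epoch>` seen in the listing. The thread does not say which of the two numbers is the train loss and which is the test loss, so the hypothetical helper `last_finite_checkpoint` only looks at the first one.

```python
import math
import re

# Assumed pattern, taken from the names in the listing:
#   "<prefix>_loss=<first>-<second>.ckpt-<epoch>"
CKPT_RE = re.compile(r"loss=([^-]+)-(.+)\.ckpt-(\d+)$")


def parse_checkpoint(name):
    """Return (first_loss, second_loss, epoch), or None if the name does not match."""
    match = CKPT_RE.search(name)
    if match is None:
        return None
    first, second, epoch = match.groups()
    # float("nan") is valid, so "nan" entries parse without error.
    return float(first), float(second), int(epoch)


def last_finite_checkpoint(names):
    """Return the highest-epoch checkpoint whose first loss value is finite."""
    best = None
    for name in names:
        parsed = parse_checkpoint(name)
        if parsed is None:
            continue
        first, _second, epoch = parsed
        if math.isfinite(first) and (best is None or epoch > best[1]):
            best = (name, epoch)
    return best[0] if best else None


names = [
    "Pedestrian_yolov3_loss=6.1885-nan.ckpt-5",
    "Pedestrian_yolov3_loss=6.1779-nan.ckpt-6",
    "Pedestrian_yolov3_loss=nan-nan.ckpt-7",
]
print(last_finite_checkpoint(names))
```

For the listing above this selects the epoch 6 checkpoint, since every later name has nan in the first position.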
@qncsn2016 Thank you so much.
Hello, I ran into the same problem. I wanted to train on the VOC dataset from scratch instead of starting from the COCO pre-trained weights. Test loss was nan from the first epoch when I retrained on VOC. How did you solve the problem? Thank you very much.
Hi, I was training and got nan for both the train loss and the test loss:
`
=> Epoch: 977 Time: 2020-03-03 10:50:58 Train loss: nan Test loss: nan Saving ./checkpoint/yolov3_test_loss=nan.ckpt ...
0it [00:00, ?it/s]
=> Epoch: 978 Time: 2020-03-03 10:51:11 Train loss: nan Test loss: nan Saving ./checkpoint/yolov3_test_loss=nan.ckpt ...
0it [00:00, ?it/s]
=> Epoch: 979 Time: 2020-03-03 10:51:30 Train loss: nan Test loss: nan Saving ./checkpoint/yolov3_test_loss=nan.ckpt ...
0it [00:00, ?it/s]
=> Epoch: 980 Time: 2020-03-03 10:51:48 Train loss: nan Test loss: nan Saving ./checkpoint/yolov3_test_loss=nan.ckpt ...
0it [00:00, ?it/s]
=> Epoch: 981 Time: 2020-03-03 10:52:03 Train loss: nan Test loss: nan Saving ./checkpoint/yolov3_test_loss=nan.ckpt ...
my config.py file
#! /usr/bin/env python
# coding=utf-8
#================================================================
#   Copyright (C) 2019 * Ltd. All rights reserved.
#
#   Editor      : VIM
#   File name   : config.py
#   Author      : YunYang1994
#   Created date: 2019-02-28 13:06:54
#   Description :
#
#================================================================

from easydict import EasyDict as edict

__C = edict()
# Consumers can get config by: from config import cfg
cfg = __C

# YOLO options
__C.YOLO = edict()

# Set the class name
__C.YOLO.CLASSES = "./data/classes/class.names"
__C.YOLO.ANCHORS = "./data/anchors/basline_anchors.txt"
__C.YOLO.MOVING_AVE_DECAY = 0.9995
__C.YOLO.STRIDES = [8, 16, 32]
__C.YOLO.ANCHOR_PER_SCALE = 3
__C.YOLO.IOU_LOSS_THRESH = 0.5
__C.YOLO.UPSAMPLE_METHOD = "resize"
__C.YOLO.ORIGINAL_WEIGHT = "./checkpoint/yolov3_coco.ckpt"
__C.YOLO.DEMO_WEIGHT = "./checkpoint/yolov3_coco_demo.ckpt"

# Train options
__C.TRAIN = edict()

__C.TRAIN.ANNOT_PATH = "./data/dataset/visdrone_train.txt"
__C.TRAIN.BATCH_SIZE = 6
__C.TRAIN.INPUT_SIZE = [320, 352, 384, 416, 448, 480, 512, 544, 576, 608]
__C.TRAIN.DATA_AUG = True
__C.TRAIN.LEARN_RATE_INIT = 1e-4
__C.TRAIN.LEARN_RATE_END = 1e-6
__C.TRAIN.WARMUP_EPOCHS = 2
__C.TRAIN.FISRT_STAGE_EPOCHS = 20
__C.TRAIN.SECOND_STAGE_EPOCHS = 20000
__C.TRAIN.INITIAL_WEIGHT = "./checkpoint/yolov3_coco_demo.ckpt"

# TEST options
__C.TEST = edict()

__C.TEST.ANNOT_PATH = "./data/dataset/visdrone_test.txt"
__C.TEST.BATCH_SIZE = 2
__C.TEST.INPUT_SIZE = 544
__C.TEST.DATA_AUG = False
__C.TEST.WRITE_IMAGE = True
__C.TEST.WRITE_IMAGE_PATH = "./data/detection/"
__C.TEST.WRITE_IMAGE_SHOW_LABEL = True
__C.TEST.WEIGHT_FILE = "./checkpoint/yolov3_test_loss=9.2099.ckpt-5"
__C.TEST.SHOW_LABEL = True
__C.TEST.SCORE_THRESHOLD = 0.3
__C.TEST.IOU_THRESHOLD = 0.45
`
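The `0it [00:00, ?it/s]` progress bars in the log above suggest that a loop over the data ran zero iterations. If the epoch loss is computed as a NumPy mean over the per-batch losses (an assumption, since the training script is not shown in this thread), an empty list of losses averages to nan, which would produce this output without any numerical instability. The sketch below is a hypothetical guard, `mean_epoch_loss`, that tells the two cases apart: no batches at all versus a loss that actually diverged.

```python
import math


def mean_epoch_loss(batch_losses, name="train"):
    """Average per-batch losses, failing loudly instead of returning nan."""
    losses = [float(loss) for loss in batch_losses]
    if not losses:
        # No batches were produced: usually a data problem, not a diverging model.
        raise ValueError(
            f"{name} loop produced no batches; check the annotation file and batch size"
        )
    for step, loss in enumerate(losses):
        if not math.isfinite(loss):
            # The loss itself blew up: learning rate or bad labels are likely suspects.
            raise FloatingPointError(f"{name} loss became {loss} at batch {step}")
    return sum(losses) / len(losses)
```

Calling this once per epoch for the train losses and once for the test losses stops the run at the first bad epoch, rather than saving hundreds of `loss=nan` checkpoints.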