@quajak If all lines are `nan` then something went wrong.

ZJJTSL commented 5 years ago

@quajak If all lines are nan then something went wrong.

Try setting batch=64 subdivisions=64
Check that you did everything as described here: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects
Check your labels using Yolo_mark: https://github.com/AlexeyAB/Yolo_mark

Originally posted by @AlexeyAB in https://github.com/AlexeyAB/darknet/issues/597#issuecomment-381332830

ZJJTSL commented 5 years ago

The loss is not nan, but everything else is, such as class, avg IOU and No Obj. I set batch=64, subdivisions=64, but it doesn't work. One more thing: how many iterations does it take for these values to stop being nan?

AlexeyAB commented 5 years ago

@ZJJTSL

the loss is not nan

This is normal: https://github.com/AlexeyAB/darknet#how-to-train-to-detect-your-custom-objects

Note: If during training you see nan values for avg (loss) field - then training goes wrong, but if nan is in some other lines - then training goes well.
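
A minimal sketch of how this check could be automated on a saved training log. It assumes the per-iteration summary line looks like the sample in the README (`9002: 0.211667, 0.060730 avg, 0.001000 rate, ...`); the regular expression and the function name `avg_loss_is_nan` are illustrative and not part of Darknet, and other builds may print the line differently.

```python
import re

# Assumed summary-line format (taken from the README sample, may differ per build):
#   9002: 0.211667, 0.060730 avg, 0.001000 rate, 3.868000 seconds, 576128 images
SUMMARY_LINE = re.compile(r"^\s*(\d+):\s*(\S+),\s*(\S+)\s+avg")


def avg_loss_is_nan(line):
    """Return True/False for an iteration summary line, None for any other line."""
    match = SUMMARY_LINE.match(line)
    if match is None:
        # Not a summary line, so a nan here is not treated as a failure.
        return None
    # The third group is the running average loss; "nan" and "-nan" both count.
    return "nan" in match.group(3).lower()
```

Only a `True` result signals that training should be stopped; lines that return `None` are the "other lines" mentioned in the note above.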

ZJJTSL commented 5 years ago

@AlexeyAB I've tried the usual solutions, but they failed. I also want to know whether it is necessary to iterate 1000 times to examine the value (nan)?

ZJJTSL commented 5 years ago

@AlexeyAB What's more, in my dataset there are some images with no signals, so the corresponding txt label is empty, just an empty file with the correct name. Will it affect training? Thanks.

AlexeyAB commented 5 years ago

if it is necessary to iterate 1000 times to examine the value (nan)

If there is a nan value in loss, then stop the training - your model and/or dataset is wrong.

If there isn't, then train for more than 4000 iterations: https://github.com/AlexeyAB/darknet#when-should-i-stop-training

Usually 2000 iterations for each class (object) are sufficient, but not less than 4000 iterations in total. For a more precise definition of when you should stop training, use the manual linked above.
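
As a quick illustration of that rule of thumb, here is a small sketch. The function name `min_iterations` is made up for this example; it only encodes "2000 per class, at least 4000 in total" and says nothing about when training has actually converged.

```python
def min_iterations(num_classes):
    """2000 iterations per class, but never fewer than 4000 in total."""
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    return max(4000, 2000 * num_classes)


for n in (1, 2, 5):
    print(n, "classes ->", min_iterations(n), "iterations")
# 1 classes -> 4000 iterations
# 2 classes -> 4000 iterations
# 5 classes -> 10000 iterations
```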

txt labels is null ,just an empty file with correct name

This is normal: https://github.com/AlexeyAB/darknet#how-to-improve-object-detection

It is desirable that your training dataset includes images with non-labeled objects that you do not want to detect (negative samples without bounding boxes, i.e. empty .txt files). Use as many images of negative samples as there are images with objects.
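
To see how close a dataset is to that balance, something like the following sketch could be used. It assumes each label file sits next to its image with the same base name and a `.txt` extension; the function name `count_samples` and the list of image extensions are assumptions made for this example.

```python
from pathlib import Path

# Assumed image extensions; adjust to whatever the dataset actually uses.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def count_samples(image_dir):
    """Return (positives, negatives, missing) for the images in image_dir."""
    positives = negatives = missing = 0
    for image in sorted(Path(image_dir).iterdir()):
        if image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        label = image.with_suffix(".txt")
        if not label.exists():
            missing += 1      # no label file at all
        elif label.read_text().strip() == "":
            negatives += 1    # empty label file: negative sample
        else:
            positives += 1    # at least one labeled box
    return positives, negatives, missing
```

Images counted as `missing` have no label file at all, which is different from a deliberately empty one and is probably worth checking by hand.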

ZJJTSL commented 5 years ago

Thanks for your reply. I've now tried 1000 iterations; the avg loss is not nan, but the test results are not as good as expected. After 1500 iterations, the avg loss is nan. So what's the problem?

AlexeyAB commented 5 years ago

@ZJJTSL

Thanks for your reply. I've now tried 1000 iterations; the avg loss is not nan, but the test results are not as good as expected.

Train for not less than 4000 iterations: https://github.com/AlexeyAB/darknet#when-should-i-stop-training

Usually 2000 iterations for each class (object) are sufficient, but not less than 4000 iterations in total. For a more precise definition of when you should stop training, use the manual linked above.

After 1500 iterations, the avg loss is nan. So what's the problem?

Your dataset and/or model is wrong.

Use the latest version of Darknet from this repository.
Check your dataset by using: https://github.com/AlexeyAB/Yolo_mark
What parameters do you use in the Makefile?
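
As a rough complement to checking the dataset visually in Yolo_mark, a script can catch malformed label lines. The sketch below assumes the annotation format described in the Darknet README, `<object-class> <x_center> <y_center> <width> <height>` with coordinates relative to the image size; the function name `check_label_line` and the exact range checks are choices made for this example, not something Darknet provides.

```python
def check_label_line(line, num_classes):
    """Return a list of problems found in one label line (empty list means OK)."""
    parts = line.split()
    if len(parts) != 5:
        return ["expected 5 fields, got %d" % len(parts)]
    try:
        class_id = int(parts[0])
        x, y, w, h = (float(p) for p in parts[1:])
    except ValueError:
        return ["non-numeric field"]
    problems = []
    if not 0 <= class_id < num_classes:
        problems.append("class id out of range")
    if not (0.0 <= x <= 1.0 and 0.0 <= y <= 1.0):
        problems.append("center outside the image")
    if not (0.0 < w <= 1.0 and 0.0 < h <= 1.0):
        problems.append("width/height not in (0, 1]")
    return problems
```

Running this over every non-empty line of every label file and printing the file name for any non-empty result gives a short list of files to inspect first.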

ZJJTSL commented 5 years ago

Thanks a lot. I have now switched to the tiny model and get 67% mAP after 12000 iterations (I have no idea why the tiny model suits me; maybe my dataset is too small? 5 classes, 2500 images in total). Another problem: for a given path, I want to test all the images in that path and write the box information to a txt file. I wonder if there is a function like that I can use.

AlexeyAB commented 5 years ago

@ZJJTSL

https://github.com/AlexeyAB/darknet#how-to-use-on-the-command-line

To process a list of images data/train.txt and save results of detection to result.txt use: ./darknet detector test cfg/coco.data yolov3.cfg yolov3.weights -dont_show -ext_output < data/train.txt > result.txt

To get all filenames of jpg-images from directory to the train.txt file use this command:

on Windows: dir /b /s /a D:\Darknet2\darknet\build\darknet\x64\data\voc\*.jpg > train.txt
on Linux: ls --format single-column /img/voc/*.jpg > train.txt (if I remember correctly)

And then run ./darknet detector test cfg/coco.data yolov3.cfg yolov3.weights -dont_show -ext_output < train.txt > result.txt
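
If a single approach for both Windows and Linux is preferred, the list file can also be produced with a few lines of Python. This is only a sketch: the function name `write_image_list` and its parameters are invented for the example, and it writes absolute paths so that the list works regardless of the directory Darknet is started from.

```python
from pathlib import Path


def write_image_list(image_dir, list_path, pattern="*.jpg"):
    """Write the absolute path of each matching image, one per line; return the count."""
    paths = sorted(p.resolve() for p in Path(image_dir).glob(pattern))
    with open(list_path, "w") as f:
        for p in paths:
            f.write(str(p) + "\n")
    return len(paths)
```

For example, `write_image_list("/img/voc", "train.txt")` would produce the file that is then redirected into the detector command above. Note that the pattern match is case-sensitive on Linux, so files ending in `.JPG` would need their own pattern.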

ZJJTSL commented 5 years ago

Thanks for your timely reply. I finished it successfully.

ZJJTSL commented 5 years ago

Hmm, another question occurs to me. For each test image, one corresponding txt file (recording the box information) has to be created, rather than putting all the results into a single "result.txt" file. What command should I use?

AlexeyAB commented 5 years ago

@ZJJTSL

Try using the -save_labels flag, i.e. use this command: ./darknet detector test cfg/coco.data yolov3.cfg yolov3.weights -dont_show -ext_output -save_labels < train.txt

This creates a txt file for each jpg file (in the same format as the annotations used for training Yolo).
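
Since those files use the training annotation format, the boxes are relative to the image size. If pixel coordinates are needed, a conversion along these lines might help. It is a sketch that assumes each line is `<object-class> <x_center> <y_center> <width> <height>` with values relative to the image; the function name `yolo_to_pixel_box` is made up, and the image width and height have to be supplied by the caller.

```python
def yolo_to_pixel_box(line, image_width, image_height):
    """Convert one 'class x_center y_center width height' line to pixel corners."""
    class_id, x, y, w, h = line.split()
    x, y, w, h = float(x), float(y), float(w), float(h)
    # The label stores the box center and size; corners are half a size away.
    left = (x - w / 2) * image_width
    top = (y - h / 2) * image_height
    right = (x + w / 2) * image_width
    bottom = (y + h / 2) * image_height
    return int(class_id), left, top, right, bottom
```

The returned corners are floats; round or clamp them to the image bounds as needed before drawing or cropping.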

AlexeyAB / darknet

@quajak If all lines are `nan` then something went wrong. #2116