Training issues on Large dataset + Last layer output from the source

nbkhuong commented 5 years ago

Hello, I have several questions on the training and the last layer output:

I am training yolov3_spp to detect small objects of a single class, but after 80000 iterations the avg loss is not decreasing (it fluctuates between 3.x and 4.x) and the mAP is not increasing significantly (it fluctuates around 57% to 58%). This is especially noticeable after the 15000th iteration. My training set contains ~50000 images and my validation set ~12300 images. With this spp configuration, the detections I get on test images do not meet my expectations: it detects well in some cases, but not in others.

You can see the data clustering distribution in the attached image. Can you suggest some ways to solve this? The training takes really long (several days) before I can tell whether it is working or not.

Data Clustering: ask_cluster
Train & Validate plots: ask_plot

Since I am training the model to detect only 1 class on this dataset, what changes should I make to the cfg for such a task? The yolov3_spp model is rather slow, and because I have to embed it into another program I need some speed-up at run time.
I want to write the output of the last layer to the screen. Can you tell me where in the code I should look?
Are there existing TensorFlow implementations that work with spp and tiny_3l?

Thank you very much!

AlexeyAB commented 5 years ago

@nbkhuong Hi,

Can you rename cfg file to txt and attach it to your message?
In this case it may be better to train yolov3-tiny_3l.cfg with a higher resolution.
There are 3 output [yolo]-layers in yolov3-spp.cfg. If you want to print the raw float values of each of the 3 [yolo] layers to the console during detection, then between these two lines: https://github.com/AlexeyAB/darknet/blob/6231b748c44e2007b5c3cbf765a50b122782c5a2/src/yolo_layer.c#L457-L458 you can add code, something like:
```
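// Wait for pending GPU work to finish before reading the output
cudaDeviceSynchronize();
printf("\n Output: \n");
int i;
// l.output holds l.outputs floats for each of the l.batch images
for(i = 0; i < (l.batch*l.outputs); ++i) {
    printf("%f, ", l.output[i]);
}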
```
Look at https://github.com/AlexeyAB/darknet#yolo-v3-in-other-frameworks but I don't know whether it supports spp or tiny_3l models.

nbkhuong commented 5 years ago

Thank you for your extremely quick response!

Yes sure. yolov3_spp.txt
I will need some time to train this, since I have not tried tiny_3l so far.
Thank you very much. I just need the very last output (from the last yolo layer, I believe; please correct me if I am wrong).
[to be tested]

AlexeyAB commented 5 years ago

In your case it may be better to use yolov3-tiny_3l.cfg with width=1248 height=1248 or width=1664 height=1664 and the same anchors

It may also be better to train with batch=64 subdivisions=64 instead of batch=16 subdivisions=16
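For context on what these two settings control: in darknet, `batch` is the number of images per weight update, and `subdivisions` splits that batch into chunks that go through the GPU one at a time, so each forward/backward pass handles batch/subdivisions images. A minimal sketch of that arithmetic (the struct and function names are made up for illustration and are not darknet's API):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper type for illustration, not part of darknet. */
typedef struct {
    int batch;        /* images accumulated per weight update */
    int subdivisions; /* number of chunks the batch is split into */
} batch_cfg;

/* Images processed in one forward/backward pass. */
int images_per_pass(batch_cfg c)
{
    assert(c.subdivisions > 0 && c.batch % c.subdivisions == 0);
    return c.batch / c.subdivisions;
}

/* Total images consumed after a number of iterations (weight updates). */
long images_seen(batch_cfg c, long iterations)
{
    return (long)c.batch * iterations;
}

/* Print a one-line summary of a batch configuration. */
void print_batch_summary(batch_cfg c)
{
    printf("batch=%d subdivisions=%d -> %d image(s) per pass\n",
           c.batch, c.subdivisions, images_per_pass(c));
}
```

Both batch=16 subdivisions=16 and batch=64 subdivisions=64 push one image per pass, so memory use per pass should be similar; the difference is that each weight update averages over 64 images instead of 16.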

When mAP stops increasing, try reducing learning_rate by 10x and continue training.

The very last output (the 3rd [yolo] layer) is used for the smallest objects (~4x6, 9x16, 16x29 after resizing to 832x832). So you can use the [yolo]-layer index 113 for yolov3-spp.cfg to output only the last [yolo] layer:
```
if(state.index == 113) {
    cudaDeviceSynchronize();
    printf("\n Output: \n");
    int i;
    for(i = 0; i < (l.batch*l.outputs); ++i) {
        printf("%f, ", l.output[i]);
    }
}
```
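To make sense of the printed numbers, it helps to know how many there are and where a given value sits. The sketch below assumes the planar layout used by `entry_index` in yolo_layer.c (per anchor: 4 box channels, 1 objectness channel, then the class scores, each stored as a w*h plane). It also assumes 3 anchors for this layer and a total stride of 8, i.e. a 104x104 grid for an 832x832 input. The `yolo_shape` type and both function names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-in for the few layer fields used here. */
typedef struct {
    int w, h;     /* grid width and height */
    int n;        /* anchors predicted by this layer (assumed 3) */
    int classes;  /* number of object classes */
} yolo_shape;

/* Values per image: n anchors x (4 box + 1 objectness + classes) x w x h. */
size_t yolo_outputs(yolo_shape s)
{
    return (size_t)s.w * s.h * s.n * (4 + 1 + s.classes);
}

/* Offset of one entry (0..3 box, 4 objectness, 5.. class scores)
   for anchor a at grid cell (row, col), assuming the planar layout. */
size_t yolo_entry_offset(yolo_shape s, int a, int entry, int row, int col)
{
    assert(a >= 0 && a < s.n);
    assert(entry >= 0 && entry < 4 + 1 + s.classes);
    assert(row >= 0 && row < s.h && col >= 0 && col < s.w);
    size_t plane = (size_t)s.w * s.h;
    return (size_t)a * plane * (4 + 1 + s.classes)
         + (size_t)entry * plane
         + (size_t)row * s.w + col;
}
```

Under these assumptions, a one-class model at 832x832 prints 104 * 104 * 3 * 6 = 194688 values per image for this layer, and the objectness plane of the first anchor starts at offset 4 * 104 * 104 = 43264.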

nbkhuong commented 5 years ago

Hi AlexeyAB,

Thank you very much for your help and your very quick response. In 1., by "same anchors" do you mean the anchors I calculated for the previously trained yolov3_spp model, or should I recalculate them?

AlexeyAB commented 5 years ago

@nbkhuong You should recalculate anchors for new width= & height= from your cfg-file

nbkhuong commented 5 years ago

Hi,

I have trained the new model with tiny_3l and the result is a little better, but not as good as expected. mAP goes up to 60.x % but does not increase any further.
I also tried the TensorFlow implementation from https://github.com/mystic123/tensorflow-yolo-v3, but it cannot run the yolov3 model I trained from yolov3.cfg with the settings for small-object detection, as follows:

for training for small objects (smaller than 16x16 after the image is resized to 416x416) - set layers = -1, 11 instead of https://github.com/AlexeyAB/darknet/blob/6390a5a2ab61a0bdf6f1a9a6b4a739c16b36e0d7/cfg/yolov3.cfg#L720 and set stride=4 instead of https://github.com/AlexeyAB/darknet/blob/6390a5a2ab61a0bdf6f1a9a6b4a739c16b36e0d7/cfg/yolov3.cfg#L717
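The effect of this setting on feature-map sizes can be sketched with simple arithmetic. The numbers assume the stock yolov3.cfg at 416x416, where the final [upsample] is taken to receive a 26x26 map (total stride 16) and layer 11 is taken to sit at total stride 4. Both are assumptions about the cfg, and the function names are illustrative.

```c
#include <assert.h>

/* Side of a feature map: network input side divided by the total stride. */
int grid_side(int input_side, int total_stride)
{
    assert(total_stride > 0 && input_side % total_stride == 0);
    return input_side / total_stride;
}

/* Side of a feature map after an [upsample] layer with the given stride. */
int upsampled_side(int side, int upsample_stride)
{
    assert(side > 0 && upsample_stride > 0);
    return side * upsample_stride;
}
```

With stride=2 the upsampled map is 52x52; with stride=4 it is 104x104, which equals `grid_side(416, 4)` and so can be concatenated with layer 11. The last [yolo] layer then predicts on a 104x104 grid instead of 52x52, which a converter written against the stock shapes may not expect.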

However, the implementation doesn't work with this modified configuration. I tried the original yolov3.weights from the darknet website and it worked out of the box. So I guess I have to modify something in the TensorFlow implementation because of a size mismatch, since the small-object settings change the output shapes of the layers that follow. I have some questions about this:

Can you explain the parameter "layers = [A] [B]" in the [route] layer? Is it the indicator for the network to go back to the "x"th layer, where x = current layer - 1? And what does [B] mean in this case?
Is it possible to write the output of each layer of the whole network to the screen?

Thank you very much!

AlexeyAB commented 5 years ago

> Can you explain the parameter "layers = [A] [B]" in the [route] layer? Is it the indicator for the network to go back to the "x"th layer, where x = current layer - 1? And what does [B] mean in this case?

This is a Concat layer (as in Caffe/TensorFlow/...): https://www.tensorflow.org/api_docs/python/tf/concat
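As a rough illustration of what that means for `layers = -1, 11`: a negative value is resolved relative to the current layer, a non-negative value is an absolute layer number, and the outputs of the listed layers are joined along the channel axis. The sketch below is not darknet's code. The function names are hypothetical, and it assumes a single image stored in channel-major (CHW) order with both inputs having the same width and height.

```c
#include <assert.h>
#include <string.h>

/* Resolve a [route] index: negative values are relative to the current
   layer, non-negative values are absolute layer numbers. */
int resolve_route_index(int current, int idx)
{
    int resolved = idx < 0 ? current + idx : idx;
    assert(resolved >= 0 && resolved < current);
    return resolved;
}

/* Concatenate two CHW feature maps with the same h*w along the channel
   axis. out must have room for (ca + cb) * hw floats. */
void concat_channels(const float *a, int ca, const float *b, int cb,
                     int hw, float *out)
{
    memcpy(out, a, sizeof(float) * (size_t)ca * hw);
    memcpy(out + (size_t)ca * hw, b, sizeof(float) * (size_t)cb * hw);
}
```

For a [route] at layer 98 (used here only as an example index), `-1` resolves to layer 97 and `11` stays layer 11, so [A] is the previous layer's output and [B] is the earlier feature map appended to it.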

> Is it possible to write the output of each layer of the whole network to the screen?

It will be millions of numbers.

You can try to un-comment and change this code: https://github.com/AlexeyAB/darknet/blob/b6a824df39d1a79f15916d2c11133ce27bc0ab06/src/network_kernels.cu#L67-L84
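Since a full dump runs to millions of numbers, one alternative (not something proposed in this thread) is to print a short summary per layer instead of every value. A minimal sketch, assuming the layer output has already been copied to host memory as a float array; the type and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Summary statistics for one layer's output. */
typedef struct {
    float min, max;
    double mean;
} array_stats;

/* Compute min, max and mean of n floats. */
array_stats summarize(const float *x, size_t n)
{
    assert(x != NULL && n > 0);
    array_stats s = { x[0], x[0], 0.0 };
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        if (x[i] < s.min) s.min = x[i];
        if (x[i] > s.max) s.max = x[i];
        sum += x[i];
    }
    s.mean = sum / (double)n;
    return s;
}

/* Print one line per layer instead of every value. */
void print_layer_summary(int layer_index, const float *x, size_t n)
{
    array_stats s = summarize(x, n);
    printf("layer %d: n=%zu min=%f max=%f mean=%f\n",
           layer_index, n, s.min, s.max, s.mean);
}
```

Comparing these per-layer summaries between darknet and another implementation can help locate the first layer where the outputs diverge.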

AlexeyAB / darknet

Training issues on Large dataset + Last layer output from the source #2840