AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Understanding the coherence of the training #2133

Open tpereztorres opened 5 years ago

tpereztorres commented 5 years ago

Hello @AlexeyAB,

We would like to ask you two questions about our training experience.

First, we have found a problem with performance during training. For example, precision is better at iteration 12,000 but worse at iteration 50,000; however, the best precision is obtained at iteration 120,000.

Our dataset contains 65,000 images of a single class, with more than one bounding box per image.

We understand what overfitting means, but we cannot see how it is consistent with the results we obtain. Why does the training improve and then get worse so erratically?

Our other question: when we reduce the depth of the network, is it correct that we need to train for more epochs to reach the same or similar best precision (that of iteration 120,000)?

Thank you very much.

AlexeyAB commented 5 years ago

@tpereztorres Hello,

  1. What mAP did you get at 12000, 50000, 120000 iterations?

  2. Did you measure mAP on a separate validation dataset? How many images are in the validation dataset? (See the command sketch after this list.)

  3. Did you use yolov3.cfg or yolov3-tiny.cfg?

  4. Can you show width, height, batch, subdivisions, learning_rate, scales, steps, random, jitter from your cfg-file?
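
A minimal sketch of how mAP on a held-out set can be checked with the repository's map command; the obj.data, cfg, and weights paths below are placeholders, and obj.data is assumed to contain a valid= line pointing to a list of validation images:

    ./darknet detector map data/obj.data cfg/yolov3.cfg backup/yolov3_12000.weights

Running the same command against the 12000-, 50000- and 120000-iteration checkpoints would quantify the fluctuation you describe.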

tpereztorres commented 5 years ago

Hello again @AlexeyAB,

First, thank you for your quick response.

1) We cannot specify values for metrics such as mAP because we test the weights in real time for a specific application.

2) We do not have a validation dataset.

3) We use yolov3.cfg.

4) batch=64, subdivisions=16, width=416, height=416, learning_rate=0.001, steps=400000,450000, scales=.1,.1, random=1, jitter=0.3
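
In case it helps (assuming one epoch ≈ number of training images / batch), our iteration counts correspond roughly to the following numbers of epochs:

    iterations per epoch ≈ 65,000 / 64 ≈ 1,016
    12,000 iterations  ≈ 12 epochs
    50,000 iterations  ≈ 49 epochs
    120,000 iterations ≈ 118 epochs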

Do you have enough information, or do you need anything more?

If you have any questions, please do not hesitate to ask us.

Thank you very much.

AlexeyAB commented 5 years ago

We cannot specify values for metrics such as mAP because we test the weights in real time for a specific application.

What accuracy metrics do you use?

For example, precision is better at iteration 12,000 but worse at iteration 50,000; however, the best precision is obtained at iteration 120,000.

How did you measure precision? What dataset do you use for this?

We understand what overfitting means, but we cannot see how it is consistent with the results we obtain. Why does the training improve and then get worse so erratically?

What is the magnitude of the accuracy fluctuations, in percent?

Our other question: when we reduce the depth of the network, is it correct that we need to train for more epochs to reach the same or similar best precision (that of iteration 120,000)?

Strictly speaking, a deeper network with residual connections has a smaller error at the same iteration number. See page 8, Figure 6 of https://arxiv.org/pdf/1512.03385.pdf
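
For illustration, a residual block in the stock yolov3.cfg looks roughly like the sketch below (the filter counts are those of the first block in the file): two convolutional layers whose input is added back to their output through a [shortcut] layer. Reducing the depth of the network usually means removing blocks like this one, which also removes these skip connections.

    # 1x1 convolution
    [convolutional]
    batch_normalize=1
    filters=32
    size=1
    stride=1
    pad=1
    activation=leaky

    # 3x3 convolution
    [convolutional]
    batch_normalize=1
    filters=64
    size=3
    stride=1
    pad=1
    activation=leaky

    # residual connection: add the block's input (the layer 3 positions back)
    [shortcut]
    from=-3
    activation=linear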


In theory, to get higher accuracy it is better to use:

tpereztorres commented 5 years ago

Hello again @AlexeyAB,

Thank you for the quick answer.

Just one question: do you mean to use a number of filters between 32 and 768 instead of 1024?

Thank you very much.