Custom dataset - Githubissues

rmmal commented 7 years ago

Hello @AlexeyAB

I am using Yolo for a text detection problem. My dataset has 13k images of different sizes, from 200x150 up to 4000x5000, with every variation in between. The text also comes in different font sizes.

1st: What are your suggestions for the configuration file, and how many iterations should I train for?

2nd: I trained at a width and height of 704x704 for up to 40k iterations; the best model was at 16.5k.

Testing at the same dimensions gave acceptable results, but it still has some bounding-box problems.

Testing at 1088x1088 gave more accurate results on large images with small fonts.

Testing at 480x480 gave better results on large images with a lot of text that reaches the end of the page. This is a problem because the anchors don't reach the end of the page. When we change the resolution to 480 while keeping the anchors from 704, the anchors can reach the end of the image, so it detected all the text, but it missed the small fonts.

How can I generalize for such a problem? I use 15 anchors. What could be missing to detect 1 class, "text boxes", perfectly using Yolo?

rmmal commented 7 years ago

Also, is there any major difference between your fork and the original project?

AlexeyAB commented 7 years ago

@rmmal Hi,

There are no major differences, at least until Joseph releases a new version of Yolo. There are some modifications, though: the original has about +1 mAP, but it worked badly with non-square networks. I do not know whether he has fixed this.

Do you use this fork or original?
Did you train Yolo with param random=1?
Do you need to detect each line of text as single object or whole text-page as single object?
On which network resolution did you get the best result? using 1088x1088?
How did you calculate your 15 anchors (and did you change num=15 and set anchors=<30 values>)?

Try detecting at 1088x1088 and multiply each anchor value by 1.6 (but if you trained with random=1, then multiply by 2.4).
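The anchor scaling suggested above can be sketched in a few lines of Python. The thread does not say where 1.6 comes from; one plausible reading (an assumption, not confirmed here) is that it approximates the ratio of detection to training resolution, 1088/704. The function names and the example anchor values below are hypothetical.

```python
def scale_anchors(anchors, factor):
    """Multiply every anchor width and height by the same factor."""
    return [round(value * factor, 4) for value in anchors]


def resolution_ratio(train_size, detect_size):
    """Ratio between detection and training resolution (assumed origin of the factor)."""
    return detect_size / train_size


def to_cfg_line(anchors):
    """Format a flat [w0, h0, w1, h1, ...] list as an 'anchors = ...' cfg line."""
    pairs = zip(anchors[0::2], anchors[1::2])
    return "anchors = " + ",  ".join(f"{w:.2f},{h:.2f}" for w, h in pairs)


# Hypothetical anchors, not the ones used in this thread.
example_anchors = [1.0, 2.0, 3.0, 4.0]
print(to_cfg_line(scale_anchors(example_anchors, 1.6)))
print(round(resolution_ratio(704, 1088), 3))
```

With 15 anchors the flat list would hold 30 values, and the scaled line would replace the existing anchors entry of the cfg file used for detection.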

rmmal commented 7 years ago

1- I think I used the original project. Do you recommend that I work with yours?
2- No, I trained it with a fixed size of 704x704.
3- I need to detect text blocks, each block in the image as a single object.
4- There is no best resolution yet. Testing at 704x704 gave acceptable results on some images and poor results on others (a medium result overall). The same goes for 480x480 (which helped with detecting long text blocks) and for 1088x1088 (which helped with small fonts next to images, as it avoids the image and makes smaller blocks).
5- I calculated the anchors using k-means with this GitHub project: https://github.com/Jumabek/darknet_scripts and yes, I changed num=15, set the 30 values in anchors, and changed filters to 90.

Do you mean I should multiply the anchors used for 704x704 by 1.6? And why should I multiply by 2.4 if random=1?

For illustration, the model trained at 704x704 was tested at three resolutions:

[Images: 06-01_480 (480x480), 06-01_704 (704x704), 06-01_1088 (1088x1088)]

[Images: addostour_781725360_480 (480x480), addostour_781725360_704 (704x704), addostour_781725360_1088 (1088x1088)]

[Images: 0009_480 (480x480), 0009_704 (704x704), 0009_1088 (1088x1088)]

AlexeyAB commented 7 years ago

As far as I can see, 1088x1088 detects more text blocks, but it has problems with anchors (the size of the blocks).

Try to calculate anchors for 1088x1088: https://github.com/Jumabek/darknet_scripts/blob/master/gen_anchors.py#L17

Then train for 1088x1088 and detect using 1088x1088.
You can also try to train Densenet201-yolo at resolution 1088x1088. It is Yolo v2 based on the DenseNet201 classification network, which can detect both very small and very large objects, but I have not tested it enough yet.
- cfg file: https://github.com/AlexeyAB/darknet/blob/master/build/darknet/x64/densenet201_yolo.cfg
- pre-trained weights densenet201.300 https://drive.google.com/open?id=0BwRgzHpNbsWBOFpORVB4UUtfT0U
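The anchor recalculation for 1088x1088 mentioned above can be illustrated with a small k-means over box widths and heights, using 1 - IoU as the distance. This is an independent sketch in the same spirit as the linked script, not its code. It assumes labels in relative (w, h) form and a Yolo v2 stride of 32, so anchors come out in grid-cell units; all function names and the sample boxes are hypothetical.

```python
import random


def iou_wh(box, centroid):
    """IoU of two boxes aligned at the same corner (only width and height matter)."""
    w, h = box
    cw, ch = centroid
    inter = min(w, cw) * min(h, ch)
    return inter / (w * h + cw * ch - inter)


def kmeans_anchors(boxes, k, iterations=100, seed=0):
    """Cluster relative (w, h) pairs; a box joins the centroid with the highest IoU."""
    rng = random.Random(seed)
    # Start from k distinct boxes (raises ValueError if fewer than k exist).
    centroids = rng.sample(sorted(set(boxes)), k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for i, members in enumerate(clusters):
            if not members:  # keep the centroid of an empty cluster unchanged
                new_centroids.append(centroids[i])
                continue
            mean_w = sum(w for w, _ in members) / len(members)
            mean_h = sum(h for _, h in members) / len(members)
            new_centroids.append((mean_w, mean_h))
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return sorted(centroids)


def to_grid_units(centroids, net_w, net_h, stride=32):
    """Convert relative sizes to grid-cell units for the given network size."""
    return [(w * net_w / stride, h * net_h / stride) for w, h in centroids]


# Hypothetical relative box sizes; real ones would come from the label files.
sample_boxes = [(0.1, 0.1)] * 5 + [(0.5, 0.2)] * 5
print(to_grid_units(kmeans_anchors(sample_boxes, 2), 1088, 1088))
```

For this thread the call would use k=15 and the network size written in the cfg file, so the same labels yield larger anchor values at 1088x1088 than at 704x704.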

rmmal commented 7 years ago

How can I adjust the number of filters? When I set 15 anchors (num=15) and filters=90, it gave me the error: l.outputs == params.inputs

So how is it calculated?

Also, when I try to run your cfg file, it gives me this error: 14 (null): 100% (null): 100% x: 0.076923, y: -0.140000, w: inf, h: 0.000000 Segmentation fault (core dumped)

And when I run densenet201.cfg, which is the default one, with yolo it works fine, but it doesn't give any results.

So what could be the problem?

AlexeyAB commented 7 years ago

> How can I adjust the number of filters? When I set 15 anchors (num=15) and filters=90, it gave me the error: l.outputs == params.inputs
>
> So how is it calculated?

But you said that you had done this and it worked fine:

> 5- I calculated the anchors using k-means with this GitHub project: https://github.com/Jumabek/darknet_scripts and yes, I changed num=15, set the 30 values in anchors, and changed filters to 90.

> Also, when I try to run your cfg file, it gives me this error: 14 (null): 100% (null): 100% x: 0.076923, y: -0.140000, w: inf, h: 0.000000 Segmentation fault (core dumped)
>
> And when I run densenet201.cfg, which is the default one, with yolo it works fine, but it doesn't give any results.

Is this about densenet201_yolo.cfg? What command line did you use to run it?
And what command line did you use to run densenet201.cfg?

rmmal commented 7 years ago

Yes, it was my fault: I forgot to set classes to 1. I have now started training and will wait for the result. How many epochs do you think will be enough?

both: ./darknet detector train data/obj.data cfg/densenet201.cfg model_name

For testing, I replaced train with test.

By the way, all the testing and training is working fine. I will try what you suggested and see the results.

thanks @AlexeyAB
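On the filters question raised above: for the Yolo v2 region layer, the last convolutional layer needs filters = num * (classes + coords + 1), with coords = 4. A minimal Python sketch of that arithmetic follows; the function name `region_filters` is made up for illustration.

```python
def region_filters(num_anchors, classes, coords=4):
    """Filters for the conv layer feeding a Yolo v2 region layer.

    Each anchor predicts `coords` box values, 1 objectness score,
    and one score per class.
    """
    return num_anchors * (classes + coords + 1)


print(region_filters(15, 1))   # 15 anchors, 1 class (text boxes)
print(region_filters(5, 20))   # 5 anchors, 20 classes
```

With num=15 and classes=1 this gives 90, which is consistent with filters=90 failing the l.outputs == params.inputs check only while classes was left at a different value.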

AlexeyAB commented 7 years ago

So, try to train both yolo-voc2.0.cfg (or yolo-voc.cfg) and densenet201_yolo.cfg with resolution 1088x1088 and 15 anchors calculated for 1088x1088.

rmmal commented 7 years ago

Okay, I started training densenet201_yolo.cfg with batch 32 and subdivisions 32, and yolo-voc2.0.cfg with batch 16 and subdivisions 8.

I will wait and see.

rmmal commented 7 years ago

@AlexeyAB So far, yolo-voc2.0.cfg has finished 8500 iterations and the loss is ~50, which is very high, and I don't know why. When I test it on the same pictures, no text boxes appear. Do you know what the problem is now?

Also, I noticed that with my first model (trained at 704x704), no detections appear on small pictures such as 300x100 or 150x150, or anything smaller than 500. Why is this happening?

AlexeyAB commented 7 years ago

@rmmal

> So far, yolo-voc2.0.cfg has finished 8500 iterations and the loss is ~50, which is very high, and I don't know why. When I test it on the same pictures, no text boxes appear. Do you know what the problem is now?

It seems that something is wrong in the cfg file. Check: did you use the correct trained weights? Did you calculate anchors for 1088x1088? If everything is correct but the result is still bad, then try yolo-voc.cfg/yolo.cfg instead of yolo-voc2.0.cfg.

> Also, I noticed that with my first model (trained at 704x704), no detections appear on small pictures such as 300x100 or 150x150, or anything smaller than 500. Why is this happening?

Yes, this problem happens when the image is smaller than the network size. Does your training dataset have images with these sizes, 300x100 or 150x150? Do you test detection on the same images as for training, or on others?

rmmal commented 7 years ago

Yes, I calculated the anchors for 1088x1088, and yes, I used the correct trained weights. Okay, I will try that too.

Yes, my training dataset has these small sizes. For testing I used other images of variable sizes, both small and big, but nothing appears on any of them.

rmmal commented 7 years ago

Also, there is something that makes the loss increase: steps=100,25000,35000 scales = 10,0.1,0.1

When the learning rate increases after the first 100 epochs, the loss starts to increase rapidly. In other experiments it sometimes gives me NaN, and sometimes it increases from 25 up to 75. Why is this happening, and should I change anything?

AlexeyAB commented 7 years ago

So in your case try to set:

learning_rate=0.0001
steps=8000,10000
scales = 0.1,0.1
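To see what these values change, here is a rough Python sketch of a steps-style schedule: the rate is multiplied by each scale once the iteration count reaches the matching step. It assumes darknet counts steps in training iterations (batches), ignores any burn-in, and assumes a base learning_rate of 0.0001 for the earlier schedule, which the thread does not state. The function name is hypothetical.

```python
def steps_learning_rate(iteration, base_rate, steps, scales):
    """Learning rate at a given iteration under a 'steps' policy."""
    rate = base_rate
    for step, scale in zip(steps, scales):
        if step > iteration:
            return rate
        rate *= scale
    return rate


# Schedule reported earlier in the thread (base rate is an assumption).
reported = dict(base_rate=0.0001, steps=[100, 25000, 35000], scales=[10, 0.1, 0.1])
# Schedule suggested in the reply above.
suggested = dict(base_rate=0.0001, steps=[8000, 10000], scales=[0.1, 0.1])

for iteration in (0, 100, 8000, 10000, 25000):
    print(
        iteration,
        steps_learning_rate(iteration, **reported),
        steps_learning_rate(iteration, **suggested),
    )
```

Under these assumptions the reported schedule multiplies the rate by 10 at iteration 100, which matches the point where the loss was seen to climb, while the suggested one never raises the rate and only lowers it at 8000 and 10000.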

AlexeyAB / darknet

Custom dataset #199