Open rmmal opened 7 years ago
Also, is there any major difference between your fork and the original project?
@rmmal Hi,
There are no major differences, at least until Joseph releases a new version of Yolo. There are some modifications, though: the original repository has ~+1 mAP, but it worked badly with non-square networks. I do not know whether he has fixed this.
Did you train with random=1? Try to use 1088x1088 for detection and multiply each anchor value by 1.6 (but if you trained with random=1, then multiply by 2.4).
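The anchor rescaling suggested above can be sketched as follows. This is only an illustration: the anchor values are hypothetical, and the helper names `scale_anchors` and `format_anchors` are not part of darknet. It assumes Yolo v2 anchors are expressed in grid cells (network size / 32), in which case the 1.6 factor is close to the ratio 1088/704 ≈ 1.55. The thread does not explain where the 2.4 factor for random=1 comes from.

```python
def scale_anchors(anchors, factor):
    """Multiply every anchor width and height by the same factor."""
    return [(w * factor, h * factor) for w, h in anchors]


def format_anchors(anchors):
    """Render anchors as a comma-separated list: w1,h1, w2,h2, ..."""
    return ", ".join(f"{w:.2f},{h:.2f}" for w, h in anchors)


# Hypothetical anchors (in grid cells) computed for a 704x704 network.
anchors_704 = [(1.0, 0.5), (4.0, 1.0), (10.0, 2.5)]

# Assumption: the grid is network_size / 32, so the grid grows by this ratio.
ratio = (1088 / 32) / (704 / 32)

# Factor 1.6 is the value suggested in the thread for a fixed-size training run.
anchors_1088 = scale_anchors(anchors_704, 1.6)

print("grid ratio:", round(ratio, 3))
print("anchors =", format_anchors(anchors_1088))
```

The resulting line would replace the `anchors` value in the cfg used for detection, while `num` stays the same.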
1- I guess I used the original project. Do you recommend that I work on yours?
2- No, I trained it with a fixed size of 704x704.
3- I need to detect text blocks, each block in the image as a single object.
4- There is no best resolution yet. Testing with 704x704 gave me acceptable results on some images and poor results on others (medium overall). The same goes for 480x480 (which helped with detecting long text blocks) and for 1088x1088 (which helped with small fonts next to images, as it avoids the image and makes smaller blocks).
5- I calculated the anchors using K-means with this GitHub project: https://github.com/Jumabek/darknet_scripts and yes, I changed num=15, set the 30 values in anchors, and changed filters to 90.
You mean I should multiply the anchors used for 704x704 by 1.6? And why should I multiply by 2.4 if random=1?
For illustration, the model trained at 704x704 was tested at 480x480, 704x704, and 1088x1088:
[Images: three sample pages, each shown with detections at 480x480, 704x704, and 1088x1088]
As I see 1088x1088 can detect more text blocks, but it has problems with anchors (size of blocks).
Try to calculate anchors for 1088x1088: https://github.com/Jumabek/darknet_scripts/blob/master/gen_anchors.py#L17
Then train for 1088x1088 and detect using 1088x1088.
And also you can try to train Densenet201-yolo with resolution 1088x1088. It is Yolo v2 based on DenseNet201 classification network, that can detect both very small and very large objects: https://github.com/AlexeyAB/darknet/blob/master/build/darknet/x64/densenet201_yolo.cfg but I have not tested it enough yet.
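As a rough illustration of the anchor recalculation suggested above, here is a minimal k-means sketch that clusters box sizes using 1 - IoU as the distance. It is not the code of the linked gen_anchors.py script. The label data is hypothetical, the function names are made up for this example, and it assumes anchors are expressed in grid cells (network size / 32).

```python
import random


def iou_wh(box, centroid):
    """IoU of two boxes aligned at the same corner (only width/height matter)."""
    w, h = box
    cw, ch = centroid
    inter = min(w, cw) * min(h, ch)
    return inter / (w * h + cw * ch - inter)


def kmeans_anchors(boxes, k, iterations=100, seed=0):
    """Cluster (w, h) pairs; each box joins the centroid with the highest IoU."""
    rng = random.Random(seed)
    centroids = rng.sample(boxes, k)
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for box in boxes:
            best = max(range(k), key=lambda i: iou_wh(box, centroids[i]))
            clusters[best].append(box)
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append((
                    sum(w for w, _ in members) / len(members),
                    sum(h for _, h in members) / len(members),
                ))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return sorted(centroids)


def to_grid_units(rel_boxes, net_size, stride=32):
    """Convert sizes relative to the image (0..1) into grid cells of the network."""
    cells = net_size / stride
    return [(w * cells, h * cells) for w, h in rel_boxes]


# Hypothetical (width, height) pairs relative to the image, as in Yolo label files.
rel_boxes = [(0.90, 0.05), (0.85, 0.06), (0.30, 0.03),
             (0.28, 0.04), (0.50, 0.40), (0.55, 0.35)]

boxes = to_grid_units(rel_boxes, net_size=1088)
anchors = kmeans_anchors(boxes, k=3)
print("anchors =", ", ".join(f"{w:.2f},{h:.2f}" for w, h in anchors))
```

With a real dataset, `rel_boxes` would be read from all training label files and `k` would be set to the `num` value in the cfg (15 in this thread).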
How can I adjust the number of filters? When I set 15 anchors (num=15) and filters=90, it gave me the error: l.outputs == params.inputs
So how is it calculated?
Also, when I try to run your cfg file, it gives me this error: 14 (null): 100% (null): 100% x: 0.076923, y: -0.140000, w: inf, h: 0.000000 Segmentation fault (core dumped)
And when I run densenet201.cfg, which is the default one, with yolo, it works fine but does not give any results.
So what could be the problem?
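For reference, in Yolo v2 cfg files the convolutional layer just before the region layer is usually sized as filters = (classes + 5) * num, where 5 covers the four box coordinates plus one objectness score. The sketch below applies that rule. The function name is made up for this example, and classes=20 is only an assumed default from a VOC-style cfg.

```python
def region_filters(classes, num_anchors, coords=4):
    """Filters for the last convolutional layer before a Yolo v2 region layer.

    Each anchor predicts `coords` box values, one objectness score,
    and one score per class.
    """
    return (classes + coords + 1) * num_anchors


# One class ("text block") with 15 anchors, as in this thread.
print(region_filters(classes=1, num_anchors=15))   # 90

# Assumed VOC-style default of 20 classes with the same 15 anchors.
print(region_filters(classes=20, num_anchors=15))  # 375
```

Under this rule, filters=90 with num=15 only matches classes=1, so a cfg that still has a different `classes` value would fail the l.outputs == params.inputs check.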
Also, when I try to run your cfg file, it gives me this error: 14 (null): 100% (null): 100% x: 0.076923, y: -0.140000, w: inf, h: 0.000000 Segmentation fault (core dumped)
And when I run densenet201.cfg, which is the default one, with yolo, it works fine but does not give any results.
Is this about densenet201_yolo.cfg?
What command line did you use to run it?
And what command line did you use to run densenet201.cfg?
Yes, it was my fault: I forgot to set classes to 1. I have now started training it and will wait for the result. How many epochs do you think will be enough?
both: ./darknet detector train data/obj.data cfg/densenet201.cfg model_name
and for testing I replaced train with test.
By the way, all the testing and training is working fine. I will try what you suggested and see the results.
thanks @AlexeyAB
So, try to train both yolo-voc2.0.cfg (or yolo-voc.cfg) and densenet201_yolo.cfg with resolution 1088x1088 and 15 anchors calculated for 1088x1088.
Okay, I started training densenet201_yolo.cfg with batch 32 and subdivisions 32, and yolo-voc2.0.cfg with batch 16 and subdivisions 8.
I will wait and see.
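To make the two settings above easier to compare: in darknet, `batch` is the number of images used per weight update, and `subdivisions` splits that batch into smaller chunks that are processed one after another to save GPU memory. A small sketch of the arithmetic, with a made-up helper name:

```python
def images_per_pass(batch, subdivisions):
    """Number of images processed at once for a given batch/subdivisions pair."""
    if batch % subdivisions != 0:
        raise ValueError("batch should be divisible by subdivisions")
    return batch // subdivisions


# The two configurations mentioned in the thread.
for name, batch, subdivisions in [
    ("densenet201_yolo.cfg", 32, 32),
    ("yolo-voc2.0.cfg", 16, 8),
]:
    print(name, "->", images_per_pass(batch, subdivisions), "image(s) per pass,",
          batch, "images per weight update")
```

So the two runs differ both in how many images fit on the GPU at once and in how many images contribute to each weight update.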
@AlexeyAB For now, yolo-voc2.0.cfg has finished 8500 iterations and the loss was ~50, which is very high, and I don't know why. When I test it on these same pictures, nothing appears (no text boxes). Do you know what the problem is?
I also noticed that with my first model (trained at 704x704), no detections appear on small pictures such as 300x100 or 150x150, or anything smaller than 500. Why is this happening?
@rmmal
For now, yolo-voc2.0.cfg has finished 8500 iterations and the loss was ~50, which is very high, and I don't know why. When I test it on these same pictures, nothing appears (no text boxes). Do you know what the problem is?
It seems that something is wrong in the cfg file. Check: did you use the correct trained weights? Did you calculate anchors for 1088x1088?
If everything is correct but the result is still bad, then try to use yolo-voc.cfg/yolo-cfg instead of yolo-voc.2.0.cfg
I also noticed that with my first model (trained at 704x704), no detections appear on small pictures such as 300x100 or 150x150, or anything smaller than 500. Why is this happening?
Yes, this problem happens when the image is smaller than the network size. Does your training dataset contain images of these sizes, 300x100 or 150x150? Do you test detection on the same images you trained on, or on others?
Yes, I calculated the anchors for 1088x1088, and yes, I used the correct trained weights. Okay, I will try that too.
Yes, my training dataset contains these small sizes. For testing I used other images of various sizes, both small and big, but nothing appears in any of them.
Also, there is something that makes the loss increase: steps=100,25000,35000 scales = 10,0.1,0.1
When the learning rate increases after the first 100 iterations, the loss starts to increase rapidly. In other experiments it sometimes gives NaN and sometimes rises from 25 up to 75. Why is this happening, and should I change anything?
So in your case try to set:
learning_rate=0.0001
steps=8000,10000
scales = 0.1,0.1
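The effect of the two schedules can be compared with a small sketch. It assumes the "steps" policy multiplies the learning rate by each scale once the iteration count reaches the corresponding step, and it ignores any burn-in. The base rate of 0.0001 for the original schedule is an assumption, since the thread does not state it, and the function name is made up for this example.

```python
def learning_rate_at(iteration, base_lr, steps, scales):
    """Learning rate under a step schedule: apply each scale once its step is reached."""
    lr = base_lr
    for step, scale in zip(steps, scales):
        if iteration < step:
            break
        lr *= scale
    return lr


# Schedule reported in the thread (base rate assumed) vs. the suggested one.
original = dict(base_lr=0.0001, steps=[100, 25000, 35000], scales=[10, 0.1, 0.1])
suggested = dict(base_lr=0.0001, steps=[8000, 10000], scales=[0.1, 0.1])

for iteration in [0, 99, 100, 8000, 10000, 25000, 35000]:
    print(iteration,
          f"original={learning_rate_at(iteration, **original):.7f}",
          f"suggested={learning_rate_at(iteration, **suggested):.7f}")
```

Under these assumptions the original schedule raises the rate tenfold at iteration 100, which matches the point where the loss was reported to start climbing, while the suggested schedule never goes above the base rate.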
Hello @AlexeyAB
I use Yolo for a text detection problem. My dataset has 13k images of different sizes, from 200x150 up to 4000x5000, with all the variations in between. The text also has different font sizes.
1st: What are your suggestions for the configuration file, and how many iterations?
2nd: I trained with width and height 704x704 up to 40k iterations; the best model was at 16.5k.
Testing at the same dimensions: it gave acceptable results, but it still has some bounding-box problems.
Testing at 1088x1088: it gave more accurate results on large images with small fonts.
Testing at 480x480: it gave better results on large images with a lot of text that reaches the end of the page. This is a problem because the anchors don't reach the end of the page; when we change the resolution to 480 while keeping the anchors calculated for 704, the anchors can reach the end of the image, so it detected all the text, but it missed the small fonts.
How can I generalize to such a problem? I use 15 anchors. What could be missing to detect the single class "text boxes" perfectly using Yolo?