AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Deeper Training Questions/Theory #377

Open SJRogue opened 6 years ago

SJRogue commented 6 years ago

Hello Alexey, anybody,

Thanks for your previous answers. Hopefully you can help.

I have a bigger question this time. I'm working with roughly 20,000 classes; for simplicity, let's say 500..

Given the following challenge (picture 1 below) and the following solution (picture 2 below):

Question 1

I have to be able to tell the difference between 1A, 1B, and 1C (three different objects), but I also have to be able to tell the difference between 2A, 2B, and 2C (three different objects, but differing in a different way). [I've marked it with a blue line so you can see what the differing feature is.]

I also have to be able to see that 3A, 3B, and 3C (the same object) are exactly the same object.

Can this be done with darknet/YOLO? Can it all be done in a single model? I think it is possible. My plan would be to train using the template backgrounds (second picture) and then use the chosen background in the production environment.

Question 2

Which of backgrounds 1-10 would be best to train on?

Of course, in the production environment I will then use the same background as the one used for training. Of course, in training I will try to make around 1,000-2,000 pictures per class on the same background.

Question 3

Does the color or color contrast of the background make a difference? Should I use a black background with white lines, or just grey/wood?

Question 4

Do I use softmax to detect between superclasses (i.e. separate knife vs. scissors) and then classify between subclasses (i.e. separate knife 1 vs. knife 2)?

challenge

template

AlexeyAB commented 6 years ago

@SJRogue Hi,

  1. Yes, it can be done with one model. Your training dataset should contain each object at all the scales, rotations, and lighting conditions, and from all the sides, from which YOLO should detect objects.
jitter=0.3
random=1

  2. I think it is better to use background-1.

Other backgrounds can be used only if you want to know the real size of an object (not only its proportions), but:


  3. It is important that the object does not merge with the background. You should clearly see the outline of the object.

  4. In general, in this case it is better to use yolo9000.cfg with softmax-tree (superclasses and subclasses): https://github.com/AlexeyAB/darknet#using-yolo9000. This model is more difficult to train, though. You can try training both and comparing the results:

Based on:
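For context, the two parameters above live in the model's cfg file. A minimal sketch, assuming a yolov2-style cfg where [region] is the final layer (all other layers elided):

```ini
[net]
# ... input size, batch, learning rate, etc. ...

[region]
# jitter randomly crops and translates training images, varying the
# apparent scale and position of objects; random=1 additionally
# resizes the whole network every few iterations to train across scales
jitter=.3
random=1
```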

SJRogue commented 6 years ago

Alexey.. Big thank you.

I need time to respond with more questions but for now:


Q1: For detection I can control scale, resolution, and shooting point (distance, angle). I cannot always control lighting; every environment will be different.

For me, if shape + size + proportions are the same, then color does not matter; the object is the same object. In this case I mean that a scissor with a red handle = a scissor with a blue handle, if the shape and the real-life size/proportions are the same.


Q2: In detection I cannot determine the rotation of the object placed on the background by the users. They will not always place instruments precisely.


Q3: Users will place one object at a time (in version 1..). So I'm thinking I need a background with high contrast (background color vs. line/grid/circle colors), and I need to pick two colors which never blend in with the instruments.

Or, maybe, as you say, since I can control distance, shooting point, and angle, I can take background 1. (Are you sure this is not a problem for determining size/proportions?)


Q4:

Exactly! I think I need yolo9000, because without it, categorizing and differentiating is really going to be a problem.


AlexeyAB commented 6 years ago
  1. So use in your cfg-file, in the [net] section:
    saturation = 1.5
    exposure = 1.5
    hue=.1

and in the [region] layer (jitter=0 or 0.05):

jitter=0
random=0

  2. So, if you cannot determine the rotation of the placed object, then your training dataset should contain every rotation of each object that is possible at detection time.

  3. background-1 isn't a problem for proportions. It isn't a problem for size either (if the scale and shooting point are the same). It can be a problem if some objects have the same color as background-1.

  4. Yes, softmax is used in both cases, yolov2 and yolo9000; in addition, yolo9000 has softmax-tree, so it uses a different softmax for each group of subclasses - this is very suitable for classifying a large number of classes.
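The softmax-tree idea from the last point can be sketched in plain Python: a separate softmax runs over each group of sibling classes, and the absolute probability of a subclass is the product of the conditionals along the path to the root. The five labels and parent indices below are hypothetical, chosen to mirror the knife/scissors example in this thread; darknet's tree files likewise map each class to a parent index.

```python
import math

# Hypothetical two-level hierarchy in a darknet tree-file style:
# each class stores the index of its parent (-1 = top level).
labels  = ["knife", "scissors", "knife-1", "knife-2", "scissors-1"]
parents = [-1,      -1,         0,         0,         1]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def tree_softmax(logits):
    """Apply a separate softmax over each group of siblings, then
    multiply conditionals down the tree to get absolute probabilities."""
    # group class indices by their parent
    groups = {}
    for i, p in enumerate(parents):
        groups.setdefault(p, []).append(i)
    # conditional probability of each class given its parent
    cond = [0.0] * len(labels)
    for idxs in groups.values():
        for i, pr in zip(idxs, softmax([logits[i] for i in idxs])):
            cond[i] = pr
    # absolute probability = product of conditionals up to the root
    absolute = []
    for i in range(len(labels)):
        pr, j = 1.0, i
        while j != -1:
            pr *= cond[j]
            j = parents[j]
        absolute.append(pr)
    return absolute

probs = tree_softmax([2.0, 1.0, 0.5, 0.2, 0.0])
# P(knife-1) = P(knife-1 | knife) * P(knife)
```

Because each subclass softmax is conditioned on its superclass, the subclass probabilities under "knife" sum exactly to P(knife), which is what makes the superclass/subclass split work in one model.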

SJRogue commented 6 years ago

Thanks, Alexey.

For taking the time to understand the environment I'm describing - I think you gave me what I need to know.

I will have questions about

I will refer to you, no doubt, in my graduation project : )

sivagnanamn commented 6 years ago

Interesting application :+1: Good luck for your project.

SJRogue commented 6 years ago

@sivagnanamn

Thank you for the support; it's going to take trial and error. Whatever results I get, I will report back.