AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

Is it bad to have unlabelled objects in images? #615


damiandee commented 6 years ago

Hello!

1) I've got a custom road-sign dataset that I previously used with a cascade classifier, grouped into separate directories by sign type. I wrote a script that transforms the labels into YOLO format, but that raised a simple doubt: some frames contain more than one road-sign type, so the same frame is included in multiple directories, once with sign 1 labelled and once with sign 2 labelled. Does this have any impact on the training process? Will my network learn correctly if it takes the same picture twice, each time with a different sign labelled? (One way to merge such labels is sketched below this list.)

2) At first I thought I should convert my images to grayscale to make the training process faster, but after reading the issues I understand that's unnecessary, right?
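For reference, a minimal sketch of the merging idea mentioned in question 1, assuming the standard YOLO label format and hypothetical directory names (`per_sign_labels`, `merged_labels` are not from this thread):

```python
# Hypothetical sketch: merge per-sign-type YOLO .txt files so that a frame
# appearing in several directories ends up with ONE label file holding all
# of its boxes. Each YOLO label line is:
#   <class_id> <x_center> <y_center> <width> <height>   (all normalized 0-1)
from collections import defaultdict
from pathlib import Path

merged = defaultdict(set)  # label file name -> set of label lines

for label_file in Path("per_sign_labels").rglob("*.txt"):
    for line in label_file.read_text().splitlines():
        if line.strip():
            merged[label_file.name].add(line.strip())

out_dir = Path("merged_labels")
out_dir.mkdir(exist_ok=True)
for name, lines in merged.items():
    (out_dir / name).write_text("\n".join(sorted(lines)) + "\n")
```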

Thanks for all your help!

AlexeyAB commented 6 years ago

Hi,

  1. It is very bad if there are unlabelled objects that you want to detect.
  2. Grayscale doesn't accelerate training.

Also change these lines to int flip = 0;:

Otherwise Yolo will detect left-turn and right-turn signs as the same object.
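For anyone finding this later: current versions of this repository also expose this as a cfg option, so the same effect can be had without editing the source. A minimal sketch of the relevant lines (the rest of the [net] section stays as in your existing cfg):

```
[net]
# disable horizontal-flip data augmentation, so mirrored classes
# (e.g. left-turn vs right-turn signs) remain distinguishable
flip=0
```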

damiandee commented 6 years ago

Wow, I didn't expect such a fast reply, I really appreciate it!

Please tell me: should I also label objects that are very small, sometimes barely visible at 1920x1080? Or won't that impact training, so I can label them only once they're bigger?

AlexeyAB commented 6 years ago

If you don't want to detect such small objects, then you don't have to label them.

judwhite commented 6 years ago

> It is very bad if there are unlabelled objects (which you want to detect).

@AlexeyAB

  1. What do you recommend doing about partial objects (where maybe 25%, 50%, or 75% of the object is in frame)? If an object that is partially out of frame is labelled, its proportions will differ from a fully visible one, and I'm not sure whether this has any effect on detection. I have noticed that many unlabelled objects that should be true positives (and so get counted as false positives) affect training poorly. Should I black out these partial sections in the source image, label them, or do something else?

  2. What about obscured objects that are fully in frame? For example, Class A may have a timestamp from the video overlaid on it, or Class A may sit partially behind another object of Class B (or even another Class A).

  3. A question about the bounding box: how close to the edge of the object should the box be? Can it touch the object, or should there be some buffer?

Thanks for your help.

AlexeyAB commented 6 years ago

@judwhite

  1. If you can see for yourself what kind of object it is, then label it. Even if you see only 25%.

  2. If you can see for yourself what kind of object it is, then label it.

  3. It does not matter. Do it the way you would like it to be done by Yolo during detection.

fvlntn commented 6 years ago

For the bounding-box part, be sure to be consistent throughout your labelling process. I ran into problems with this because my dataset was labelled by different people: (screenshot attached)
judwhite commented 6 years ago

@ralek67 Thanks, that's helpful regarding the bounding box.

Regarding obscured images:

I have sampled frames from video-game play. There can be stretches where an object is under the HUD ("heads-up display"); while I'm able to tell what it is based on knowledge of the game, and it would be nice to detect it at runtime, it's also acceptable not to detect it. Here's an example (the "spell factory 5" below is even more obscured):

(screenshot: yolo_obscured)

I have plenty of other samples of this class, and while mAP is good (ap = 99.04%) after 8000 iterations (batch=64, subdivisions=32, height/width=608 for better precision), it quite regularly fails validation (fails to detect in scenes not part of training) for this class and for others that were labelled under the HUD. This particular class also had instances that were partially in frame and labelled. I'm working on the HUD-overlay piece first; then I'll decide what to do about partially in-frame objects.

To deal with this I added a "blackout" feature that removes the bounding box and sets the pixels underneath to black, to avoid a potential FP during training. It looks like this:

(screenshot: blacked-out region)

I'm not sure solid black was the best choice, because it will stand out to edge detectors; maybe a heavy blur would be better. I don't know at this point; my guess is it probably doesn't make much of a difference.
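In case it's useful to anyone, here is a minimal sketch of that blackout idea, assuming OpenCV and normalized YOLO box coordinates; the function name is hypothetical, not part of any library:

```python
import cv2

def blackout(image_path: str, cx: float, cy: float, bw: float, bh: float) -> None:
    """Zero the pixels under one normalized YOLO box (cx, cy, w, h)."""
    img = cv2.imread(image_path)
    h, w = img.shape[:2]
    x1, y1 = max(int((cx - bw / 2) * w), 0), max(int((cy - bh / 2) * h), 0)
    x2, y2 = int((cx + bw / 2) * w), int((cy + bh / 2) * h)
    # Solid black, as described above; a heavy blur (cv2.GaussianBlur applied
    # to the same slice) would be the alternative mentioned.
    img[y1:y2, x1:x2] = 0
    cv2.imwrite(image_path, img)
```

The corresponding line would then also be dropped from the image's label .txt file, so neither the pixels nor the box remain.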

On a side note, I think I may be using a system that's too powerful for the task I'm trying to accomplish. There are papers dating back to (at least) 2004 that show practical detection of near-duplicate images (such as frames of sprite-based games, in this case) on the hardware of that time: http://www.cs.cmu.edu/~rahuls/pub/mm2004-pcasift-rahuls.pdf, and more recent work: https://link.springer.com/article/10.1007/s11042-015-2472-1
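To make "lighter-weight" concrete: for near-exact sprites, even plain template matching can work. A minimal sketch below; OpenCV, the file names, and the 0.9 threshold are my own assumptions here (the linked papers use feature-based methods like PCA-SIFT instead):

```python
# Hypothetical sketch: classical template matching for sprite-based frames,
# no neural network involved.
import cv2
import numpy as np

frame = cv2.imread("frame.png")
sprite = cv2.imread("spell_factory.png")

scores = cv2.matchTemplate(frame, sprite, cv2.TM_CCOEFF_NORMED)
ys, xs = np.where(scores >= 0.9)  # keep strong matches only
h, w = sprite.shape[:2]
for x, y in zip(xs, ys):
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 1)
cv2.imwrite("matches.png", frame)
```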

Rather than just rambling, I should add something to the conversation:

  1. Are there known modern techniques for object detection when the objects have a high degree of similarity?
  2. The goal at this stage is world recreation. I wanted to learn a bit of ML and infer the world from raw video/images rather than patching and inspecting the wire protocol (though @iGio90 has substantial prior work on this). After several hours of hand labelling (thankfully now assisted by the trained model), I considered looking at Giovanni's approach just to help with labelling! Anyway, given the problem space, and that I would like to infer state from an image, are there other networks or techniques I should consider?
utkutpcgl commented 4 years ago

> Anyway, given the problem space and that I would like to infer state from an image, are there other networks or techniques I should consider?

@judwhite In case you haven't found the answer already, you should check out Siamese networks. They use a special loss function to measure similarity.
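A minimal sketch of that idea, assuming PyTorch; the embedding architecture and margin are illustrative choices, not from any particular paper:

```python
# Hypothetical sketch of a Siamese pair with contrastive loss: the same
# embedding network encodes both images, and the loss pulls matching
# pairs together and pushes mismatched pairs at least `margin` apart.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Embedding(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, 64),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def contrastive_loss(z1, z2, same, margin=1.0):
    # same: 1.0 for pairs of the same class, 0.0 otherwise
    d = F.pairwise_distance(z1, z2)
    return (same * d.pow(2) + (1.0 - same) * F.relu(margin - d).pow(2)).mean()
```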