damiandee opened this issue 6 years ago
Hi,
Also change these lines to this: `int flip = 0;`
Otherwise YOLO's horizontal-flip data augmentation will make it detect mirror-image signs, such as left-turn and right-turn, as the same object.
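To see why the flip augmentation hurts orientation-sensitive classes, here is a minimal sketch (the function name and example label are my own, not from darknet) of what a horizontal flip does to a YOLO-format label: only the x-center mirrors, while the class id stays the same, so a flipped left-turn sign looks exactly like a right-turn sign but keeps the left-turn label.

```python
def hflip_yolo_label(label):
    """Horizontally flip a YOLO-format label (class, x_center, y_center, w, h).
    All coordinates are normalized to [0, 1]; only x_center changes."""
    cls, x, y, w, h = label
    return (cls, 1.0 - x, y, w, h)

# A left-turn sign (class 0) on the left side of the frame...
left_turn = (0, 0.25, 0.5, 0.1, 0.1)
# ...after flipping, the image shows what a right-turn sign looks like,
# but the label still says class 0, which teaches the network that the
# two orientations are the same object.
print(hflip_yolo_label(left_turn))  # (0, 0.75, 0.5, 0.1, 0.1)
```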
Wow, I didn't even expect such a fast reply, I really appreciate it!
Please tell me: should I label objects that are so small they are sometimes barely visible at 1920x1080? Or does that not affect training, so I can label them only once they appear bigger?
If you don't want to detect such small objects, then you don't have to label them.
It is very bad if there are unlabelled objects (which you want to detect).
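If you do decide to skip very small objects, a simple size cutoff keeps the policy consistent across the whole dataset. This is a sketch under my own assumptions (the function name and the 16-pixel threshold are arbitrary choices, not anything darknet prescribes):

```python
def keep_label(box_w_norm, box_h_norm, img_w, img_h, min_side_px=16):
    """Return True if a YOLO-format box (normalized width/height) is at
    least min_side_px pixels on both sides and therefore worth labelling."""
    return box_w_norm * img_w >= min_side_px and box_h_norm * img_h >= min_side_px

# A 0.005 x 0.01 box in a 1920x1080 frame is roughly 10 x 11 px: drop it.
print(keep_label(0.005, 0.01, 1920, 1080))  # False
print(keep_label(0.02, 0.02, 1920, 1080))   # True
```

Whatever threshold you pick, apply it uniformly; an object that is sometimes labelled and sometimes left unlabelled is the "unlabelled object you want to detect" case warned about above.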
@AlexeyAB
What do you recommend doing about partially visible objects (say 25%, 50%, or 75% of the object is in frame)? If a partially out-of-frame object is labelled, its proportions will differ from the full object, and I'm not sure whether this affects detection. I noticed that having many FPs (i.e. unlabelled objects which should be TPs) hurts training. Should I black out these partial sections in the source image, label them, or do something else?
What about obscured objects that are fully in frame? For example, Class A may have a timestamp from the video over it, or Class A sits partially behind another object of Class B (or even another Class A).
Question about the bounding box - how close to the edge of the object should the box be? Can it bump up against the object or should there be some buffer?
Thanks for your help.
@judwhite
If you can see for yourself what kind of object it is, then label it. Even if you see only 25%.
If you can see for yourself what kind of object it is, then label it.
It does not matter. Do it the way you would like YOLO to do it during detection.
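In practice, "no buffer needed" means the box can touch the object's edge exactly. If you have an object mask (e.g. from a game sprite's alpha channel), the tightest possible box is trivial to compute; this is just an illustrative numpy sketch, not part of any labelling tool:

```python
import numpy as np

def tight_bbox(mask):
    """Smallest axis-aligned box (x_min, y_min, x_max, y_max, inclusive)
    around the nonzero pixels of a boolean object mask."""
    ys, xs = np.nonzero(mask)
    return xs.min(), ys.min(), xs.max(), ys.max()

mask = np.zeros((10, 10), dtype=bool)
mask[2:5, 3:8] = True      # object occupies rows 2-4, cols 3-7
print(tight_bbox(mask))    # (3, 2, 7, 4): box edges sit exactly on the object
```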
@ralek67 Thanks, that's helpful regarding the bounding box.
Regarding obscured images:
I have sampled frames from video-game play. There can be stretches where an object is under the HUD ("heads-up display"), and while I can tell what it is from knowledge of the game, and it would be nice to detect it at runtime, it's also acceptable not to detect it. Here's an example (the "spell factory 5" below is even more obscured):
I have plenty of other samples of this class, and while mAP is good (ap = 99.04%) after 8000 iterations (batch=64, subdivisions=32, height/width=608 for better precision), it pretty regularly fails validation (fails to detect in scenes not part of training) for this class and for others that were labelled under the HUD. This particular class also had instances that were partially in frame and labelled. I'm working on the HUD-overlay piece first, then I'll decide what to do about partially in-frame objects.
To deal with this I added a "blackout" feature which removes the bounding box and changes the pixels underneath to black to avoid a potential FP during training. It looks like this:
I'm not sure solid black was the best choice because it'll stand out in edge detectors, maybe a heavy blur would be better; I don't know at this point. My guess is it probably doesn't make much of a difference.
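The blackout step itself is simple; here is a minimal numpy sketch of the idea (my own illustrative code, not the actual feature described above): delete the label line and overwrite the pixels so the unlabelled object cannot register as a false positive during training.

```python
import numpy as np

def blackout(image, box, fill=0):
    """Overwrite a (x_min, y_min, x_max, y_max) region of an HxWxC uint8
    image in place, so an unlabelled object there cannot be mistaken for
    a false positive during training. The matching label line should be
    removed from the annotation file as well."""
    x0, y0, x1, y1 = box
    image[y0:y1, x0:x1] = fill
    return image

img = np.full((8, 8, 3), 255, dtype=np.uint8)   # all-white test image
blackout(img, (2, 2, 5, 5))
print(img[3, 3].tolist(), img[0, 0].tolist())   # [0, 0, 0] [255, 255, 255]
```

Swapping the constant fill for a heavy blur of the same region, as speculated above, would only change the `image[y0:y1, x0:x1] = fill` line.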
On a side note, I think I may be using a system that's too powerful for the task I'm trying to accomplish. There are papers dating back to (at least) 2004 which show practical detection of near-duplicate images (such as sprite-based games, in this case) on the hardware of that time: http://www.cs.cmu.edu/~rahuls/pub/mm2004-pcasift-rahuls.pdf and more recent work: https://link.springer.com/article/10.1007/s11042-015-2472-1
Other than just rambling, I should add something to the conversation: anyway, given the problem space and that I would like to infer state…
@judwhite In case you haven't found the answer already, you should look at Siamese networks. They use a special error function to measure similarity.
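The "special error function" usually means a pairwise loss such as contrastive loss: matching pairs are pulled together in embedding space, non-matching pairs are pushed at least a margin apart. A small numpy sketch of that loss for a single pair (one common formulation; the names and margin value here are my own):

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same, margin=1.0):
    """Contrastive loss for one pair of embeddings: pull matching pairs
    together, push non-matching pairs at least `margin` apart."""
    d = np.linalg.norm(emb_a - emb_b)
    if same:
        return 0.5 * d ** 2            # penalize any distance between matches
    return 0.5 * max(margin - d, 0.0) ** 2  # penalize only pairs inside the margin

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])                    # Euclidean distance 5
print(contrastive_loss(a, b, same=True))    # 12.5
print(contrastive_loss(a, b, same=False))   # 0.0, already farther than margin
```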
Hello!
1) I've got a custom road-sign dataset that I used with a cascade classifier, grouped into a separate directory for each sign. I wrote a script that transforms the labels into YOLO format, but a simple doubt came up. The point is that some frames contain more than one road-sign type, so the same frame appears in multiple directories: once with sign 1 labelled and once with sign 2 labelled. Does this have any impact on the training process? Will my network learn correctly if it sees the same picture twice with two different signs labelled?
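Since YOLO training expects every object of an image in a single annotation file, and an image with only some of its objects labelled is exactly the "unlabelled objects" problem warned about earlier in this thread, one fix is to merge the per-directory labels for each frame before training. A hypothetical sketch (the function and the example label lines are illustrative, not the actual conversion script):

```python
from collections import defaultdict

def merge_frame_labels(entries):
    """entries: iterable of (frame_name, label_line) pairs collected from
    all per-sign directories. Returns one sorted, de-duplicated label list
    per frame, ready to write out as a single YOLO .txt file."""
    merged = defaultdict(set)
    for frame, line in entries:
        merged[frame].add(line)
    return {frame: sorted(lines) for frame, lines in merged.items()}

entries = [
    ("frame_001", "3 0.41 0.52 0.08 0.12"),   # sign 1, from directory A
    ("frame_001", "7 0.70 0.48 0.06 0.10"),   # sign 2, from directory B
    ("frame_001", "3 0.41 0.52 0.08 0.12"),   # duplicate of sign 1
]
print(merge_frame_labels(entries)["frame_001"])
# ['3 0.41 0.52 0.08 0.12', '7 0.70 0.48 0.06 0.10']
```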
2) At first I thought I should convert my images to grayscale to make training faster, but after reading the issues I understand that's unnecessary, right?
Thanks for all your help!