LKneringer opened this issue 5 years ago
I'd wager that although it may not be easier to label a network for multiple classes, it could allow convergence to happen more quickly with fewer training images. I find that the more consistent I am with training images, the higher the mAP I achieve.
Not to mention that I'd rather "have and not need, than need and not have" the level of granularity to differentiate between skis and snowboards. I could always group the two downstream after objects are identified.
@LKneringer It is enough to label only skiers (just one class) in this case. Each final activation of Yolo has a large receptive field, so Yolo will see the person, skis, ski poles, ... and will automatically take them into account.
Thanks for the responses. If I go with a single annotation label, how should I handle images where the skis or ski poles extend significantly beyond the person, filling the bounding box with lots of negative space?
Would it be preferable to crop the box to the skier, or should I still keep the ski gear in the annotation? I'm not sure whether whatever is in the background could impair the learning process.
@LKneringer
Just label in such a way as you want to detect objects.
Each final activation in Yolo has a large receptive field; it sees roughly 300x300 to 600x600 pixels. So the background affects the learning process in any case, regardless of how you annotate.
Yolo will automatically decide what is the object and what is the background.
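To make the single-class suggestion concrete, here is a minimal sketch of producing a Darknet-style label line (class id followed by normalized center coordinates and box size). The `to_yolo_line` helper and the pixel coordinates are illustrative assumptions, not part of any repo script:

```python
def to_yolo_line(cls_id, box, img_w, img_h):
    """Convert a pixel box (xmin, ymin, xmax, ymax) to a Darknet label line.

    Darknet expects "<class> <x_center> <y_center> <width> <height>",
    with all coordinates normalized to [0, 1] by the image dimensions.
    """
    xmin, ymin, xmax, ymax = box
    x_c = (xmin + xmax) / 2.0 / img_w
    y_c = (ymin + ymax) / 2.0 / img_h
    w = (xmax - xmin) / img_w
    h = (ymax - ymin) / img_h
    return f"{cls_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# Single "skier" class (id 0): one box covering the person plus gear.
print(to_yolo_line(0, (100, 50, 300, 450), 640, 480))
# → 0 0.312500 0.520833 0.312500 0.833333
```

With only one class, every line in every label file starts with `0`; grouping skiers and snowboarders later happens entirely downstream, as suggested above.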
I concur. I found that when training, it is best to select the entirety of the object you are training for; this gives the network examples of that object against multiple types of backgrounds, e.g. dark, light, similar color, etc.
On the same note, I also select the entire object when part of it is obstructed. At first I thought I might end up training the system to recognize the occluding object (for example, my finger when picking up an object in frame), but on the contrary, it only adds to the variance of what that object looks like when picked up or partially obstructed.
In other words, labeling obstructed objects, as well as objects at the edge of an image that are only partially visible, will also add value to your model (as noted in Alexey's main instructions).
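For objects cut off at the frame edge, one practical detail is to clip the box to the image bounds before normalizing, rather than discarding the object. A small sketch (the `clip_box` helper is hypothetical, shown only to illustrate the idea):

```python
def clip_box(box, img_w, img_h):
    """Clamp a pixel box (xmin, ymin, xmax, ymax) to the image bounds.

    Partially visible objects keep their visible portion as the label;
    boxes that fall entirely outside the frame are dropped (None).
    """
    xmin, ymin, xmax, ymax = box
    xmin = max(0, min(xmin, img_w))
    xmax = max(0, min(xmax, img_w))
    ymin = max(0, min(ymin, img_h))
    ymax = max(0, min(ymax, img_h))
    if xmax - xmin <= 0 or ymax - ymin <= 0:
        return None  # nothing of the object is visible in this frame
    return (xmin, ymin, xmax, ymax)

# A skier half out of frame on the left edge of a 640x480 image:
print(clip_box((-80, 120, 160, 400), 640, 480))  # → (0, 120, 160, 400)
```

This keeps edge cases in the training set, which matches the advice above about labeling partially seen objects.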
We'd like to track skiers and snowboarders riding down a slope, but we're not entirely sure of the best approach to annotating the images. We do not need to differentiate between the two; it's merely important that we can track them as flawlessly as possible.
While doing some research, we found that the person, skis, and ski poles are usually all annotated separately: https://storage.googleapis.com/openimages/web/visualizer/index.html?set=train&type=detection&c=%2Fm%2F071p9
Would this be the best approach, or would it suffice to annotate only the person? If so, would it be best to include the skis and poles in the bounding box, or should it cover just the person itself?