ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

head/man head/woman head/person - what to do about "all" #147

Open dobkeratops opened 6 years ago

dobkeratops commented 6 years ago

consider this scenario: crowd scenes-

you can see many people; it would be great to annotate the heads and differentiate man from woman. That gives you 3 labels: head/man, head/woman, head/person

however.. you can't easily tell at a glance if you did all of each gender - because some will be distant (or children, babies), and you want to leave them indeterminate (i.e. just "person")

the set (head/man)|(head/woman)|(head/person) would be complete
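To make the "complete set" idea concrete, here is a minimal Python sketch. The head ids, the `annotations` mapping and the helper `covered_heads` are hypothetical, and the ground-truth set `all_heads` is assumed only for illustration (in practice nobody knows it up front):

```python
# Hypothetical annotations for one crowd image: label -> set of head ids.
annotations = {
    "head/man": {1, 4},
    "head/woman": {2},
    "head/person": {3, 5},  # too distant or too young to tell
}


def covered_heads(annotations, labels):
    """Return the union of all annotated head ids for the given labels."""
    covered = set()
    for label in labels:
        covered |= annotations.get(label, set())
    return covered


all_heads = {1, 2, 3, 4, 5}  # assumed ground truth, for illustration only
group = ["head/man", "head/woman", "head/person"]

# No single label covers every head, but the union of the group does.
assert covered_heads(annotations, ["head/man"]) != all_heads
assert covered_heads(annotations, group) == all_heads
```

So "all" would be a property of the label group rather than of any one label in it.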

any ideas on how to handle something like this ?

Doing head/man and head/woman as separate tasks is ok, but then when you do 'head/person' you want to know which ones have already been done as man or woman (it's a catch all for the ones you can't distinguish)

you've got some scenes where it is easy to tell, so it would be nice to be able to annotate those individually .. but then (as with a lot in the real world) there are a lot of grey areas

ideas

I think we'll have a similar scenario for car types - it'll be great to annotate SUV, hatchback, saloon, bus, but as those get fuzzy in the distance you can only really guarantee 'all' for 'car'

bus vs truck vs van vs car should be ok with the existing system - as these 4 types look quite distinct

bbernhard commented 6 years ago

any ideas on how to handle something like this ?

really good question. I mainly used the "annotate all xxx" phrase to make it clear to the user that they need to mark all occurrences they see in one sitting. There is only one annotation task per label - so if someone marks only two instead of the three dogs in the picture, the annotation task is gone.

But we need to change that. At some point in time it will be possible to pick up an incomplete annotation task to complete it.

But of course, that doesn't solve problems like this one:

Doing head/man and head/woman as separate tasks is ok, but then when you do 'head/person' you want to know which ones have already been done as man or woman (it's a catch all for the ones you can't distinguish)

The initial idea was to keep the labels list at a bare minimum and do all the refinement on a per-annotation basis, e.g. no man/woman labels, but just a person label. After all the persons in the image are annotated, the image can be refined via quiz mode (e.g. What do you see? [ ] man [ x ] woman [ ] child)
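A minimal sketch of what refinement on a per-annotation basis could look like, in Python. The `Annotation` class, its fields and `apply_quiz_answer` are hypothetical names for illustration, not ImageMonkey's actual data model:

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    """One annotated object; refinements are added later, e.g. via quiz."""
    label: str
    box: tuple  # (x, y, width, height) in pixels
    refinements: dict = field(default_factory=dict)


def apply_quiz_answer(annotation, question, answer):
    """Store a quiz answer as a refinement of an existing annotation."""
    annotation.refinements[question] = answer
    return annotation


# The coarse label comes first; the detail is attached afterwards.
person = Annotation(label="person", box=(40, 30, 120, 260))
apply_quiz_answer(person, "What do you see?", "woman")
```

The annotation keeps its coarse `person` label, and the quiz answer only adds detail on top of it.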

But as we saw, this workflow has a serious disadvantage: someone needs to annotate it first, before it can be refined. With the current workflow, in contrast, we profit immediately if someone adds a new label. So even if we are low on annotations, we get some information just from the fact that the image has a label.

On the other hand, this approach could also lead us into a direction where we end up with a lot of unnecessary annotation tasks. Imagine a picture with a fireworker. The first user sees the picture and adds the label fireworker, then the second one comes along and adds the label man and finally, the third one adds the label person. Internally, we would end up with three annotation tasks: annotate all fireworker, annotate all person, annotate all man for the same object.

While the current workflow has its advantages, that is in my opinion a huge disadvantage. So what I have been asking myself for a while now is whether we should make certain labels un-annotatable (annotatable: false) per default.

Just a thought experiment, but what if we make labels like man and woman non-annotatable per default? If someone adds such a label, we could automatically add the label person, which is the only one that's annotatable. The information whether it's a man or woman needs to be specified on a per-annotation basis (either via quiz or annotation refinement mode).

That way we could have both: You could add a specific label (e.g. man) to the image, which results in immediate benefit, but it also lets us keep the annotation tasks at a minimum.
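A rough Python sketch of this thought experiment. Only the `annotatable` flag comes from the discussion above; the `parent` field, the `LABELS` table and both helper functions are assumptions made up for illustration:

```python
# Hypothetical label definitions; `parent` points at the more general label.
LABELS = {
    "person": {"annotatable": True, "parent": None},
    "man": {"annotatable": False, "parent": "person"},
    "woman": {"annotatable": False, "parent": "person"},
    "fireworker": {"annotatable": False, "parent": "person"},
}


def annotation_task_label(label):
    """Walk up the parents until an annotatable label is found."""
    current = label
    while current is not None:
        if LABELS[current]["annotatable"]:
            return current
        current = LABELS[current]["parent"]
    raise ValueError(f"no annotatable ancestor for {label!r}")


def annotation_tasks(image_labels):
    """Return the de-duplicated set of annotation tasks for an image."""
    return {annotation_task_label(label) for label in image_labels}


# Three users add three labels for the same object; one task results.
print(annotation_tasks(["fireworker", "man", "person"]))  # {'person'}
```

The specific labels would still be stored on the image, so the immediate benefit of labelling is kept; only the number of annotation tasks shrinks.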

I know, that's a pretty radical step...and I am not even sure if it's the best way to solve this problem. (definitely open for discussion!) But I have the feeling that this little tweak could make the system more efficient. Especially, as there aren't many users that are using the system yet.

Or did I miss a point and there is actually a benefit in annotating an object more than once?

dobkeratops commented 6 years ago

Or did I miss a point and there is actually a benefit in annotating an object more than once?

I wouldn't say it's an advantage, and if you can steer people away from repeat work, that's much better. I'd just say if a more flexible workflow has the hazard of a few % repeated work (say 10 people randomly choosing from 10,000 images), I wouldn't worry about that.
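As a rough sanity check of that estimate: assuming each of 10 people picks a single image uniformly at random from 10,000 (a simplification; real task selection may not be uniform), the chance that any two of them land on the same image can be computed as below. `collision_probability` is a hypothetical helper, not something from the codebase:

```python
def collision_probability(users, images):
    """Probability that at least two users pick the same image,
    when each user picks one image uniformly at random."""
    p_distinct = 1.0
    for i in range(users):
        # The i-th user must avoid the i images already taken.
        p_distinct *= (images - i) / images
    return 1.0 - p_distinct


p = collision_probability(10, 10_000)
print(f"{p:.2%}")
```

Under these assumptions the result is well below 1% for a single pick each.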

So it seems there are many ways this could be handled... it's a case of figuring out which is the easiest retrofit.

With hotkeys you could easily handle a few related labels in a single sitting ('label all person/ man/woman'), (I'd suggest [,] as label toggle) .. (What if you could do this as an advanced user option, and rely on 'quiz' to refine plain examples?)

The proposed refinement/quiz workflow has the serious advantage that you can expand endless detail (gender, occupation, emotional state, posture, clothing, age) as time goes on (and of course that applies to vehicles and animals too). That might want a batch mode (e.g. "from these people annotations, click all the 'man'"), but a basic single instance version would be a great starting point (and the most detailed cases would be clearly visible with a small number of examples)

With vehicles, I think it's ok to just add truck, bus, van separately, because it's incorrect to have labelled these as cars.. they really are separate classes (no good collective name identified yet).

Some cases are simple, e.g. you have a lot of scenes with one man, one woman.. is there a way to identify and pick these out ?

(I was experimentally adding labels such as 'few person' , 'many car' .. prefixes one.., few, .. many.. as hints for strategy)

Regarding avoiding repeated work, I think there are many situations where you might like to know about some existing annotations (i.e. seeing them in the current image, but drawn faint perhaps). That would make it easier to add the components in successive passes. You could use the label graph to check e.g. if you're doing 'person' and there's already man/woman .. maybe you could warn or recommend "do man/woman first.."

What if there was a whole extra layer of 'individual annotations' i.e freely drawn boxes with individual label choice, and individual validation , which can still be accumulated by the corresponding 'annotate all..'. (i.e. "annotate all: person" could trace the graph and check for individual annotations of 'man','woman').
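A minimal sketch of that graph tracing, in Python. The `LABEL_GRAPH` layout, the annotation dictionaries and both helpers are hypothetical; the real label graph may be structured differently:

```python
# Hypothetical label graph: label -> more specific child labels.
LABEL_GRAPH = {
    "person": ["man", "woman"],
    "man": [],
    "woman": [],
}


def descendants(label, graph):
    """Return all labels below `label` in the graph (depth-first)."""
    found = []
    stack = list(graph.get(label, []))
    while stack:
        child = stack.pop()
        if child not in found:
            found.append(child)
            stack.extend(graph.get(child, []))
    return found


def existing_annotations(label, image_annotations, graph):
    """Collect annotations for `label` and every more specific label."""
    wanted = {label, *descendants(label, graph)}
    return [a for a in image_annotations if a["label"] in wanted]


image_annotations = [
    {"label": "man", "box": (10, 10, 50, 120)},
    {"label": "woman", "box": (80, 12, 45, 115)},
    {"label": "dog", "box": (150, 90, 60, 40)},
]

# "annotate all: person" starts from what was already drawn as man/woman.
already_done = existing_annotations("person", image_annotations, LABEL_GRAPH)
print([a["label"] for a in already_done])  # ['man', 'woman']
```

The collected boxes could then be drawn faint in the 'person' task, so only the indeterminate cases are left to add.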

Could a different form of validation happen in bulk, sorted for the image? ("do you see anything not annotated..")

is the concept of 'all' more relevant/useful to certain labels?:

I know that with roads, pavements you can then use the rest of the image as a negative, which is great, and with these it is easy enough to do all (area coverage); but with increasingly detailed object identification, the chance that any other pixel belongs to that label is greatly reduced ("annotate: convertible sports car" - even most of the other traffic, let alone surrounding scenery, will still be a negative..).

There's another suggestion for '2 forms of validation', ie.

perhaps completeness could be traced gradually down the label graph, accumulating other annotations?

bbernhard commented 6 years ago

So it seems there are many ways this could be handled... it's a case of figuring out which is the easiest retrofit.

yeah, right.

The proposed refinement/quiz workflow has the serious advantage that you can expand endless detail (gender, occupation, emotional state, posture, clothing, age) as time goes on (and of course that applies to vehicles and animals too). That might want a batch mode (e.g. "from these people annotations, click all the 'man'"), but a basic single instance version would be a great starting point (and the most detailed cases would be clearly visible with a small number of examples)

totally agreed!

With vehicles, I think it's ok to just add truck, bus, van separately, because it's incorrect to have labelled these as cars.. they really are separate classes (no good collective name identified yet).

yeah, right. That for sure makes sense!

My hope is that we will eventually reach a point where we have some simple "rules" in place that help us to decide when to use a label and when a refinement.

Let's take fence for example. Are these still labels or already refinements:

I think for now it doesn't matter much, as we are still in an experimentation phase - so we are still pretty flexible in the direction we are going. But as we progress, it probably gets harder and harder to change that.

I could imagine that there will be an interactive tutorial some day, that guides users through the first steps. If we have some general do's and dont's in place, we could introduce them to those rules that way.

e.g.:

I am sure there are cases, we need to bend the rules a bit, but that's okay.

Another problem I see with "being too specific" is, that annotation tasks could get harder. We already have that a bit with man vs woman where it sometimes gets hard to detect the right gender (especially if they are in the image's background).

But I guess that's just one example among many others. Imagine the annotation task annotate all: oldtimer. If there is only one car in the image, you can probably guess. But if there are two cars it's already a bit harder. ("Is that car on the left per definition already an oldtimer or just an old car?"). At that point I probably have to skip the annotation task. Had the annotation task instead been annotate all: car, I would be able to do it...and the guy with the oldtimer knowledge would still be able to refine the annotation ("oh yeah, that's a Bentley xyz, construction year: xxxx") - so we would both be happy.

There's another suggestion for '2 forms of validation', ie. ...

good idea!

dobkeratops commented 6 years ago

We already have that a bit with man vs woman where it sometimes gets hard to detect the right gender

this is kind of why I think arbitrary label blend would be a nice idea (I hope it isn't offensive to write man/woman though .. imagine confirming "yes it's a person, and yes you can't tell the gender from this distance".) .. because in real perception there will certainly be borderline or distant cases, as well as actual crossover - so the ideal system should tolerate this.

I'm sure this will occur with vehicles too: there is a sliding scale between cars/SUVs (some of those 'compact SUVs' look like beefy hatchbacks to me), vans/trucks, minibus/van, convertible sports cars with a hardtop looking a lot like coupes, etc..

bicycles .. there's Mountain and Road bikes, but then there's Hybrids/City-bikes, and Cyclocross, and "gravel bikes" (road bikes with 28mm+ tires, not quite CX). I've even seen someone riding with road handlebars on an actual mountain bike.
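One way to picture the label blend idea for such borderline cases is a set of weights per annotation instead of a single hard label. This is only a sketch; `normalize_blend` and the weight representation are assumptions for illustration, not an existing feature:

```python
def normalize_blend(weights):
    """Scale non-negative label weights so they sum to 1."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    return {label: w / total for label, w in weights.items()}


# Clear case: the annotator is sure.
clear = normalize_blend({"man": 1.0})

# Distant figure: confirmed as a person, gender not distinguishable.
distant = normalize_blend({"man": 1.0, "woman": 1.0})
print(distant)  # {'man': 0.5, 'woman': 0.5}

# Borderline vehicle: mostly SUV, somewhat hatchback.
compact_suv = normalize_blend({"SUV": 3.0, "hatchback": 1.0})
print(compact_suv)  # {'SUV': 0.75, 'hatchback': 0.25}
```

A hard label is then just the special case where one weight is 1.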

"Are these still labels or already refinements:"

would it work to have some common cases as shortcuts for refinement? I imagine the label graph will allow it to be flexible