annotation refinement (former: material variants of objects)

ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.

https://imagemonkey.io

47 stars 10 forks source link

annotation refinement (former: material variants of objects) #178

Open dobkeratops opened 6 years ago

dobkeratops commented 6 years ago

I gather there's an idea for material prefixes of some sort, but could we add some common cases to the label list to enable annotating these sooner rather than later.. and map them back to the material system later. This could happen in the label graph ,e.g we could have nodes like "metal object"->{"metal door","metal gate","metal pipe"},"brick object"->{"brick wall","brick building"}.

brick building
glass building
concrete building
wooden door
metal door
glass door
wooden fence
metal fence
- wire fence
- chainlink fence (very distinctive type of diagonal grid metallic fence)
- metal railing (fence specifically made of many vertical parallel posts)
wooden fence post (often like a bollard with a chain attached)
metal fence post
brick wall
stone wall
concrete wall
cinder block wall or breeze block wall https://en.wikipedia.org/wiki/Concrete_masonry_unit (etc UK/US english difference),or perhaps - - - concrete block wall as a universal term
asphalt path
dirt path
metal gate (most common)
wooden gate (rural areas)
metal frame
wooden frame
metal pipe
plastic pipe
metal girder
metal beam
wooden beam
plastic barrier
metal barrier
concrete barrier
wooden barrier
metal bollard
wooden bollard
concrete bollard
traffic bollard (brightly coloured bollard with embedded road sign)
metal barrel
wooden barrel (a few in the stock photos)
stone pillar
wooden pillar
brick pillar
concrete pillar
metal pole -wooden pole

it's very quick to type these in add labels.. what worries me is the notion of annotating then having to go back later - it would probably be more efficient to take the hit of a longer label list now, then add forms of management over the top.

what i'm hoping is this will multiply the training value i.e. even though you might not have many examples of each permutation, you're still accumulating many examples of wood vs metal, fence vs wall .. (texture recognition)

explicit names could still be useful because sometimes the name isn't a simple prefix, e.g. it's often more natural to say "wooden.." rather than "wood", both are correct though .. this is probably a case where language is inconsistent and it just depends how the words sound )

(I guess eventually the label graph itself could be generated from a different peice of json where an object lists it's potential material variants, which could also auto generate a master node for each material.. this would be pretty easy to mess with offline in python)

(above list in plaintext for easier cut/paste:-

brick building glass building concrete building wooden door metal door glass door wooden fence metal fence wire fence chainlink fence metal railing wooden fence post metal fence post brick wall stone wall concrete wall cinder block wall breeze block wall concrete block wall asphalt path dirt path metal gate wooden gate metal frame wooden frame metal pipe plastic pipe metal girder metal beam wooden beam plastic barrier metal barrier concrete barrier wooden barrier metal bollard wooden bollard concrete bollard traffic bollard metal barrel wooden barrel stone pillar wooden pillar brick pillar concrete pillar

)

bbernhard commented 6 years ago

totally agreed; that's one of the (bigger) topics I want to tackle next. At the moment I am not sure which direction we should go regarding the "adjectives/properties" (is there a better name for that?) system. I think there are a quite a few possibilites which each have its ups - and downs:

no property system - use labels to express properties: That's basically what we have now. If we want to express that an object has a property we prefix the label with it (e.q: wooden fence).
attach properties to individual annotations imagine a view where one can add individual properties (color, dimension, material...) to existing annotations.
mixed approach: basically a mixture of the two approaches above

I guess the cleanest approach would be the second one. It also has the advantage that we wouldn't need to do duplicate work (e.q: if fence is already annotated, we can just add material:wooden)

But I also understand that this would be a pretty restrictive measure - which could break the flow when annotating (I guess sometimes it's faster to annotate wooden fence, instead of adding material:wooden to an existing fence annotation, simply because you are already in the flow).

The problem I see with the first approach is, that we can easily end up with some really long labels (e.q: red sports car with ...).

I think the most natural and flexible approach would be the third one. But while this one gives the user the most freedom, I guess it could be really hard to query the data again. Because now we won't have the clear separation of labels and adjectives/properties anymore. Imagine one wants to query the dataset for: give me all the images where an elderly woman with grey hair is shown

If we give the user the freedom to enrich labels with adjectives, we could end up with the following labels:

woman grey haired woman grey haired asian woman grey haired european woman old grey haired european woman young grey haired european woman (saw some teens lately with grey hair) ...

I think all those combinations make it really hard to search for the above expression. (of course one could create an "or expression" (woman | grey haired woman | grey haired asian woman...) of all the labels, but I would say that's far from being ideal).

I think in terms of querability the second approach is still the best...but that comes with a price (we have to be more restrictive). But I think being more restrictive here also opens a lot of doors (e.q: use of natural language to query the dataset; speech-to-text to query the dataset...), but that also means that we would need to have the data in a structured, well defined format.

I am thinking quite a while about this now and I really hoped that by now it would be more clear to me which direction we should follow, but to be honest I have no clue ;). My prefered way would still be the second approach (I think that could open a lot of possibilities), but I am a bit worried that it kills the flow and nobody will want to label/annotate stuff because it's too restrictive and cumbersome -> there is no need to have the data well structured if nobody produces data. On the other hand, it's equally bad to have lots of data, but you can't query the data you are interested in.

Secretly, I am hoping, that if we offer the possibility to query the dataset for complex expressions that this could be the "unique selling point" that separates us from other (open source) datasets.

I think we have now reached a point where we should decide which direction we wanna go.

bbernhard commented 6 years ago

short update: I thought about it a bit more yesterday and decided to start with a small prototype to evaluate whether that works that I have in mind. I'll limit the time (max 2-3 weeks) I spend to crank out the prototype - mostly to avoid that lose myself in unimportant details.

The prototype will again be based on the browse based approach (yeah, I really like the concept :D). It lets you search for any label expression and it will list all annotations that match the search query. e.q: dog will show you all dog annotations. You can now scroll through all the annotations and add labels that refines the annotation (in case of dog possible refinements could be: breed, size, fur color, ...)

In case of the label car, possible annotation refinements could be: brand, size, color,...

Furthermore it should be possible to search for missing labels: e.q: dog & ~breed - "show me all dog annotations where the breed is unknown". I guess that could be a pretty interesting way for domain experts to contribute. You know dogs really well, but don't want to annotate images? Why not help us to determine dog breeds?

Not sure if this works out (I have to admit, it's pretty restrictive), but I want to give that I try first. While being restrictive, I think it has the huge advantage that we end up with data that's well structured and can be easily parsed and post-processed by machines. As the time spent on the prototype is limited, I guess we won't lose much, in case it doesn't work out as expected.

dobkeratops commented 6 years ago

" data that's well structured and can be easily parsed and post-processed by machines"

yeah what i've sometimes done is used the x/y idea speculatively for materials, but i've also sometimes added x y labels too, unsure which would be easier to adapt. in x/y, the / itself is ambiguous (does it mean blend or material prefix e.g. 'glass..', is it awkward to seperate from it's use as a part.. or is it easy enough to just assume certain prefixes like metal, wood etc are definitely not part names), wheras with x y the label-list might explode (nonetheless, could it be compared with auto-generated permutations to map it back?)

dobkeratops commented 6 years ago

right regarding queriability, I was hoping eventually the label graph would broaden the inquiry e.g. searching for vegetation will get all the instances of tree,grass,tree trunk,bush etc; and similarly all person variations would be found by person. the graph has the nice potential to get arbitrary similarities, e.g. "search for all the flying objects" (=bird|airliner|helicopter|kite|...),". the label graph could also inform a refinement process very well, e.g. starting from multiple entry points, and doing things like flying_object&animal to narrow down a set (to bird, flying insect, excluding the rest of the flying objects and the rest of the animals)

but you raise a good point about the potential of general purpose adjectives (which would be awesome!) which would make a raw label list explode exponentially.

I wonder if a mixed approach could give us a bit of leeway to expand the detail a bit earlier (e.g. like we've got man,woman nodes now), but then translate those labels into a general queryable markup (e.g. {man,woman} gets turned into {(male,adult):person,(female,adult):person}, with all the permutations of(male|female),(child|adult|elderly):person``` available

bbernhard commented 6 years ago

right regarding queriability, I was hoping eventually the label graph would broaden the inquiry e.g. searching for vegetation will get all the instances of tree,grass,tree trunk,bush etc; and similarly all person variations would be found by person.

that's more or less already supported by the label graph. When clicking on a parent node, it traverses the tree and creates an or-expression of all the children. (e.q: in case of the label vehicle, the query would look like: car | bike | van | bus ....

For me the biggest drawback with the current implementation is:

every label automatically results in a new annotation task: If we already have all the person annotations, I think it would be way faster to just mark all the woman and man, instead of drawing bounding boxes around the those again.

For some simple images it might theoretically be possible "to link" the label man to the label person in the labels view. That way we could tell the system, that the person (where we most probably already have a annotation) is actually a man. That way we wouldn't need to create a new annotation task. But that would only be possible if all people in the image are man. If there are woman, man and children in the picture we can't do that anymore.

I wonder if a mixed approach could give us a bit of leeway to expand the detail a bit earlier (e.g. like we've got man,woman nodes now), but then translate those labels into a general queryable markup (e.g. {man,woman} gets turned into {(male,adult):person,(female,adult):person}, with all the permutations of(male|female),(child|adult|elderly):person``` available

I guess that could work, given that people aren't packing too much information into the labels. As soon as labels get too specific (old grey haired european woman) I think that could be pretty cumbersome.

bbernhard commented 6 years ago

another cool thing would be to add the possibility to describe a scene with natural language. This can be done either on a per-image basis (or a per-annotation-basis). e.q: "girl dancing in the rain", "dog sitting on the ground"... With fuzzy matching I think that could a nice way to query the database.

possible real world use-case: a neural net that describes the image's scene to a blind person.

dobkeratops commented 6 years ago

possible real world use-case: a neural net that describes the image's scene to a blind person.

that's a really cool use case for machine vision , and i think there's some youtube videos showing attempts. even for sighted people you could have an audio summary of things you're not yet looking at.

bbernhard commented 6 years ago

I have been working on the annotation refinement for the past few days and I think it's now in a state where I can show a small demo.

Basically it's intended to work like this:

Similar to the labels.json file, there is now also a json file (see https://github.com/bbernhard/imagemonkey-core/blob/annotation_refinment/wordlists/en/label-refinements.json) that defines the labels that can be added at annotation level. As you can see, the file contains labels that can be used to refine almost every annotation (e.q color, material) and labels that are pretty specific (e.q gender) that only make sense for specific annotation refinements.

Currently, all the labels that are defined in the label-refinements.json file are shown for every annotation refinement. I think for now that is fine, but later I guess it makes sense to show certain refinements only for specific annotation refinements (e.q: it probably makese sense to show gender only in case we are refining a person annotation).

Here's a small demo that shows the new annotation refinement mode in action: annotation_refinement

In my opinion the most powerful feature of this mode is, that you can search for missing refinements. e.q: person & ~gender shows all the person annotations where we do not know whether the person is male or female.

a few other possible search queries:

apple & ~color: show me all apple annotations where we do not know the apple's color
car & ~brand: show me all the car annotations where we do not know the brand ...

Once we have added some refinements to annotations we can query them again:

e.q: person & gender='female' returns all the person annotations where gender = female

I think this could be a great way to enrich annotations with more details, while still being able to query the information again.

The question is now, do we want that? And if we want that, what do we do with labels like blue sky, metal fence...? I think the "most correct" way then would be to rename those labels to sky and fence and use the refinement mode to add blue and metal to the corresponding annotations. Would that be okay or is that too restrictive?

Theoretically we could also support both methods: We could allow the users to refine fence annotations and allow the label <material> fence labels. But I am a bit worried that this will later become a pain in the a**, both for annotators and for dataset users.

For dataset users the query would become more complicated then: Instead of fence & material='metal' they would now need to run (fence & material='metal') | metal fence to get all metal fence annotations. And for annotators it's equally bad, as they might add the metal fence (which results in a new annotation task), although there is already a fence label that could be refined easily.

So, if we decide that we want to have annotation refinement, I would really prefer to keep things simple and straightforward. (i.e use only base labels and refine in the annotation refinement mode). That's for sure more limiting and restrictive, but I think it helps us to better control the quality of the dataset. If we later realize that it's too restrictive and not fun anymore, we can still brainstorm whether there is a way to both allow specific labels and annotation refinement in a controlled way.

dobkeratops commented 6 years ago

that looks really interesting, and it's great it builds on the search feature. it's definitely proved very useful just with simple labels.

The question is now, do we want that? And if we want that, what do we do with labels like blue sky, metal fence...? I think the "most correct" way then would be to rename those labels to sky and fence and use the refinement mode to add blue and metal to the corresponding annotations. Would that be okay or is that too restrictive?

I'm definitely sold on it storing things in a way that's as machine-processable as possible. Do you imagine this translation being automatic? would it be feasible to keep some common expressions as nodes that can sit in the graph, or would it be possible to generate some, e.g. looking for the "

" (there might be some ambiguities with natural language..)

regarding skies I was indeed trying to say 'overcast sky','blue sky','night sky', but there's 2 more conditions that i couldn't quite express that way.. 'sunset' ('a sunset sky' doesn't sound quite right.. it's 'a sunset'), and a 'blue sky thats a bit cloudy' (I was sort of hoping label blending would cover that but there could be a distinct state. the ones i've labeled as 'blue sky' could be even more unambiguously called "clear blue sky" ..)

(at present just try searching: "blue sky","overcast sky","sunset","night sky" gives you each contrasting sky type.. 'clouds' gets a mixture because it could refer to either a blue sky with clouds, or an overcast sky , where the entire sky is clouds)

Most of the time that I was adding 'metal fence'&'fence' .. the more specific label is the 'real' one. There was a point where I didn't realise all labels are searchable (which has been very useful aswell)

bbernhard commented 6 years ago

Do you imagine this translation being automatic?

I haven't thought about it much, but I think the simplest thing is doing that while making the trending labels productive. So whenever I stumble accross a label that's too specific, I would rename it to something less specific.

some examples: metal gate -> gate, blue sky -> sky, 1 car -> car (I think with the annotation refinement we should now finally also be able to specify whether it's one/many/...).

But before doing that, I wanted to talk to you, if that's also okay for you (you've created the majority of labels and I don't want to "destroy" your work ;))

would it be feasible to keep some common expressions as nodes that can sit in the graph, or would it be possible to generate some

not sure, if this is what you meant, but I could imagine that we have a mechanism where we can define aliases. e.q: metal fence -> metal & material='fence'. Those aliases can be then be used in search expressions or the label graph (mainly for convenience)

regarding skies I was indeed trying to say 'overcast sky','blue sky','night sky', but there's 2 more conditions that i couldn't quite express that way.. 'sunset' ('a sunset sky' doesn't sound quite right.. it's 'a sunset'), and a 'blue sky thats a bit cloudy' (I was sort of hoping label blending would cover that but there could be a distinct state. the ones i've labeled as 'blue sky' could be even more unambiguously called "clear blue sky" ..)

(at present just try searching: "blue sky","overcast sky","sunset","night sky" gives you each contrasting sky type.. 'clouds' gets a mixture because it could refer to either a blue sky with clouds, or an overcast sky , where the entire sky is clouds)

very nice!

I am wondering, if the annotation refinement could also be helpful here. I could imagine that we have specific sky refinement labels that only show up when one refines a sky annotation. There you could specify whether it's a blue sky, overcast sky, sunset, night sky... I think there are probably still some edge cases where we can't tell for sure what it is, but I think for the majority of cases it should work.

dobkeratops commented 6 years ago

not sure, if this is what you meant, but I could imagine that we have a mechanism where we can define aliases. e.q: metal fence -> metal & material='fence'.

yes aliases is exactly what I have in mind.

it seems some combinations have natural language shortcuts; an aliases system would allow all of these, plus translating the common ones from handy single labels i guess:-

object    |age=young     |age=adult   |gender=male  |gender=female
----------+--------------+------------+-------------+--------------
person    |child         |            |man          |woman 
cat       |kitten        |            |             |
dog       |puppy         |            |             |
horse     |foal          |            |             |mare
cow       |calf          |            |bull         |
sheep     |              |            |             |ewe

so puppy -> dog&age=young bull->cow&gender=male mare->horse&gender=female etc

it could be really handy if it filled out graph nodes (e.g. start with the node 'female' and you could find 'mare, woman,ewe'), especially for more obscure objects (e.g. if you can see something is made of metal, but you dont quite know what it is.. starting with a 'metal' node would give connections to all the known objects with material=metal .. metal fork, metal gate, ..)

dobkeratops commented 6 years ago

1 car -> car (I think with the annotation refinement we should now finally also be able to specify whether it's one/many/...)

right I imagine this is another dimension.. would it fit in this attributes system? it would be pretty cool if you could universally search for 'images with one thing'

would you imagine this implementing the plural flag?

the places I was saying '1 car' etc was really referring to the whole image, but i suppose you could make a count attribute and indeed specify each annotation within an image as being 'one' or 'many' (.. so a busy traffic scene would have a blob in the distance that is 'many car', whilst the closer ones are '1 car')

I dont mind losing those particular labels ("1 car" etc) - they were added as speculative suggestions. getting the broader attribute system right is more important

bbernhard commented 6 years ago

so puppy -> dog&age=young bull->cow&gender=male mare->horse&gender=female etc

it could be really handy if it filled out graph nodes (e.g. start with the node 'female' and you could fine 'mare, woman,ewe', especially for more obscure objects (e.g. if you can see something is made of metal, but you dont quite know what it is.. finding a 'metal' node would give connections to all the known objects with material=metal)

really cool idea!

right I imagine this is another dimension.. would it fit in this attributes system? it would be pretty cool if you could universally search for 'images with one thing'

would you imagine this implementing the plural flag?

yeah, I have thought about those cases where it's not possible to annotate a single instance (e.q many overlapping trees). In that case, on could still use the base label tree for annotation, and then add the attribute many to mark that it's not a single instance; I guess we could also expose that option directly in the annotation view, so that it's possible to mark the annotation already there with one, many (some sort of shortcut to avoid that one has to go to the refinement mode to do that).

Of course it's also possible to implement a "real" plural mode with plural labels (trees, cars..), but I think it's more time intensive to curate such a list and keep it up to date. I think a simple "plural attribute" would be much easier...and if there is demand for plural labels we could simply create aliases (e.q: cars -> `car & ~number='one').

the places I was saying '1 car' etc was really referring to the whole image

aaah, nice - that's also an interesting piece of information. Unfortunately, the current attribute system doesn't support that (yet). But if it turns out, that the attribute system is promising, we could easily extend it, to also support attributes on a per-image basis.

I dont mind losing those particular labels ("1 car" etc) - they were added as speculative suggestions. getting the broader attribute system right is more important

awesome!

bbernhard commented 6 years ago

I've been experimenting a bit with a bulk-refinement mode lately.

Here's a small demo:

smart_refinement

It basically works like this:

you first specify the label expression (person & ~gender -> show me all person annotations where the gender is unknown)
and then you select the label for refinement (in my case male)
then you scroll through the images and select all the ones that should be refined with the selected label
after you are done, press the "Done" button to persist the changes

I think I have to play a bit with the colored outlines (with the blue and the red outlines it gets a bit confusing)

Not sure if it's a good idea or not..but maybe we can use that for further brainstorming ;)

dobkeratops commented 6 years ago

this is a great idea,IMO. working in the browser view like that is perfect. I bet that bulk-edit mode will have other uses (i bet it would be a great way to validate too). I look forward to trying it out. the only fly in the ointment will be some scenes with different instances of the same label, but you can skip those.

Sometimes I've tried to consciously seperate those as different polygons (e.g. "pavement" which could be"pavement&material=paving stones","pavement&material=cobblestone", etc) .. but not always. in those cases we can just re-annotate the unusual parts.

bbernhard commented 6 years ago

this is a great idea,IMO. working in the browser view like that is perfect. I bet that bulk-edit mode will have other uses (i bet it would be a great way to validate too).

great idea!

Sometimes I've tried to consciously seperate those as different polygons (e.g. "pavement" which could be"pavement&material=paving stones","pavement&material=cobblestone", etc) .. but not always. in those cases we can just re-annotate the unusual parts.

exactly :)

As you've mentioned the re-annotate: At the moment I am still not sure how we should deal with annotation refinements in case someone reworks the underlying annotation. Right now, you would "lose" the annotation refinements, in case you rework the annotation. (The refinements are not actually lost, they are just attached to the older revision of the annotation).

I have this vague idea in my head of a "migration" step that allows you to take all the refinements that were defined for a specific annotation revision and attach them to the new annotation. Maybe some drag and drop: "take all these refinements that were defined for this bounding box and attach them to this polyon "

bbernhard commented 6 years ago

@dobkeratops

As of yesterday, the annotation refinement is now productive. It's the absolute minimum version - I think there are still a bunch of UIX and speed improvements possible. (I didn't want to invest much time for the first version - in case it's not what we wanted, we can throw it away and start over).

Together with the annotation refinement, I also pushed a bunch of small improvements:

the number of results should now be visible in all the browse modes (see #180)
It's now possible to show all the existing annotations in the label mode (https://imagemonkey.io/label). You can do that via "alt + a" key combination (i'll expose that in the UI later) - unfortunately, I realized, that the "alt + a" is already used in Firefox - in Chrome it works. So I am looking for a better key combination - any suggestions are welcome ;) see #188

dobkeratops commented 6 years ago

interesting, I just tried out 'alt-a' - it's great to see that. I look forward to seeing how the attributes/refinement system will evolve (thanks also for the extra labels recently). no preference for a specific hotkey, i guess you just want to dodge common keys with other meaning like 'ctrl-x/c/v' etc. the number is useful, it's great to know the relative numbers of each annotation, and to see progress quantified