ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

brainstorming: expressions in labels (boolean, comparison) #157

Open dobkeratops opened 6 years ago

dobkeratops commented 6 years ago

extension of #141.. imagine if you could use boolean expressions to increase the expressivity of a dynamically curated label list (trying to think of ways around the problems of a limited vocabulary)

examples

~tree - cut out the parts of a dense forest scene confirmed not to include trees
~(pavement|road) - an unknown object occluding the ground in an urban scene
fruit&~(apple|banana|orange) - 'an item of fruit, but not an apple, banana or an orange. An unusual fruit.'
litter&bottle - specific types of trash found in the road

| vs & could give a more specific label blend, e.g. head/dog -> head&dog "these pixels must give both the head and dog result", versus car|bus|truck|van "these contain either cars, buses, trucks, or vans" (a way of expressing powered, wheeled, vehicular traffic..), soil|grass "this area is patchy grass", trailer&container ...
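As a sketch of how such expressions could be evaluated, assuming each image carries a plain set of label strings (the image names and the unusual_fruit helper here are purely illustrative):

```python
# Hypothetical sketch: evaluating boolean label expressions against
# per-image label sets. Image names and label sets are made up.
images = {
    "img1": {"road", "car", "sky"},
    "img2": {"pavement", "person"},
    "img3": {"fruit", "apple"},
    "img4": {"fruit", "durian"},
}

# fruit&~(apple|banana|orange): an item of fruit, but not a common one
def unusual_fruit(labels):
    return "fruit" in labels and not (labels & {"apple", "banana", "orange"})

matches = [name for name, labels in sorted(images.items()) if unusual_fruit(labels)]
print(matches)  # ['img4']
```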

These might seem too obscure for the casual user; there could be a growing table of translations of such expressions into natural language.. you could also restrict certain verification tasks to people who are logged in with preferences.

They could also guide a refinement search in the label graph, e.g. "fruit&~(apple|banana)" tells it to populate a menu with all the derived fruit types, minus apple and banana. "plant&food" would look for edible plants, items in the label graph connected to both (i.e. fruits, vegetables, salad leaves, nuts)

boolean logic is routinely used in selection tools in paint workflows (see #156), e.g. inverting selections, use of the 'alt' modifier to cut a selection, combining layers.. having 'not' labels would give you a way to express building up masks from cuts (drawing the gaps in vegetation, drawing holes for windows & doors from building-interior pictures, etc.)

If writing a parser seems like a complication (e.g. difficult for users of the data?) we could think about writing a library that generates computed channel data (think about the controlled blurring ideas too) .. or you could think of it as temporary until you find better names for things (like the collective term for wheeled, powered vehicular traffic). We might be lucky and find a few recurring patterns, such as "it's one of the derived types, but we rule out this set..". We could rule out any operator-precedence ambiguity by saying "you must surround each operator with brackets", or we could even consider just s-expressions, which are trivial to parse.
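For the s-expression variant, the parsing really is trivial - a hedged sketch, where the surface syntax "(and fruit (not (or apple banana orange)))" is an assumption, not an agreed grammar:

```python
# Sketch: label expressions as s-expressions, so there is no operator
# precedence to worry about. Syntax and operator names are assumptions.
def tokenize(src):
    return src.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    tok = tokens.pop(0)
    if tok == "(":
        expr = []
        while tokens[0] != ")":
            expr.append(parse(tokens))
        tokens.pop(0)  # drop the closing ')'
        return expr
    return tok  # a plain label name

def evaluate(expr, labels):
    if isinstance(expr, str):
        return expr in labels
    op, *args = expr
    if op == "and":
        return all(evaluate(a, labels) for a in args)
    if op == "or":
        return any(evaluate(a, labels) for a in args)
    if op == "not":
        return not evaluate(args[0], labels)
    raise ValueError(f"unknown operator: {op}")

expr = parse(tokenize("(and fruit (not (or apple banana orange)))"))
print(evaluate(expr, {"fruit", "durian"}))  # True
print(evaluate(expr, {"fruit", "apple"}))   # False
```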

Perhaps the parsing of these expressions would be useful for explorer search anyway so the code would be lying around..

I can imagine this enabling the use of more general-purpose words, e.g. container&cylinder&metal ('it's a cylindrical hollow object, e.g. oil barrels, gas canisters, aerosol canisters, beverage cans, ..')

dobkeratops commented 6 years ago

EDIT: I did a few more labels with another idea, comparative labels, which would work at the scene level (you would not mark an area with these). foo>bar would mean "foo and bar both present, but more pixels of foo".

common examples: road>pavement (street images taken from on the road..) and pavement>road (street images taken from on the pavement), building>pavement (images centering on buildings), and so on.

I went further and wrote some such as (pavement|road)>building but bounced a bit with an alternate idea that you'd use '+' to combine areas, e.g. (pavement+road)>building. I would also hope the label graph could be used, e.g. ground>building would know to sum all derived ground types, which should include pavement and road.

I'm only adding these experimentally, not counting on them, i.e. I still add plain labels.

One question is: would this be intuitive? You might assume it means depth, or prominence - and I know a human's mind will automatically prioritise certain things, so someone's first guess might not actually match the correct relative area. (One famous example is the positioning of the eyes, and hence the area above and below the eye line in a face: people perceive the face as bigger than it really is.)

Related prefixes: mostly foo (probably really 50%+ of the pixels, or maybe even just 'the biggest single element'), sparse foo (something like under 15%?)
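A minimal sketch of how these comparative and prefix labels could be evaluated, assuming per-label pixel counts are available from the annotation masks; the 50% and 15% thresholds follow the suggestion above and are otherwise arbitrary:

```python
# Sketch of scene-level comparative labels. Pixel counts and the
# 'mostly'/'sparse' thresholds are assumptions for illustration.
def area_fraction(pixel_counts, label, total):
    return pixel_counts.get(label, 0) / total

def more_pixels(pixel_counts, a, b):
    # a > b: both labels present, but more pixels of a
    return pixel_counts.get(a, 0) > pixel_counts.get(b, 0) > 0

def mostly(pixel_counts, label, total):
    # 'mostly foo': over 50% of the pixels
    return area_fraction(pixel_counts, label, total) > 0.5

def sparse(pixel_counts, label, total):
    # 'sparse foo': present, but under ~15% of the pixels
    return 0 < area_fraction(pixel_counts, label, total) < 0.15

counts = {"road": 60_000, "pavement": 25_000, "building": 10_000}
total = 100_000
print(more_pixels(counts, "road", "pavement"))  # True: road>pavement
print(mostly(counts, "road", total))            # True: mostly road
print(sparse(counts, "building", total))        # True: sparse building
```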

bbernhard commented 6 years ago

like that one!

The | and & operators should already be supported (I just checked again to be sure and realized that for some reason the & operator doesn't seem to be working anymore.. I guess one of the latest updates broke that; I'll look into it). So there is already a (pretty basic) parser implementation that we could extend :)

These might seem too obscure for the casual user; there could be a growing table of translations of such expressions into natural language.. you could also restrict certain verification tasks to people who are logged in with preferences.

cool idea!

foo>bar would mean "foo and bar both present, but more pixels of foo"

common examples: road>pavement (street images taken from on the road..) and pavement>road (street images taken from on the pavement), building>pavement (images centering on buildings), and so on.

I went further and wrote some such as (pavement|road)>building but bounced a bit with an alternate idea that you'd use '+' to combine areas, e.g. (pavement+road)>building. I would also hope the label graph could be used, e.g. ground>building would know to sum all derived ground types, which should include pavement and road.

I'm only adding these experimentally, not counting on them, i.e. I still add plain labels.

very nice!

I am a huge fan of such expressions... they are really powerful and expressive - and since you can type them, you don't always have to switch between keyboard and mouse.

What do you think about starting by writing down all the expressions that come to mind and that we want to support? Once we have some sort of 'label definition grammar' we could write our own parser (I guess the label parser doesn't necessarily need to have the same grammar as the parser that's used to query the dataset).

Once we have the parser in place, we could think about how we want to visualize expressions like (pavement+road)>building in the UI. I guess it doesn't make that much sense to add the plain expression as a whole to the labels list. But if we have a parser, we could parse the expression and display it in a visually appealing way in the labels list (maybe make the more dominant label a bit bigger, or add icons...).

dobkeratops commented 6 years ago

What do you think about starting by writing down all the expressions that come to mind and that we want to support?

right.. I was hoping that by submitting them in the 'add labels' view they'll come through - they do seem to appear at the side - then you've got them in their context.

not quite sure what the best thing to do is with (foo|bar)>baz vs (foo+bar)>baz.. you might even want to be more explicit that you're talking about area, not depth: (area(foo)+area(bar))>area(baz)

bbernhard commented 6 years ago

right.. I was hoping that by submitting them in the 'add labels' view they'll come through - they do seem to appear at the side - then you've got them in their context.

aah, nice - that's great. I'll have a look :)

not quite sure what the best thing to do is with (foo|bar)>baz vs (foo+bar)>baz.. you might even want to be more explicit that you're talking about area, not depth: (area(foo)+area(bar))>area(baz)

good question.

I guess it also depends on where we want to go with the label expressions. Do we want to use the information just internally (road>building means that road is more prominent than building -> it's better to start with the annotation task road / road is probably a better annotation task for smartphone users...) or do we want to make the information also accessible to dataset users? (i.e. the information is queryable)

dobkeratops commented 6 years ago

r.e. label graph #125, and combining,

examples of how this could work..

    passenger_carrier
    aquatic_object
    vehicle
    road_vehicle
    flying_object
    animal
    cargo_carrier
    on_rails

    passenger_carrier&flying_object&vehicle = jet_airliner
    passenger_carrier&aquatic_object= cruise_ship
    flying_object&animal  = bird
    road_vehicle&passenger_carrier = bus
    passenger_carrier&on_rails = passenger_carriage/train
    cargo_carrier&on_rails=..
    powered_object&on_rails=locomotive/train

.. the idea being that combining generic nodes from the label graph could describe objects even before you have a label (e.g. different elements of describing a bird and a bus allow you to describe a jet airliner). Some objects aren't easy to name anyway, but you can say a lot about them.
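One hedged sketch of the lookup this implies: each label carries a set of generic attribute nodes, and a combination query returns every label whose attributes include all of them (the attribute sets below are illustrative, not the real label graph):

```python
# Sketch: labels described by sets of generic label-graph nodes,
# following the combination table above. Attribute sets are made up.
LABEL_ATTRIBUTES = {
    "jet_airliner": {"passenger_carrier", "flying_object", "vehicle"},
    "cruise_ship":  {"passenger_carrier", "aquatic_object", "vehicle"},
    "bird":         {"flying_object", "animal"},
    "bus":          {"road_vehicle", "passenger_carrier", "vehicle"},
    "locomotive":   {"powered_object", "on_rails", "vehicle"},
}

def labels_matching(*attributes):
    """All known labels whose attribute set includes every query attribute."""
    query = set(attributes)
    return sorted(name for name, attrs in LABEL_ATTRIBUTES.items()
                  if query <= attrs)

print(labels_matching("passenger_carrier", "flying_object"))  # ['jet_airliner']
print(labels_matching("vehicle"))
# ['bus', 'cruise_ship', 'jet_airliner', 'locomotive']
```

The same lookup run in reverse (an image marked with generic attributes but no concrete label) is what would let you describe an object before it has a name.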

I wonder if a neural net would figure out common visual features for the generic labels, e.g. birds and aircraft have wings/tail-fins; passenger carriers tend to have long rows of windows; carnivorous animals tend to have forward-facing eyes and canine teeth; tools tend to have handles and a functional 'tip', etc.

Perhaps a net could develop intuition to guess the function of objects it hasn't already got a label for.

dobkeratops commented 6 years ago

just thought I'd mention that even without boolean expression parsing, adding a simple "include [..] exclude [..]" to the browse/explore search might be a simple option that increases the power of that view for more users. It allows 'show me roads without cars', that sort of thing.
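A minimal sketch of that include/exclude filter, with no expression parsing at all (the data shapes and image names are assumptions):

```python
# Sketch of the suggested "include [..] exclude [..]" search, which
# covers the common cases without needing an expression parser.
def browse(images, include=(), exclude=()):
    """'show me roads without cars' -> include=['road'], exclude=['car']"""
    return sorted(name for name, labels in images.items()
                  if all(lbl in labels for lbl in include)
                  and not any(lbl in labels for lbl in exclude))

images = {
    "a": {"road", "car"},
    "b": {"road", "person"},
    "c": {"pavement"},
}
print(browse(images, include=["road"], exclude=["car"]))  # ['b']
```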

bbernhard commented 6 years ago

@dobkeratops the browse annotation mode should now be capable of dealing with the ~ ("not") and the & operator.

i.e. you can now write queries like ~tree or person & ~apple.

It's possible, that the more complex queries (with nested brackets) are not working correctly yet - I tried to test as much as possible with unittests; but I am pretty sure I've missed something. ;)

edit: at the moment the new operators are only available in the browse annotation mode; the explore view doesn't support it yet.

dobkeratops commented 6 years ago

suggestion: I see the browse dataset view allows looking for road|pavement, van|truck, etc., and it will show both annotations - this is great!

it might be nice to be able to have a universal browse mode where:
[1] the query includes these possibilities, plus "annotations only" (and an opposite "unannotated only")
[2] on clicking, you can choose what to do: (i) show info, (ii) pick a label to annotate, (iii) add labels

this would let you ... (i) look for missing labels, then add them (e.g. "look for road&~sky" , then click add label if you see any sky..).. unusual objects stand out when browsing.

(ii) complete useful combinations of annotations, e.g. when trying to get as many street images with both road and pavement annotated, or both grass & foliage (to tell the difference between ground & vertical vegetation)

Perhaps a universal browse mode in the UI would be enough, instead of needing separate 'browse & annotate' and 'explore/browse' modes. You could then keep the existing annotate/label modes.

It would just mean one extra click (compared to browse&annotate), but I think I'd be ok with it: I've been using that for polygonal tasks where you're going to click a complex outline anyway.

I suppose you could put an 'action' dropdown at the top, generalizing the 'browse/export' selector to achieve this, or have a popup menu when you click (annotate: label list + 'show info..' + 'add label..' ?)

dobkeratops commented 6 years ago

example: browsed for road|pavement, now we find many nice examples where both road & pavement are annotated, but when scrolling through it would be nice to dip in and annotate the missing part (eg pavement in the top left image, etc)

[screenshot: 2018-07-21 at 10:34:39]

dobkeratops commented 6 years ago

update on the above..

I see that by browsing for pavement|road in annotate?mode=browse, this almost works already: you can indeed tick 'annotations only', but when you actually click to annotate, it seems to ask you to re-annotate whichever one was already done (regardless of the order, road|pavement or pavement|road). Perhaps it could look for the first unannotated label from the search in the clicked image; then this mode could achieve the 'completing pairs of annotations' workflow without needing any new UI.

bbernhard commented 6 years ago

it might be nice to be able to have a universal browse mode where:
[1] the query includes these possibilities, plus "annotations only" (and an opposite "unannotated only")
[2] on clicking, you can choose what to do: (i) show info, (ii) pick a label to annotate, (iii) add labels

this would let you ... (i) look for missing labels, then add them (e.g. "look for road&~sky" , then click add label if you see any sky..).. unusual objects stand out when browsing.

(ii) complete useful combinations of annotations, e.g. when trying to get as many street images with both road and pavement annotated, or both grass & foliage (to tell the difference between ground & vertical vegetation)

very cool idea! Thanks for bringing that up - that's definitely something we should implement. :) (I guess I'll create a separate ticket in order to make sure that the ideas won't get lost down here)

example: browsed for road|pavement, now we find many nice examples where both road & pavement are annotated, but when scrolling through it would be nice to dip in and annotate the missing part (eg pavement in the top left image, etc)

I am not sure if I understand you correctly. At the moment it should work like this: if you type road|pavement you will get a list of all annotation tasks where the corresponding image has the label road or pavement (or both). If an image has the labels road and pavement, and both annotation tasks are still available, it will show the image in the list view twice - once for the label road and once for the label pavement.

The "problem" with that is that you can't specify which annotation task you are after. E.g. I want to work on all annotation tasks where road&car (i.e. the image has the labels road and car), but only work on the car annotation tasks. With the current query syntax that's not possible... we would need a specific operator to "mark" the annotation task we are after. E.g. road&*car -> "all annotation tasks where the image has the labels car and road, but show me only the car annotation tasks".
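A hedged sketch of how the proposed '*' marker could behave - 'road&*car' filters to images labelled with both, but yields only the car annotation tasks (the data shapes here are assumptions, not the real API):

```python
# Sketch of the proposed '*' marker operator: road&*car means "images
# labelled with both road and car, but only the car annotation tasks".
def tasks_for(images, required, marked):
    """Yield (image, label) annotation tasks for the marked label only."""
    return [(name, marked) for name, labels in sorted(images.items())
            if set(required) <= labels]

images = {
    "img1": {"road", "car"},
    "img2": {"road"},
    "img3": {"car"},
}
print(tasks_for(images, required=["road", "car"], marked="car"))
# [('img1', 'car')]
```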

Is that what you are after?

The "annotations only" (still not happy with the name choice) option works exactly the same way, except that it shows already existing annotation tasks. The idea is that you can visually scroll through all the existing annotations to find wrong ones. E.g. if I type apple, it shows all the annotations for apple. If I now spot that an orange was accidentally annotated as an apple, I can click on the corresponding entry, delete the "orange annotation" and draw a bounding box around the apple.

dobkeratops commented 6 years ago

If an image has the labels road and pavement and both annotation tasks are still available it would show the image in the listview twice

ahhh.. I didn't realise that. So basically I was seeing it sort the tasks, hence it looked like it was always asking for the same annotation. OK, that makes sense.

.. we would need a specific operator to "mark" the annotation task we are after. E.g. road&*car

so what I had in mind as a UI-free tweak would be that it selects an annotation task based on what's available: in the case of having ticked "annotations only" and searching for 2 labels, some images would have exactly 1 remaining task, which makes sense. I guess it might be a bit random if you looked for 3 labels and it just picked one of the remaining 2 at random.

marking it would be interesting but not ideal, because you'd want to pick the task per image - whichever of the labels wasn't done yet.

The idea is, that you can visually scroll through all the existing annotations to find wrong ones.

Ah ok, that explains what I'm seeing - it's a corrective task. Seems like a different mode with a choice might be better then, and keep the existing workflow as intended.

dobkeratops commented 6 years ago

is there a way to change the label? I see you can delete an erroneous object, but the error I often have is a valid area, just with the wrong label; to correct it with the current tools you'd need to delete it and draw it again.

I suppose this could be part of a future label-refinement workflow

e.g. here I looked for pavement, annotations only, and found this error - this area is supposed to be road:

[screenshot: 2018-07-22 at 02:26:16]

dobkeratops commented 6 years ago

Given this mode isn't actually in the main UI, it isn't a big problem I guess, but

I'm finding it very error prone: the problem is, I usually instinctively click and scroll the work area to do the task - the instruction of which label you're annotating sometimes isn't visible when you're actually working (see screenshot below). Even before this capability, I made a similar mistake: settling into a rhythm of annotating roads, then switching to pavement, scrolling down, and annotating road. Similar issue: the name of the label you're about to annotate is scrolled away.

it is visible initially, but for these road/pavement tasks you're usually doing something low down in the image, so you end up automatically scrolling the window down instinctively, without bothering to read:

[screenshot: 2018-07-22 at 03:22:02]

actual working state..

[screenshot: 2018-07-22 at 03:20:28]

Perhaps just repeating the label name at the bottom of the screen would be a temporary fix? (e.g. imagine the bottom strip reading road - [skip] [done]) ... or what about moving the completion button up next to the label name ("annotate road: [done]")*

another way might be control over how the page scrolls (I forget the web jargon, but I think you can anchor a strip of the screen to the window frame while another portion scrolls - 'fixed' or 'sticky' positioning). I know there's the pan command, but scrolling the window is instinctive and fast (e.g. mousewheel, trackpad gestures).

the ability to search with an expression is awesome, but it could be better in conjunction with a deliberate choice of action after you click an image.. so imagine just using it for label/image search, decoupled from any notion of tasks. Imagine a unified mode, starting with "browse" (with filter options for any/annotated/un-annotated), with the ability to see existing annotations in context (how they fit together); then you choose an action after you click (choosing a label to annotate, adding missing labels, or even invalidating?). With the goal of annotating both road&pavement, you could just do both in sequence while you're focussed on that one image, given the choice;

.. but this problem I'm encountering can maybe guide you toward avoiding this hazard in the main presented modes.

(* something else.. in the browse mode, perhaps there's less need for 'un-annotateable/blacklist', because you deliberately selected the image. In a universal mode, you could give an invalidate option.)

dobkeratops commented 6 years ago

regarding browsing and add labels - the expression-based search would be very useful, e.g. I know many images miss the "sky" label because it was added relatively recently, so I could search for labels in outdoor scenes (i.e. things that would co-occur with sky), (road|pavement|car|person)&~sky ... then scan through the images looking for ones which do actually have sky, and add the label there.

the other thing that happens is you sometimes see unusual objects which don't appear very often in the random 'add labels' mode, again by virtue of co-occurrence (excavators/loaders/forklifts in urban scenes..). You kind of want to annotate these rare ones straight away. With a universal mode that would be possible (in the existing add labels mode, it doesn't seem to register it as a doable task until you see it in another pass..)

bbernhard commented 6 years ago

I'm finding it very error prone: the problem is, I usually instinctively click and scroll the work area to do the task - the instruction of what label you have sometimes isn't visible when you're actually working (see screenshot below)

very good point - haven't thought about that.

Do you think it would solve the problem if we make the label more prominent or do you think that additional measures are needed? I guess we could show a popup dialog when one queries for more than one label. In the popup dialog you could select the label you want to annotate. That way it's guaranteed that you only get the desired annotation tasks during one sitting.

But that has the disadvantage that you can't use queries like ~tree anymore. I guess there are people out there who want to annotate everything except xxx (e.g. tree).

I think the main problem is the habitual behavior that kicks in, after you work on the same label for some time. If you then change the label to something else, the old label is still on your mind...But not sure how we can tackle this problem...I guess that automatically happens if you work on the same label for a long time? I think it should definitely get better, if the label to annotate is always shown. I am wondering if there are some UI concepts that could help here? Something that "resets your brain".

the ability to search with an expression is awesome, but it could be better in conjunction with a deliberate choice of action after you click an image.. so imagine just using it for label/image search, decoupled from any notion of tasks. Imagine a unified mode, starting with "browse" (with filter options for any/annotated/un-annotated), with the ability to see existing annotations in context (how they fit together); then you choose an action after you click (choosing a label to annotate, adding missing labels, or even invalidating?). With the goal of annotating both road&pavement, you could just do both in sequence while you're focussed on that one image, given the choice;

great idea!

regarding browsing and add labels - the expression-based search would be very useful, e.g. I know many images miss the "sky" label because it was added relatively recently, so I could search for labels in outdoor scenes (i.e. things that would co-occur with sky), (road|pavement|car|person)&~sky ... then scan through the images looking for ones which do actually have sky, and add the label there.

I guess we could incrementally work towards a "unified browse mode" and add the labeling part next. I think that shouldn't be that much work. If it turns out that a separate browse based label view and a browse based annotation view is cumbersome, we can then merge those modes together. :)

dobkeratops commented 6 years ago

Do you think it would solve the problem if we make the label more prominent or do you think that additional measures are needed?

the simplest fix I can think of is to put the 'done' button next to the label name. It's prominent enough, it's just the problem of scrolling it away

I guess that might look unusual, since a completion button is usually at the bottom, but perhaps there are other ways to balance the screen, like moving the toolbar somewhere else as well.

habitual...But not sure how we can tackle this problem

right.. however now I realise that in both cases the label was offscreen - I hope fixing that will help, e.g. if you have to scroll back up to find the "done" button, you're much more likely to read the label there.

If it turns out that a separate browse based label view and a browse based annotation view is cumbersome, we can then merge those modes together. :)

sure.. I wouldn't say it's cumbersome - it's working quite well; it's just that I can imagine new workflows with a completely flexible mode (I have these tangential 'itches' as I browse). I'd hope that with the completely flexible mode as the exact opposite of the task queue, you'll have both extremes covered.

I think the browser view will be very good for validation, because your eye is quickly drawn to what's different. (e.g. look for sky, then scroll through lots of images with sky ... and the ones without sky would jump out at you for invalidation)