ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io
47 stars · 10 forks

simultaneous annotate-and-label (LabelMe style) #213

Open dobkeratops opened 5 years ago

dobkeratops commented 5 years ago

IMO this would be the biggest enhancement to the tool, but I'm guessing it will take a lot of thought to retrofit well.

i.e. the ability to have one unified mode where you look at the image, draw around whatever jumps out most, and say what it is immediately. As the label list grows (especially with refinements), the current approach makes you go back over the images multiple times, finding the objects with your eyes each time (if there are 5 labels, your eyes will need to scan the image 5 times as separate tasks).

It's definitely good that you can add labels without annotating - you can still use them for searching, and you can train on the whole image - but from that list the system can't know which objects are the most important or convenient ones to annotate. You could have a big car in the middle and a tiny person in the distance, and it will ask you to annotate the person first.

Q1: how could this be retrofitted in a way that is compatible with the existing task-oriented system? I think you have a versioning concept now.. perhaps it would be possible for each label switch to submit a new version. i.e. let's say you see an image and you annotate a car and a person - you basically get a new version for the Car and Person tasks for that image.

Q2: what about the UI - perhaps you could turn the label indicator "Annotate: person" into a label-entry control.. you could start with a presented task, but change it on the fly between annotations. This would avoid the need for another mode. (I used 'enter' as a hotkey to enter a new label name in my experiment.)

Alternatively, LabelMe works by letting you just draw; then a dialogue appears to say what it was. This contains controls for options (occluded and plural can be specified too). Perhaps such a dialogue could go through the controlled label list. Those controls are fiddly ('type the label name then use the mouse to say occluded'), but that could be fixed with hotkeys.

Q3: related to #212 - could the refinements be built into the label? This would be especially advantageous for finding things once. You see 3 cars - a hatchback, an SUV, a coupe - and you could just label each as such in one step, without having to go back and refine.

What would be the best way to do this? A load of aliases? This might catch the natural ways of writing these; the hazard is you might need a lot of aliases ('plastic bottle', 'sports water bottle', 'water bottle', etc.). Or a second control? You set a main label, then see the refinement to one side ("Annotate: [car] - [hatchback]").

Q4: could the validation that all examples are annotated be encoded as a separate piece of information? Perhaps if you change label mid-flow, you could assume by default that not everything is annotated. Conversely, if the labels are more accurate (different car types, 'bottle' -> wine bottle, plastic water bottle, beer bottle, etc.), could you worry less about completeness, because it's much less likely that precise descriptions would be repeated? Perhaps you could demand precise descriptions when switching label this way.

I would argue that incomplete annotations are still useful: you could crop each one out to make a unique training sample (I'll make an illustrative mockup), and you could use the annotations for everything else as negatives (e.g. [dog, car, bottle] annotations are all suitable negatives for a 'person' detector, etc.). This could also be used for a batched "correctness" validation, e.g. a page that shows "all the [beer bottles]" - tag anything that is NOT a beer bottle.
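The negative-sample idea could be sketched roughly like this (a hypothetical helper, assuming annotations are simple (label, bbox) pairs; none of these names exist in ImageMonkey):

```python
# Hypothetical sketch: even incomplete annotations yield training data.
# Each annotated box is a positive sample for its own label, and every
# box carrying a *different* label is a safe negative for that detector.
def build_samples(annotations, target_label):
    """annotations: list of (label, bbox) tuples; bbox = (x, y, w, h)."""
    positives = [bbox for label, bbox in annotations if label == target_label]
    negatives = [bbox for label, bbox in annotations if label != target_label]
    return positives, negatives
```

So a 'person' detector could be fed crops of all the dog/car/bottle boxes as negatives, even if not every person in the image has been annotated yet.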

Q5: would it be possible to handle components seamlessly this way? Again, when you see a person, it depends how big it is - it might be easier to do the components first, it might not. In crowd scenes with distant people it's easier to just draw a box around the whole person, but with a big person in the foreground it's easier to draw boxes around the head, hands, feet.

bbernhard commented 5 years ago

IMO this would be the biggest enhancement to the tool

totally agreed - a unified mode would definitely be a great extension to the existing task-based approach. Unfortunately, this requires some pretty big changes to the existing code base and probably also raises some new problems. e.g.: at the moment, there is no locking in place. So whenever someone requests a new annotation task, it's possible that another user is also working on the exact same annotation task. If that happens, the last user "wins" (i.e. the annotation that is submitted last becomes the new annotation, and the other one becomes the old revision), no matter which annotation was better (the last submitted annotation could be a rect annotation that overrides a more accurate polyline).

As there are a lot of annotation tasks and each annotation task usually doesn't require much time (~1-2 min at max), it's unlikely that there will be many clashes. But with the unified mode, I think that could change. When one requests an image in the unified mode, usually more work is done in one step (adding labels, annotating objects/components, ..), which makes clashes more likely. I think in a first iteration of the unified mode we probably won't need any locking, but it might make sense to add it later on. (Maybe we can use websockets for that: as soon as someone requests an image, we open a new socket and lock the image. In regular intervals the client then sends an "I am still alive" ping to the server. When the server doesn't receive such a ping within a few minutes, we can assume that the user closed the browser tab and unlock the image again.)
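A minimal sketch of that heartbeat-based locking idea, with a plain timestamp check standing in for the websocket plumbing (all names hypothetical; this isn't ImageMonkey code):

```python
import time

# Hypothetical sketch: an image stays locked while the client keeps
# pinging; if no ping arrives within `timeout` seconds, the lock
# silently expires and another user may claim the image.
class ImageLockManager:
    def __init__(self, timeout=180):
        self.timeout = timeout
        self.locks = {}  # image_id -> (user, last_ping_timestamp)

    def acquire(self, image_id, user, now=None):
        now = time.time() if now is None else now
        holder = self.locks.get(image_id)
        if holder and holder[0] != user and now - holder[1] < self.timeout:
            return False  # someone else still holds a live lock
        self.locks[image_id] = (user, now)
        return True

    def heartbeat(self, image_id, user, now=None):
        """Called on every 'I am still alive' ping from the client."""
        now = time.time() if now is None else now
        holder = self.locks.get(image_id)
        if holder and holder[0] == user:
            self.locks[image_id] = (user, now)
            return True
        return False
```

With websockets, `heartbeat` would simply be wired to incoming pings on the open socket, and the expiry check would run server-side.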

But I think in general, the unified mode should be doable. There are no red flags in the code base that prevent such a thing.

Your suggestions were really valuable to me - thanks a lot! I got quite a few ideas for future improvements. So the annotation mode will definitely see more improvements in the near future!

Q1: how could this be retrofitted in a way that is compatible with the existing task-oriented system? I think you have a versioning concept now.. perhaps it would be possible for each label switch to submit a new version. i.e. let's say you see an image and you annotate a car and a person - you basically get a new version for the Car and Person tasks for that image.

you are right, I think that should work :)

Q2: what about the UI - perhaps you could turn the label indicator "Annotate: person" into a label-entry control.. you could start with a presented task, but change it on the fly between annotations. This would avoid the need for another mode. (I used 'enter' as a hotkey to enter a new label name in my experiment.)

Alternatively, LabelMe works by letting you just draw; then a dialogue appears to say what it was. This contains controls for options (occluded and plural can be specified too). Perhaps such a dialogue could go through the controlled label list. Those controls are fiddly ('type the label name then use the mouse to say occluded'), but that could be fixed with hotkeys.

Sounds good to me. However, I am not sure whether the "simple" and "unified" modes can share the same UI forever. I think at some point we'll probably have to split them up into two separate UIs/modes; otherwise the pro features might clutter the simple mode. But I guess we could start with your suggestions and see how that goes. :)

Another alternative would be to use an existing, feature-rich (file based) annotation tool and extend it to also support loading/persisting data from ImageMonkey. e.g.: https://github.com/omenyayl/dataset-annotator, https://github.com/wkentaro/labelme. Personally, I haven't tried any of these tools yet, so I can't tell how good they are. It will probably take some time to accomplish that (I assume that most of the tools weren't designed with that in mind), but I think it could make sense if we find a tool that's really powerful.

Q3: related to #212 - could the refinements be built into the label? This would be especially advantageous for finding things once. You see 3 cars - a hatchback, an SUV, a coupe - and you could just label each as such in one step, without having to go back and refine.

That's a great idea! I'll have to think a bit more about that, but I think that should be possible.

Q4: could the validation that all examples are annotated be encoded as a separate piece of information? Perhaps if you change label mid-flow, you could assume by default that not everything is annotated. Conversely, if the labels are more accurate (different car types, 'bottle' -> wine bottle, plastic water bottle, beer bottle, etc.), could you worry less about completeness, because it's much less likely that precise descriptions would be repeated? Perhaps you could demand precise descriptions when switching label this way.

I would argue that incomplete annotations are still useful: you could crop each one out to make a unique training sample (I'll make an illustrative mockup), and you could use the annotations for everything else as negatives (e.g. [dog, car, bottle] annotations are all suitable negatives for a 'person' detector, etc.). This could also be used for a batched "correctness" validation, e.g. a page that shows "all the [beer bottles]" - tag anything that is NOT a beer bottle.

Q5: would it be possible to handle components seamlessly this way? Again, when you see a person, it depends how big it is - it might be easier to do the components first, it might not. In crowd scenes with distant people it's easier to just draw a box around the whole person, but with a big person in the foreground it's easier to draw boxes around the head, hands, feet.

Those are some really interesting ideas, thanks for sharing! I'll have to think a bit more about that :)

dobkeratops commented 5 years ago

As there are a lot of annotation tasks and each annotation task usually doesn't require much time (~1-2 min at max), it's unlikely that there will be many clashes. But with the unified mode, I think that could change.

.. right, definitely.. the user would linger on an image for longer in a unified mode, so locking would be required - although at present, with 40,000 images but only a few simultaneous users, clashes won't happen. I see that you need to consider it if you want to grow the active user base.

Sounds good to me. However, I am not sure whether the "simple" and "unified" modes can share the same UI forever. I think at some point we'll probably have to split them up into two separate UIs/modes; otherwise the pro features might clutter the simple mode.

I'll read what you wrote about ideas for unifying Refinements with annotation. Seems like there might be overlap there.

dobkeratops commented 5 years ago

Another alternative would be to use an existing, feature-rich (file based) annotation tool and extend it to also support loading/persisting data from ImageMonkey. e.g.:

"image monkey's" UI comes across as more modern than labelme, and you've got the search modes now.. I think that's worth keeping and extending. I can imagine the search page handling validation etc.

The best thing about LabelMe is the tree view for the annotations, allowing you to express a general-purpose nested hierarchy of annotations.. but it does make it more technical to use. Your solution of saying what a component is part of is a reasonable way to handle parts without advanced controls that take time for casual users to master.

bbernhard commented 5 years ago

Currently thinking a bit more about the unified mode. While I think I've figured out most of the things (at least in my head ;)), there is still one thing that's a bit of a blocker at the moment: what about the trending labels that are not yet productive?

At the moment, you can only work on those annotation tasks where the label has already been made productive (i.e. the label is defined in the labels.json). While this works pretty well for a task-based approach, I think it won't work for the unified mode.

I mean, we could restrict the unified mode to productive-only labels, but I think that makes the unified mode pretty much useless. If you've chosen the unified mode, you did that because you want the freedom to freely label + annotate the image and not be restricted by the system.

But giving up that protection also sounds wrong to me. If we allow unrestricted labeling, I think we might run into the same problems as the LabelMe dataset... i.e. misspelled labels, garbage labels, ...

dobkeratops commented 5 years ago

Right, it certainly takes some thought... I'm not sure what's best, but I think it would still be useful in its simplest, safest mode. This might be OK with label switching (slightly different to LabelMe) in that you could check whether the label you want exists before actually drawing the boxes/outlines. Conversely, if you work exactly like LabelMe (draw something, then get a dialog to say what it is..) it would indeed be an unwelcome surprise to discover you can't describe it yet.

option (i) - only allow productive labels. (+) safe, solid, fits with the existing workflow (-) restrictive. Workaround: just keep expanding the label list :)

option (ii) - free text entry like LabelMe. (+) complete versatility (-) will get many misspelling/changed-wording duplicates, plus spam and abuse. Possible workarounds: hide these annotations from other users until they've been through a moderation process (we could use GitHub/PRs to just work on text files mapping a load of aliases). Possible reduction of misspellings etc.: show suggestions (but don't enforce auto-completion)

option (iii) - perhaps restrict to productive labels, but give some fields for qualifiers - e.g. what if you could add arbitrary properties to the object, set in a formal dialog ("material=[wood,glass,any metal,plastic...]", "posture/action=[sitting,standing,walking,running,..]", "occupation=[police,medical,..]", "body type=[]",...). You could demand, or at least specifically encourage, qualification (to increase the chance the annotation is unique in the image, and to get as much value as possible).
In the case where the object doesn't have a label yet, perhaps leave it with a "?" as the main label - but still give the opportunity to specify other properties?
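Option (iii) could be modeled with a record along these lines (purely a hypothetical sketch; the field names are made up and nothing here is existing ImageMonkey structure):

```python
from dataclasses import dataclass, field

# Hypothetical data model for option (iii): an annotation always carries
# geometry and a dict of qualifier properties; the label itself may still
# be the placeholder "?" if no productive label fits yet.
@dataclass
class Annotation:
    bbox: tuple                       # (x, y, w, h)
    label: str = "?"                  # productive label, or "?" if unknown
    properties: dict = field(default_factory=dict)

# An unlabeled but qualified object: no productive label yet,
# but the qualifiers already make it searchable and nearly unique.
ann = Annotation(bbox=(10, 10, 50, 80),
                 properties={"material": "wood", "posture": "sitting"})
```

The point of the "?" default is that the qualifier properties survive even when the label itself has to wait for moderation.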

bbernhard commented 5 years ago

if you work exactly like LabelMe (draw something, then get a dialog to say what it is..) it would indeed be an unwelcome surprise to discover you can't describe it yet.

totally agreed, that's more annoying than helpful.

option (i) - only allow productive labels. (+) safe, solid, fits with the existing workflow (-) restrictive. Workaround: just keep expanding the label list :)

yeah, right, that would be the safest bet. But my gut feeling is that it gets annoying pretty quickly. I think people wanna work on the stuff they like and not on the stuff the system dictates to them.

option (ii) - free text entry like LabelMe. (+) complete versatility (-) will get many misspelling/changed-wording duplicates, plus spam and abuse. Possible workarounds: hide these annotations from other users until they've been through a moderation process (we could use GitHub/PRs to just work on text files mapping a load of aliases). Possible reduction of misspellings etc.: show suggestions (but don't enforce auto-completion)

that's a really interesting approach, I like that very much. Especially the part with the mapping/aliases.

Just a thought experiment, but let's assume for now that we have agreed that we don't wanna have productive labels that include the number of occurrences. So instead of one dog we would use the base label dog and represent the number of occurrences with a property.

Currently, if someone adds the label one dog, we can decide pretty freely what we wanna do with it. We can either make it productive with the original proposal (one dog) or we can rename it to dog (which would result in a loss of information, but keeps the naming schema consistent). As no annotations exist which refer to the one dog label, we don't have to be afraid of changing the label in a way that makes an existing annotation wrong.

Let's assume that we have a unified mode, where users can add labels + annotations in one go. So, again, the user adds the label one dog and draws a bounding rect around the dog. Now, as the submitted changes don't fulfill our naming schema, we have to change that. What we could do now is transition the more specific label one dog to dog and add the property occurrence=one to the concrete bounding box. I guess that should work in that case, no? But would it work for all the other cases, or are there some cases where this won't work? (I am thinking about the case where we rename something, and the underlying bounding box then represents something completely different.)

If we can always rename labels without changing the meaning of the actual bounding box, I think we could really use aliases + mappings. In that case we would just need some sort of pseudo-language where we can specify what gets translated to what.

e.g.: "If a user enters the label 'red car', strip the color from the name and attach it as a color property to the bounding box".

red car => car [color=red]
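A rough sketch of such a rewrite rule, assuming a shared word list (the kind of thing a pseudo-language's variables could hold) and simple prefix matching - all names here are hypothetical:

```python
import re

# Hypothetical alias pass: a shared color list is referenced by a generic
# rule that strips a leading color from the entered label and turns it
# into a property on the annotation, e.g. "red car" => car [color=red].
COLORS = ["red", "green", "blue", "white", "black", "yellow"]

def normalize_label(raw):
    """Return (base_label, properties) for a free-text label."""
    m = re.match(r"^(%s)\s+(.+)$" % "|".join(COLORS), raw.strip())
    if m:
        return m.group(2), {"color": m.group(1)}
    return raw.strip(), {}
```

For example, `normalize_label("red car")` yields `("car", {"color": "red"})`, while labels without a recognized prefix pass through unchanged.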

dobkeratops commented 5 years ago

we could transition the more specific label one dog to dog and add the property occurrence=one to the concrete bounding box. I guess that should work in that case, no? But would it work for all the other cases, or are there some cases where this won't work?

I think this could work. The question is how big this table of aliases will get. I wonder how many could work with automatic parsing.. natural language can have corner cases (remember "no entry sign" haha). LabelMe might have gone through an evolutionary process here.. I noticed they had "occludedCar" etc., but there's an occlusion flag in the label entry dialog - perhaps originally that didn't exist, and those are legacy.. or alternatively, maybe people miss the official dialogs (so you might end up needing the aliases anyway..)

bbernhard commented 5 years ago

I think this could work. The question is how big this table of aliases will get. I wonder how many could work with automatic parsing.. natural language can have corner cases (remember "no entry sign" haha)

hehe, true :D

I think jsonnet (https://jsonnet.org/) might also be helpful here, as it allows us to define variables and later reference them. That way we could define properties like colors once (basically an array of color names) and then reference them whenever we need to write a color alias.

LabelMe might have gone through an evolutionary process here.. I noticed they had "occludedCar" etc., but there's an occlusion flag in the label entry dialog - perhaps originally that didn't exist, and those are legacy.. or alternatively, maybe people miss the official dialogs (so you might end up needing the aliases anyway..)

yeah, right. I think typing (especially with auto-completion) can sometimes be faster than navigating through menus + dialogs with your mouse. So maybe we can use the aliases concept not only for label correction, but also as an alternative way to define label properties.

Speaking about dialogs and menus: what should the unified mode look like? Is it sufficient to just use the existing annotation view, or do we need other controls/inputs/dialogs? Currently, the image to annotate takes up almost all the space. Should we reserve some space on the left or right for a side menu/tree view, etc.? What would the ideal workspace look like?

dobkeratops commented 5 years ago

What should the unified mode look like? Is it sufficient to just use the existing annotation view, or do we need other controls/inputs/dialogs?

one option (if you want the user to enter the label first) is to change the label indicator ("annotate all: <.....>") into a combo box (text entry for the label, with a drop-down for accessing the label list / autocomplete suggestions once you start typing)

if working the other way, I guess you can again use a similar layout.. change that "annotate: anything.." instruction and use a popup (with further options for properties?)

Should we reserve some space on the left or right for a side menu/tree view, etc.? What would the ideal workspace look like?

so perhaps you're thinking of adding a label browser - that's an interesting idea. (Originally when I read that I thought back to LabelMe, which shows all the current annotations in a tree view, showing their hierarchical layout.. but that's more UI work.)

it all depends which path is convenient to retrofit. Any of these options would be useful.

bbernhard commented 5 years ago

I think we don't necessarily need to integrate the unified mode into the existing annotation view, in case the two modes don't have much in common. Sure, a completely new view would probably be much more work, but it has the potential to design the new mode in a way that works best for power annotators.

I think the hardest task will probably be to structure the view in a way that one doesn't get lost. The task-based approach has the advantage that one is always working on one label at a time. So, most of the time there are not more than 5 bounding boxes on the screen (not counting the detailed urban scenes, where you have a lot of small objects per image). With the unified mode, the number of bounding boxes will be much higher. I think that needs a few additional functionalities like:

bbernhard commented 5 years ago

Here's a first mockup:

https://wireframe.cc/Ao0aEK

The view is split up into three areas:

The whole concept is inspired by Photoshop's layer system: every label in the labels pane represents a new annotation layer. If you want to annotate apple, click on the label apple and mark all the apples. If you want to annotate banana, click on the banana label and again mark all the objects. If you want to add refinements to an existing bounding box, just select the bounding box and click on the "Add" button in the properties pane.

When you are done, click the "Done" button. The annotations from all layers are then merged together and submitted to the server.
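That merge-on-Done step could look roughly like this (a hypothetical sketch, assuming each layer is just a list of boxes keyed by its label; not actual ImageMonkey code):

```python
# Hypothetical sketch of the layer idea: each label owns one layer of
# boxes; pressing "Done" flattens all layers into a single list of
# (label, bbox) records for submission to the server.
def merge_layers(layers):
    """layers: dict mapping label -> list of bboxes (x, y, w, h)."""
    merged = []
    for label, boxes in layers.items():
        for box in boxes:
            merged.append({"label": label, "bbox": box})
    return merged
```

One nice property of this design is that switching the active layer never mutates the others, so a stray click while "banana" is selected can't corrupt the "apple" annotations.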

unified_mode_mockup

It's just a first draft, so please let me know if this idea is totally stupid. As long as we are still in the mocking phase, we can easily change things :D

bbernhard commented 5 years ago

still work in progress:

unified_mode_demo

dobkeratops commented 5 years ago

right, that makes sense. So you can see a label list.. I guess you could have a toggle to show all annotations as well.