dobkeratops opened this issue 6 years ago
First of all, many thanks for the summary - VERY much appreciated!
I already started working on the first items of the list; I'll try to get all/most of them done in the next few weeks (starting with the easy wins).
in 'label-me' I was able to do hundreds of annotations in a sitting.. whereas this rigid workflow burns me out after only about 10.
okay, that's really bad - and should definitely change :)
'explore-dataset' driven workflow: the user chooses a label, then scrolls through the images to pick one that's easy to annotate. Just change the existing UI to spawn 'annotate' when you click on an image in the explorer view.
just thinking whether an annotate button in the explore view would really solve the problem? you still jump back and forth between the explore view and the annotation mode, which could be a flow killer?
a slightly adapted workflow: would it maybe be better to integrate a two-step process? i.e. you first scroll through a list of pictures with labels and select all the annotation tasks you want to do. After that you press the "and now annotate" button, which brings you to the annotation screen where the selected annotation tasks get served one after the other. That way you won't need to jump between those two screens all the time.
In a first draft the selected tasks could be stored in the local browser session, but we could later extend that to also save that on the server side (basically bound to the user's account).
Would that be better in terms of "staying in the flow"?
Something else: the CPU load of text entry can sometimes skyrocket. I'm guessing this is the autocomplete - it causes a laptop to overheat and slow down further, e.g. individual keypresses can take a second to register.. if this is the autocomplete feature, maybe a delay before activating it would help - or letting the user type unobstructed, then asking "did you mean..." on submission
Many thanks for the info, I'll have a look if I can reproduce that on my PC. I guess if I increase the number of items in the list dramatically, I should definitely see a significant change in the CPU load. btw: which browser are you using? do you know if that only happens over time or also after the view gets loaded for the first time?
How about a completely open ended "draw boxes around the interesting objects" mode - leave it to the 'add labels' view to tell it what they are.
If I got you right, then it's basically a reversed mode. You first draw bounding boxes around objects and then the user has to label/describe that specific bounding box? I have to admit, that's a pretty interesting idea! However I think it could be pretty hard to detect/avoid duplicate work. I imagine that it could be really difficult to merge the bounding boxes that were created using the task based approach with the ones that were created using the "free drawing" approach. I guess no matter how hard we try, we will end up with duplicates :/
okay, that's really bad - and should definitely change :)
(my workaround is that I currently spend most of my time on this site in 'add labels' .. assuming that the annotation part will improve in future)
just thinking whether an annotate button in the explore view would really solve the problem? you still jump back and forth between the explore view and the annotation mode, which could be a flow killer?
I speculate it would still help a lot - for minimal UI work: maybe you could repurpose the explorer view entirely for this purpose, i.e. the click on the image goes directly to annotation without even needing another button, and worry about inspecting the data later.. (maybe use a right-click menu for that.. it's probably a developer wanting that, so they're more likely to be able to find it).
The flow killer is having to click through suggested tasks until you find one that's reasonable to do; the explorer view would let you scroll through (which is fast) to find something doable .. I think this might be engaging in a similar way to cropping video (scroll & annotate).
I wonder if the 'back button' would automatically give you a means of returning to the explorer (although I guess it really wants to remember the label to show).
I suppose you could consider a button in the annotation view to launch the explorer for the current label (.. would that fit in the existing plan for a mode switch button?)
do you know if that only happens over time or also after the view gets loaded for the first time?
it's definitely intermittent. It might be something which is hovering on the edge and if the laptop overheats then it gets worse. It can work ok sometimes.
If I got you right, then it's basically a reversed mode.
Exactly. Then you'd have something complementary. People can use whichever mode suits them. Again this would avoid the need to click through images.. once you've seen one, you can express more about it.. make use of the bandwidth and the user's time.
However I think it could be pretty hard to detect/avoid duplicate work.
perhaps it could show all the existing bounding boxes .. maybe in a lighter shade, maybe even a slightly different color for clarity - you'd only re-draw if you thought you could improve on the existing annotation.
I guess no matter how hard we try, we will end up with duplicates :/
What I imagine is a large growing pool of images, hence a lot of pending work .. I think duplicates would be a minor issue.. you could just blend them together, creating fuzzy assignment if they disagree. Personally I think fuzzy labelling would be good anyway.. that's what the difficult objects (trees) probably need. I almost think the polys and rectangles need to have an assumed error anyway (e.g. consider the number of vertices/edge length and estimate how fuzzy it should be)
I speculate it would still help a lot - for minimal UI work: maybe you could repurpose the explorer view entirely for this purpose, i.e. the click on the image goes directly to annotation without even needing another button, and worry about inspecting the data later.. (maybe use a right-click menu for that.. it's probably a developer wanting that, so they're more likely to be able to find it).
sounds good to me. Using the explore view as base definitely makes it easier in terms of code reuse. :) I think I'll start with that one next, as this might have the biggest impact.
perhaps it could show all the existing bounding boxes .. maybe in a lighter shade, maybe even a slightly different color for clarity - you'd only re-draw if you thought you could improve on the existing annotation.
good idea - you are right, that would definitely help :)
Exactly. Then you'd have something complementary. People can use whichever mode suits them. Again this would avoid the need to click through images.. once you've seen one, you can express more about it.. make use of the bandwidth and the user's time.
totally agreed!
What I imagine is a large growing pool of images, hence a lot of pending work .. I think duplicates would be a minor issue..
you are right, if the pool is large enough the chances of duplicate work shouldn't be that high.
it's definitely intermittent. It might be something which is hovering on the edge and if the laptop overheats then it gets worse. It can work ok sometimes.
A minor change that I'm going to commit in a minute is that the autocomplete dropdown will only show after at least 2 characters have been typed. I don't think that it will solve the actual problem, but maybe it makes it appear less often. Still need to investigate what's causing this.
ok, I think I have a working PoC:
I decided to keep the explore view as it is and added the functionality to the annotate view. Adding the functionality to the explore view would have meant that a new browser tab opens after clicking on an image. While this also works, I found it really distracting, and after a while you have a lot of open tabs in your browser. I think this is slightly better and should also perform faster (as we don't have to reload all the .css and .js files).
In principle it works like this: you enter a label query (e.g. apple, apple | orange, ...) and a matching annotation task gets served. I think I should be able to finish that within the next few days and push it to production. (Once it is pushed to production, it will probably be available under the following URL: https://imagemonkey.io/annotate?mode=browse)
ok this seems interesting, I look forward to trying it out. I noticed the wording changes that are live, that's also nice.
a randomly ordered list of available annotation tasks gets returned/ locking
I hope this will be ok.. I imagine it would be
if you annotate the image, mark it as non-annotatable or blacklist the image
I kind of think the blacklist/not-doable are temporary fixes which could eventually be removed.. there is certainly the possibility people entered incorrect labels (-> hence label validation) .. but once you've got the ability to search for labels - you don't need to remember a user label preference; also I hope the not-doable cases will gradually be fixed (with the other tool improvements)
but what if eventually the whole 'discrete task' idea could also fade away;
then you're basically always just looking through for incorrect annotations (paint over them with correction) , or un-annotated pixels (mark it and add the label) .. the ultimate tool being if you can just do all that switching in one place (imagine thinking of it like a painting tool, with the 'palette' being labels, the label explorer being like a palette mixer ..)
As your label list is growing (it's a lot better now), maybe this will become more like LabelMe where you can consider pixel-coverage as the mark of completeness (.. and bear in mind any object can always be refined with its parts.. as the parts list also grows, the depth of the annotation can also increase)
I was also going to suggest a slightly different take on the rectangle tool:-
so you started out with rotatable boxes (hence the selection idea to adjust them)
.. but then you added the polygon tool .. I think drawing a polygon is actually easier than a rotated bounding box - because to rotate a bounding box you must estimate the size, rotate it, then correct - it's arguably more drags/clicks than drawing a quad around it
so could you remove the bounding box adjustments altogether (just direct people straight to polys if they want to be more precise) , and introduce a 'negative annotation' like a downvote on something that's incorrect, or even a refinement of something that's correct .. e.g. someone might blast bounding boxes .. but then the next user might come along and trim them.. think of it like accumulating brush-strokes, with a fuzzy-probabilistic label assignment - community consensus.
with that mindset , you wouldn't fear 'repeat' work - it would almost be a way of validating
having said all that, do you have an idea how many people you expect to use this on each device (PC mouse+keys, laptop, tablet, phone) .. my perspective has come from laptop use (where I have a trackpad - hence the focus on bounding boxes rather than polys, but wanting to leverage keyboard input)
I kind of think the blacklist/not-doable are temporary fixes which could eventually be removed.
yeah, I also hope so :)
but what if eventually the whole 'discrete task' idea could also fade away;
The main idea for using a task based approach was that it's quite easy to integrate into my/people's lives. At the moment I am pretty busy with developing and unlocking pictures, so I can't find much time to contribute to the dataset (I am really glad that you are contributing so much; otherwise the activity chart would probably look pretty bad by now ;)). But I always find time to do the "little things".
I have the ImageMonkey chrome extension on my laptop configured to serve me a random validation every time I open a new browser tab (with a 30min pause between each validation). It's not much, but on good days I can make 10 validations without interrupting my workflow. The whole task based approach was created with the idea in mind, to get more and more of those "easy wins" applications.
Slack plugin: a lot of people use Slack for their daily communication. There could be a Slack bot that serves you validations or simple annotation tasks at a decent frequency (e.g. 3-5 times a day). That way you can do something good without getting distracted too much.
mobile app: imagine an alarm clock that only goes off, after you have done a validation/annotation.
captcha use case: use annotation/validation tasks to identify humans.
tamagotchi like game: your character grows and levels up when you teach him stuff (that's an apple, that's a banana...)
Another advantage of the task based approach is that it's rather easy to validate annotation tasks (as they are isolated and complete in themselves). So as soon as someone marks all occurrences of an object in the image we can verify the work. As the annotations won't change anymore, we can (to a certain degree) be sure that an annotation task is valid once it reaches a significant number of positive votes.
With a completely free annotation mode, I guess it's a bit harder to reach that state - people can always add or remove stuff. People with malicious intent can destroy the work of others by removing bounding boxes, ... Of course, there would also be the possibility to create a visual diff of the changes (similar to git diff) that needs to be validated, but I guess that's quite hard to get right and maybe also creates a lot of noise (imagine that someone creates 10 bounding boxes, one after the other, saving the changes every time -> we would have 10 visual diffs).
From the dataset user's perspective, I guess it's also nice to know whether all occurrences of an object in an image are annotated or not. With a completely free annotation mode I guess it's hard to get that information. If the dataset gets bigger and bigger, it's probably not possible anymore to manually verify that the data I've downloaded is actually correct. e.g. I could imagine that someone could be interested to know if all dogs in the picture are correctly annotated.
With the labelme dataset e.g. I often had the problem that I had no clue about the dataset's quality. I totally understand that there is always the possibility to add more details, but from a user's perspective, I am interested in whether the labels that I care about are "done".
Having said all that, I definitely see the benefits of a free, incremental approach. My hope is that we somehow can combine the task based approach with the incremental one to get the best of both worlds. One of my biggest "fears" is that at some point we lose control over the data and can't say anymore whether our data is valid or not. I think it would be pretty bad if we accumulated a lot of data over the years but can't trust the data anymore. At that point the whole dataset probably becomes useless.
I am glad you started this discussion.. talking about this always helps me to get my thoughts in order. It's definitely an important topic, and the sooner we get some ideas on how to realize that, the better. :)
do you have an idea how many people you expect to use this on each device (PC mouse+keys, laptop, tablet, phone)
that's actually a good question. I think there is a lot of unused potential in the smartphone as a validation/annotation device. Probably not all types of tasks are smartphone compatible, but if we find a way to classify tasks (easy - medium - hard) based on their complexity, I think we should be able to find a subset that's suitable for smartphones. Together with a use case that doesn't require users to invest a lot of time (e.g. alarm clock), I think the smartphone could be a way to get a constant stream of "easy wins".
just a note to say I've seen the label separator (I see it defaults to using a comma) and the 'shift-enter' add-label hotkey.. (discovered via settings) that's great, it really helps :)
From the dataset user's perspective, I guess it's also nice to know whether all occurrences of an object in an image are annotated or not.
but you could just train on the annotations: ignore un-annotated areas; you still get the 'negatives' from the other annotations, i.e. feed all the annotations of 'car', 'cat', 'apple' to the 'dog detector', with the expected value of 'dog=0'.
Also you could still aggregate the feature detectors across an image based on the whole-image labels.
personally I think you'll get more value by accumulating more annotations than by ensuring the annotations in any individual image are complete
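a rough sketch of what I mean, in Python-ish pseudocode (made-up names, just to illustrate using the other labels' annotations as negatives and masking out un-annotated areas):

```python
# Rough sketch (made-up names): build training targets from partial
# annotations. Each annotated crop gives a positive for its own label and
# negatives for every *other* label; un-annotated regions are simply masked
# out (weight 0 -> no loss), so incomplete images are still useful.

LABELS = ["car", "cat", "apple", "dog"]

def targets_for_crop(crop_label):
    """Per-label (target, weight) pairs for one annotated crop."""
    return {label: (1.0 if label == crop_label else 0.0, 1.0) for label in LABELS}

def targets_for_unannotated_region():
    """Un-annotated pixels/crops: weight 0 means 'ignore', not 'negative'."""
    return {label: (0.0, 0.0) for label in LABELS}

def masked_loss(predictions, targets):
    """Weighted squared error; weight 0 drops un-annotated regions entirely."""
    total, norm = 0.0, 0.0
    for label, pred in predictions.items():
        target, weight = targets[label]
        total += weight * (pred - target) ** 2
        norm += weight
    return total / norm if norm else 0.0

# A crop annotated as 'car' still trains the 'dog' detector (expected dog=0).
preds = {"car": 0.8, "cat": 0.1, "apple": 0.05, "dog": 0.3}
print(masked_loss(preds, targets_for_crop("car")))
print(masked_loss(preds, targets_for_unannotated_region()))  # 0.0 -> ignored
```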
to me the problem with the current approach is: you've got the attention of a user, who looks at an image and has a response - but then you're asking him to search for something specific (which means glancing around the image), rather than reacting to what he/she does see... you're using much more 'human visual cortex time' for the same data result.
This is why the 'add labels' mode is so much more pleasant to use: you see the image and you have your response.. and can communicate all of it to the site. especially now with the separator - each viewing yields 10+ pieces of information .. it only takes a few seconds to type that out
Now imagine if you could draw a bounding box around whatever catches your eye in the image.. and just treat it as another image (sub-image) which can be given its own set of labels - perhaps that could be adapted from the add labels mode - i.e. it could present the sub-images as if they were just images - the user seeing it won't know any different. (a bounding box doesn't capture the object exactly, there's overspill, but you could still tell it what's around it.. e.g. the whole image might have "road, car, person, pavement, building".. draw a box around the person, and that box holds "person, pavement, building".. and so on)
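a rough sketch of the data side of that sub-image idea (hypothetical structures, just to illustrate):

```python
# Rough sketch (hypothetical structures): a drawn box becomes a 'sub-image'
# with its own label list; the full image's label set is then the union of
# its own labels and whatever the sub-images contain.

from dataclasses import dataclass, field

@dataclass
class SubImage:
    box: tuple                                   # (x, y, width, height) in parent pixels
    labels: set = field(default_factory=set)

@dataclass
class Image:
    labels: set = field(default_factory=set)     # image-wide labels
    sub_images: list = field(default_factory=list)

    def all_labels(self):
        out = set(self.labels)
        for sub in self.sub_images:
            out |= sub.labels
        return out

# Street scene: whole image labelled broadly; the box around the person
# narrows down what's inside that region.
scene = Image(labels={"road", "car", "person", "pavement", "building"})
scene.sub_images.append(SubImage(box=(120, 40, 60, 150),
                                 labels={"person", "pavement", "building"}))
print(sorted(scene.all_labels()))
```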
perhaps allowing feedback from annotate when you want to skip would make it more engaging; if you record that, you still get something back from the user's time (and the user will feel more engaged.. the tool is listening to his feedback) .. I would suggest main buttons "Skip, Done" - Skip replaces "Un-annotatable, blacklist" - but brings up this dialogue to get further information
too many - the task would take too long to do (but now the tool knows: this image has many of this label)
too small - there are few enough to do, but they're too small to annotate accurately with the current device (e.g. a few pixels of a person's head). again, now the tool knows this label is very few pixels in the image
unclear label - I see water, but I'm unsure if it's a lake, sea, river, or canal. I see tarmac, but I'm unsure if it's a road, a driveway, a car-park, or a path in a park
better to do the parts - the object is large enough that it's easier to draw boxes around the head, hands, feet, tail (or even eyes, nose, mouth, ears for heads). The outline itself is complex, but the components are easily identified.
it covers most of the scene - it would be more efficient to highlight the pixels that aren't the label. Example: the image is a person in a forest, and it says 'annotate: tree'. tree is 90% of the image, but the boundary of tree is a hole made by the person. I want to annotate the person instead.
the tool now knows most of the pixels are this label (that's how it differs to 'too many')
invalidate - I know for sure the label isn't present (saves going through a separate validation mode)
blacklist - preference - user doesn't want to do these labels
other (catch-all in case there are reasons beyond these.. no further information given)
this is similar to the idea of 'label qualifiers' - the impulse to not annotate still tells you more about the label - and you could still use this information as a training signal - e.g. you could sum the feature detector coverage and give that an expected value. The above information is similar to combinations of: few/many, dense/sparse, near/far, foreground/background. The easiest annotations are 'few & near & foreground'.
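a rough sketch of how that skip feedback could be recorded (made-up names; the coverage hints just show the weak training signal you'd still get):

```python
# Rough sketch (made-up names): record *why* a task was skipped so the time
# the user spent looking at the image still yields a weak training signal.

from enum import Enum

class SkipReason(Enum):
    TOO_MANY = "too_many"                 # many instances -> task would take too long
    TOO_SMALL = "too_small"               # present, but only a few pixels
    UNCLEAR_LABEL = "unclear_label"       # e.g. water: lake vs river vs canal
    BETTER_AS_PARTS = "better_as_parts"   # annotate head/hands/wheels instead
    COVERS_MOST = "covers_most_of_scene"  # e.g. 'tree' in a forest image
    INVALIDATE = "label_not_present"      # doubles as a validation 'no' vote
    BLACKLIST = "user_preference"
    OTHER = "other"

# Coarse coverage hints a trainer could still use as an expected value.
COVERAGE_HINT = {
    SkipReason.TOO_MANY: "many instances",
    SkipReason.TOO_SMALL: "few pixels",
    SkipReason.COVERS_MOST: "most pixels",
    SkipReason.INVALIDATE: "zero pixels",
}

def record_skip(image_id, label, reason):
    """What the server might store alongside the untouched task."""
    return {"image": image_id, "label": label,
            "reason": reason.value, "hint": COVERAGE_HINT.get(reason)}

print(record_skip("img_123", "tree", SkipReason.COVERS_MOST))
```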
Could you consider a 'divide and conquer' approach - the cases where it's too hard to do... split the image up. "too many" -> maybe next time just show half the image. You could do this recursively.. you could accumulate label lists for portions; a middle ground between precise annotation and image-wide labels.
this is kind of what I suggest with drawing boxes around 'the obvious objects' .. considering those as 'sub-images'.. areas that are different to the rest. (e.g. in the example of 'a person in a forest', drawing a box around the person splits the image into two parts... the outer part just has 'tree', the inner part has 'tree, person')
to me the problem with the current approach is: you've got the attention of a user, who looks at an image and has a response - but then you're asking him to search for something specific (which means glancing around the image), rather than reacting to what he/she does see... you're using much more 'human visual cortex time' for the same data result.
you are totally right, good point!
My main concern is, that the task based approach is everywhere. It's reflected in the UI, the API, the way we store the data in the backend...So if it turns out, that we can't build the free annotation mode around the existing infrastructure, we have to touch a lot of stuff in order to make it work...which makes stuff break.
I hope we can find a way that allows us to implement the new annotation mode, but lets us keep most of existing mechanisms in place. There are obviously some UI changes needed to support the new mode, but if we could keep most of the existing APIs and the internal storage format, it would already be a huge win.
What do you think about a revision based approach?
Imagine that we have a picture with three dogs and two cats that we want to annotate using the new workflow. Here's how it could look with the revision based approach:
User #1 starts by drawing bounding boxes around two of the three dogs and one cat (he is too lazy to mark the remaining cat and dog, so he just saves his changes and continues with another image).
Then someone labels the bounding boxes with dog and cat (from the technical point of view, the labeling can theoretically also be done in the "drawing boxes" mode - it's just a matter of taste what we prefer).
As soon as we know the labels of the bounding boxes, we split them up labelwise - i.e. we store them the same way as the annotation tasks now. So there is now an annotation task which contains the two dog annotations and another one which contains the cat annotation - both of those tasks get the revision number 1.
As the data is stored in a task-like structure, we could already do the same as we do now: we could ask the user in the validation mode: "Are all occurrences of dog correctly annotated?" If the user presses "no" we could redirect him to the annotation mode, where he could add the missing bounding rects for that label. In the case of the above example he would draw another bounding rect over the remaining unannotated dog.
As soon as the user saves the changes, we create another revision, revision 2, of the dog task. So we now have revision 1, which has two annotated dogs, and revision 2, where all dogs are annotated.
The idea behind the revision concept is that it makes it possible to revert to a good revision in case someone messed something up (on purpose). I would see that similar to the revision concept of Wikipedia: per default, the latest revision is the most accurate one. If however someone (accidentally) messed something up, users with specific permissions (moderators) could revert to the last known good revision.
Continuing with the above example: User #2 opens the image and sees that everything of interest is already annotated, except one cat. So he draws a bounding rect around the cat and saves the changes. Internally we would again create another revision (revision 2) of the annotation task cat. We now also have two revisions of cat - the first revision contains one annotated cat and revision 2 contains two annotated cats.
Would that work?
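From the technical side, roughly what I have in mind (just a quick Python sketch with made-up names - not the actual backend format):

```python
# Rough sketch (made-up names): annotations stay grouped per (image, label),
# just like today's tasks, and every save appends a new immutable revision
# instead of editing in place - so reverting vandalism is trivial.

from dataclasses import dataclass, field

@dataclass
class Revision:
    number: int
    boxes: list            # e.g. [(x, y, w, h), ...]
    author: str

@dataclass
class AnnotationTask:
    image_id: str
    label: str
    revisions: list = field(default_factory=list)

    def save(self, boxes, author):
        """Each save creates a new revision; nothing is overwritten."""
        rev = Revision(number=len(self.revisions) + 1, boxes=boxes, author=author)
        self.revisions.append(rev)
        return rev

    def latest(self):
        return self.revisions[-1] if self.revisions else None

    def revert_to(self, number):
        """Moderator action: re-publish a known-good revision as the newest one."""
        good = self.revisions[number - 1]
        return self.save(good.boxes, author="moderator-revert")

# Example from above: two dogs annotated first, the third one added later.
dog_task = AnnotationTask("img_42", "dog")
dog_task.save([(10, 10, 50, 40), (80, 20, 45, 35)], author="user1")              # revision 1
dog_task.save([(10, 10, 50, 40), (80, 20, 45, 35), (150, 60, 40, 30)], "user2")  # revision 2
print(len(dog_task.revisions), dog_task.latest().number)
```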
My main concern is, that the task based approach is everywhere. It's reflected in the UI, the API, the way we store the data in the backend
on reflection - maybe something like the 'touch-the-squares' approach could handle the difficult cases: not as precise as outlines, but an intermediate step after the image wide label.. and smartphone-friendly. Perhaps that could fix it, whilst still fitting in with the task based framework. Perhaps you could even use one to help verify the other.. or generate 'touch-the-squares' tasks if many people skip a task.
let me just read the rest and comment on that.
I'm sure there is a relatively easy solution .. just a question of figuring out the best retrofit.. (touch the squares: a whole new tool, but it fits into the task framework - or continued tweaks to the existing workflow, using the same tools ?)
on reflection - maybe something like the 'touch-the-squares' approach could handle the difficult cases: not as precise as outlines, but an intermediate step after the image wide label.. and smartphone-friendly. Perhaps that could fix it, whilst still fitting in with the task based framework. Perhaps you could even use one to help verify the other.. or generate 'touch-the-squares' tasks if many people skip a task.
I think a 'touch the squares' mode would be pretty interesting (especially for smartphone users), but if we just integrate that on top of the existing concept, it wouldn't solve this problem, right?
to me the problem with the current approach is: you've got the attention of a user, who looks at an image and has a response - but then you're asking him to search for something specific (which means glancing around the image), rather than reacting to what he/she does see... you're using much more 'human visual cortex time' for the same data result.
I think one of the problems is that we have quite a few different use cases and target groups. So every time we improve something for one user group, we need to make sure that we aren't accidentally killing a feature for another user group.
occasional annotator: I think for the occasional annotator both the task based approach and the free annotation mode are fine. Some occasional annotators might be overwhelmed by the sheer amount of objects in an image, so we could guide them with "annotate all xxx" tasks.
power user: The typical power user probably won't use a smartphone, but rather a PC/laptop to do the work. He contributes heavily to the dataset and works efficiently (hotkeys, shortcuts...). For this type of user a free annotation mode is probably the best, as he always finds little details to add.
smartphone usecase: I think the smartphone as a device has a lot of potential when it comes to validating stuff, labeling or easy annotations. In order to make it easy for other developers to build applications, we should strive for an easy (task based) API.
Requirements:
Personally I would see the following requirements:
possibility to validate and to roll back changes: for me personally that's one of the most important requirements. No matter what we end up with, I think it's important to have a strategy to deal with bad actors. There will always be people who upload wrong annotations on purpose or try to destroy other people's work.
easy API + keeping the API stable: I think in order to attract people to write applications on top of ImageMonkey, it's crucial to have an easy API. As we are still in the early stages I think we can still get away with API breaks - but I guess it wouldn't hurt if we could already keep the API stable.
I think the above revision based approach could fulfill all the usecases while still respecting the requirements.
just thought of a minor tweak which might help: prioritise presenting tasks for certain labels (specifically person, car) .. maybe 'road' as well; it's tree that tends to be the hardest;
it might be that the best order depends on the rest of the scene, i.e. use the presence of 'road' to hint a street scene..
I also wondered about using prefixes in 'add labels' e.g. "foreground car", "background foliage", "background building" "main .." to indicate subject, "periphery .." to indicate things around the edges. That would be fast to enter with 'add labels'. but then it struck me a few labels (car, person) will be predominantly foreground objects.
I was going to suggest a generalised prefix system for label qualifiers (background, foreground, dense, sparse, few, many; let the default be indeterminate for all) - however this might be hard to communicate, and naming conventions can be error prone. I recall the problems of remembering "no..." - 'no entry sign' - and ".. of .." - 'head of person' vs 'glass of water'.
Imagine just using an asterisk prefix to hint an 'important label' (e.g. *car to suggest serving car tasks first). Some street scenes are from the pavement, whilst others are from the road - one or the other will narrow down the rest of the image more..
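a tiny sketch of how such prefixes might be picked apart (a made-up convention, just to illustrate; note 'no entry sign' passes through untouched):

```python
# Tiny sketch (made-up convention): pull optional qualifier prefixes and an
# '*' importance hint out of an 'add labels' entry. Unknown words stay part
# of the label, so entries like 'no entry sign' are left untouched.

QUALIFIERS = {"foreground", "background", "dense", "sparse", "few", "many"}

def parse_label_entry(entry):
    """'*foreground car' -> label 'car', qualifiers {'foreground'}, priority True"""
    entry = entry.strip()
    priority = entry.startswith("*")
    if priority:
        entry = entry[1:].strip()
    words = entry.split()
    qualifiers = set()
    while words and words[0].lower() in QUALIFIERS:
        qualifiers.add(words.pop(0).lower())
    return {"label": " ".join(words), "qualifiers": qualifiers, "priority": priority}

for raw in ["*car", "background foliage", "foreground car", "no entry sign"]:
    print(parse_label_entry(raw))
```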
"but if we just integrate that on top of the existing concept, it wouldn't solve this problem, right?"
so touch-the-squares would give you an easily completable rough annotation for the difficult cases.. less need to skip tasks. Perhaps in conjunction with hints per label (like doing the cars in street scenes first) , it would be enough..
seems like there are many ways to improve the situation.. but it's hard for me to guess which will fit the UI and codebase. I'm sure there are easy wins remaining.. the simple "comma" in add labels has helped usability enormously
visual ideas.. what if you could roughly partition the scene with a single rectangle ("region of interest"?) - give a separate label list inside and outside...
yet another idea.. could freely drawn rectangles be verified individually, outside of the main image tasks?
I was going to suggest a generalised prefix system for label qualifiers (background, foreground, dense, sparse, few, many; let the default be indeterminate for all)
I think that goes a bit in the direction of context aware labels (see discussion at #135).
however this might be hard to communicate, and naming conventions can be error prone.
yeah, right. What we could probably do is show the user a list of attributes that he can tick and specify. Imagine a popup that appears when you click on a label. It shows you a list of checkboxes with text fields (similar to the flickr api https://www.flickr.com/services/api/explore/flickr.photos.search). We could show attributes like size, weight, background, foreground, many, ...
In order to make it easy for the pro annotator, that prefers the keyboard, we could also implement a pseudo language/grammar, that allows the specification of attributes via keyboard (see #135).
The only thing is: how do we deal with ambiguity? The label car can be foreground and background (if there are multiple cars in the picture). Maybe it makes sense to specify label attributes on a per-specific-annotation basis?
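A minimal sketch of what per-specific-annotation attributes could look like (made-up fields, nothing decided):

```python
# Minimal sketch (made-up fields): attributes live on a *specific* annotation
# rather than on the image-wide label, so one car can be 'foreground' while
# another car in the same image is 'background'.

annotations = [
    {"image": "img_7", "label": "car", "box": (40, 260, 260, 180),
     "attributes": {"position": "foreground", "size": "large"}},
    {"image": "img_7", "label": "car", "box": (320, 200, 90, 60),
     "attributes": {"position": "background", "size": "small"}},
]

# The image-wide label 'car' stays unqualified; only the individual
# annotations carry the (possibly contradictory) qualifiers.
for a in annotations:
    print(a["label"], a["attributes"])
```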
so touch-the-squares would give you an easily completable rough annotation for the difficult cases.. less need to skip tasks. Perhaps in conjunction with hints per label (like doing the cars in street scenes first) , it would be enough
aaah ok, got it - that makes sense.
Given you could choose completely freely, what would your ideal annotation tool look like? Is it a more task based annotation tool or a less restricted one (similar to labelme)? I am asking because I want to avoid that we incrementally improve workflows, but then again end up in a situation where we come to the conclusion: "Hey, what we really need is a less restricted annotation tool (similar to labelme)".
I totally get that everybody has a different workflow, but as you've worked with both a lot, I think you are an excellent representative for the power user group. Also: What's more motivating (in the long run)? In my opinion power users are the people that drive the dataset's quality forward. Of course there are also the occasional annotators that contribute, but the power users are the ones that add all the nifty details that make browsing a well maintained dataset interesting. So for me personally it's also important that power users "feel at home".
I guess, if a completely different annotation tool is what we need, we should probably start evaluating whether it's possible to implement that with the current concept. If we come to the conclusion that it isn't possible without much effort, we can always decide what to do next and how we can improve the existing solution. I think it's easier, if we know the limitations of the current solution. “Shoot for the moon. Even if you miss, you'll land among the stars.”
"What's more motivating (in the long run)?" << my motivation is entirely the desire for the end product - because I know AI is really data driven, so I want to see a growing open-sourced+crowdsourced labelled dataset - so it's a case of 'be the change you want to see in the world'.. donate a some spare time, keep interest up, keep this project going. And I'm still hoping it can eventually be a graphics resource of sorts. (eventually, texture-labels, skeletal poses,..)
I know seeing the activity graph can encourage/discourage users ; so I know contributions show up there. (e.g. I got the impression LabelMe was mostly abandoned. Its office images are dated by CRTs :). 'image monkey' has a more modern look to it. .. ) the 'explorer view' is potentially motivating too: seeing the growth in the data set and being able to scroll through it.
Given you could choose completely freely, what would your ideal annotation tool look like?
cherrypicking the best features:
the image wide label list here is definitely useful (being able to say what's there without needing to locate it yet)
I do think a curated label list is a good idea (r.e. naming conventions).. it's had a long 'incubation period' but it's much better now, and a growable label graph will really help
the hierarchical annotations and simpler workflow are LabelMe's best feature: I speculate that would help get more value from a label list (i.e. even without labels for all the objects, if you could identify parts like wheels, handles etc. and group them together - you've expressed something in a machine-processable way.). It is nice having the option to see all the annotations as well. Also general purpose label blending (any a/b) and perhaps 'adjectives' (like material brick.., stone.., plastic.. prefixes) would get more out of a curated list.
r.e. 'spam' I would suggest addressing that with a 'user confidence level' perhaps (in addition to validation). I agree with the need for filtering/validation. I did see some deliberate garbage in label-me.
as you've got the task based workflow, it's worth sticking with.. I think a few simple tweaks will fix it. It does have its advantages, like the automatic uncluttered view (labelme does need the user to hide its annotations sometimes); also keeping your hand on the keyboard entering the whole label list (as you can now), whereas in label me you must alternate drawing a box then typing the name. I can imagine eventually it being guided by feedback from the trained models.. which are the most problematic images and labels?
The only thing is: how do we deal with ambiguity? The label car can be foreground and background
I wonder what would give more value.. general label qualifiers, or a simple hint to "set this task first". I do like the idea that qualifiers could give extra training signals for minimal work
Many thanks for the detailed infos - very much appreciated. I will answer them tomorrow in detail :)
Just one thing that came to my mind (before I forget it again... ;)):
What about a toggle button in the labels view that allows you to show/hide the already existing annotations? That way you could click on a specific annotation and refine it with attributes (color, size, foreground, background...). This would let us enrich a specific annotation with details.
Here's a quick mockup of what I meant
Not sure if it's a good idea or not..just throwing it out to see what you think about it.
ok on further reflection:
when there are more classes of label, e.g. instead of just car, many vehicle types: hatchback, SUV, pickup truck, saloon car, estate car, van, bus, coupe, 2 seater sports car, minibus, double-decker bus, truck, articulated truck, convertible, taxi, yellow cab, black cab - perhaps that would skew toward individual annotations being more convenient (e.g. when the number of labels in the scene is higher, and the number of instances per label is lower). (many types of vehicle, many types of person, many types of building, many types of plant, many types of food ...)
Ideas for dealing with this:-
Just accept adding an opposite workflow: draw a rectangle and label it, one at a time. Treat each of these as individually requiring verification. You use this mode when you expect you can say something unique about every box. Use the existing workflow for simplified labels.
if there was a label-graph, perhaps the tool could still aggregate them to generate an easy 'touch the squares' task: ("touch the squares containing: wheeled vehicle") - roughly sketched after this list,
subdivision - divide and conquer. Split the image into halves or even quarters and treat each as a sub-image. (the problem with this though is that the system has no way of knowing the optimum way to split.)
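r.e. the label-graph aggregation idea above - a rough Python sketch with a made-up label graph (not how the real label list is stored), just to illustrate rolling specific labels up into one coarse task:

```python
# Rough sketch with a made-up label graph: roll fine-grained labels up to a
# common ancestor so several of them can be served as one coarse
# "touch the squares containing: wheeled vehicle" task.

PARENT = {                     # child -> parent (hypothetical graph)
    "hatchback": "car", "SUV": "car", "van": "wheeled vehicle",
    "bus": "wheeled vehicle", "car": "wheeled vehicle",
    "wheeled vehicle": "vehicle",
}

def ancestors(label):
    out = []
    while label in PARENT:
        label = PARENT[label]
        out.append(label)
    return out

def labels_under(target, image_labels):
    """Which of the image's labels can be aggregated into 'target'?"""
    return [l for l in image_labels if l == target or target in ancestors(l)]

image_labels = ["hatchback", "bus", "person", "tree"]
print(labels_under("wheeled vehicle", image_labels))   # ['hatchback', 'bus']
```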
my motivation is entirely the desire for the end product - because I know AI is really data driven, so I want to see a growing open-sourced+crowdsourced labelled dataset - so it's a case of 'be the change you want to see in the world'.. donate some spare time, keep interest up, keep this project going. And I'm still hoping it can eventually be a graphics resource of sorts. (eventually, texture-labels, skeletal poses,..)
awesome mindset!
combining hierarchical annotation with an image-wide label list, I wondered if the concept of a sub-image could be generalised. What if you draw a box, and that is almost treated as a new image, which can get its own label list and internal annotations. The parent image inherits the labels of its children, but it knows they're restricted to those bounding boxes. You can then go in and do the precise (smart?) annotation within it. That might save the need for zooming too? (i.e. editing the contents of the sub-image is done zoomed in to it) That might sound a bit complicated though.. let's think about it
Sounds very interesting! But I think in order to make that a fun and pleasant experience we would need the "opposite mode" first (i.e draw a rectangle and label it). If we have such a workflow in place I think it could indeed be worth a try to randomly serve image crops instead of full images to the user and ask them to mark interesting objects in there. Those image crops could be generated based on already existing bounding boxes.
I think the hardest part is probably the integration of the "opposite mode". If that's done, I think it wouldn't be that complicated to generate image crops - if we create only rectangular shaped crops it shouldn't be that hard to calculate the correct position in the full image.
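e.g. something along these lines for mapping a box drawn inside a crop back to the full image (simplified sketch, made-up helper name):

```python
# Simplified sketch (made-up helper name): map a box drawn inside a crop back
# into full-image coordinates. Only axis-aligned rectangular crops are
# assumed, which is what keeps this simple.

def crop_to_full(box_in_crop, crop_origin):
    """box_in_crop = (x, y, w, h) relative to the crop;
    crop_origin  = (x, y) of the crop's top-left corner in the full image."""
    x, y, w, h = box_in_crop
    ox, oy = crop_origin
    return (x + ox, y + oy, w, h)

# A crop taken at (400, 250) of the full image; the user drew a box at
# (20, 35) inside that crop -> it sits at (420, 285) in the original image.
print(crop_to_full((20, 35, 60, 40), (400, 250)))
```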
r.e. 'spam' I would suggest addressing that with a 'user confidence level' perhaps (in addition to validation). I agree with the need for filtering/validation. I did see some deliberate garbage in label-me.
For logged in users, I think it's fairly easy. What worries me more are the users that are using the service unauthenticated or via the unauthenticated API. In retrospect, I think that's one of the reasons why labelme requires you to be authenticated. I am not sure if they are actually moderating the content or have some spam checks in place, but with the required authentication they could at least theoretically step in if spam takes over.
But requiring authentication also sounds... wrong. So I guess the only possibility is to make the verification step as painless as possible, provide an attractive API (so that third party applications are more likely to integrate the service in their apps) and maybe give trusted people (power users) moderator privileges.
I can imagine eventually it being guided by feedback from the trained models.. which are the most problematic images and labels?
Definitely. Great idea!
I think we have gathered a lot of great ideas in this thread. Maybe it makes sense to keep thinking about it for another few days and then vote for the most promising feature that we want to tackle next.
btw: I'll push a bunch of changes to production in today's maintenance window:
and I'll also make a bunch of trending labels productive.
I think that should make some things easier. (I'll post a quick message once production has been updated)
ok, production should now be up to date. The browse based annotation mode is available here (still need to add it to the UI): https://imagemonkey.io/annotate?mode=browse
Also found some smallish issues (see #138 ) which I'll fix in the next days.
definitely a step forward - it is indeed more pleasant to visually pick things that are doable. perhaps the system could make a default label suggestion in the box before you hit 'go' (a few randomly or'd together to give the user a feel of what they're like, or 'whatever there's least of so far..')
I also see the 'separate select tool' in progress, this is nice. Regarding hitting 'delete': would it be ok to delete the last thing you did if you click this outside of selection mode (I suppose it could highlight it first to make it clearer) .. kind of like a simple undo
using it a bit more ... I can confirm this has made it WAY more pleasant to use. The option for the large work area in settings is great too. With the intent of drawing easy bounding boxes, you can pick the labels and images that suit that, and do more in a sitting .. much more relaxing.
With a default random label selection .. would it be worth considering making this the default 'add labels' behaviour? (if it gave you a default, you'd just have one more click to do something - and you could still change it to further refine. Even with the difficult labels, the browse mode means you can find the easier cases)
just a note r.e. validation: in images with 'car', I'm omitting vans, trucks, buses - assuming we'll eventually get dedicated labels for those; I got the impression the car label might have been intended as a universal wheeled/engined vehicle, but we tend to use the word separately from the other cases (I'm not sure what the strict definition is). There is of course overlap, e.g. 'pickup truck' is a sort of cross between a small truck and a car, a 'people carrier' is almost a 'small van with lots of seats', and so on
using it a bit more ... I can confirm this has made it WAY more pleasant to use.
nice, that's great to hear :)
I think I now also found the reason why it takes so long until the concrete image is loaded in annotation mode - looks like there is a pretty badly performing database query. I might take the service down for a few minutes today somewhere between 9 and 10 pm (Vienna time) to update the system and check whether the new solution performs better. It's pretty annoying that there is such a long delay between clicking on an image in the image grid and the actual loading of the annotation task.
just a note r.e. validation: in images with 'car', I'm omitting vans, trucks, buses - assuming we'll eventually get dedicated labels for those; I got the impression the car label might have been intended as a universal wheeled/engined vehicle, but we tend to use the word separately from the other cases (I'm not sure what the strict definition is). There is of course overlap, e.g. 'pickup truck' is a sort of cross between a small truck and a car, a 'people carrier' is almost a 'small van with lots of seats', and so on
good point! Maybe it also makes sense to add the possibility to remove a label in the labels view? Removing a label there would then have a similar effect as voting "no" in the validation phase. If there is a significant amount of "no" votes for a label, we could highlight it somehow (strikethrough?) and later hide it by default (+ add a button to "show all"). That way we would have a mechanism in place to get rid of wrong labels.
In case it takes too long to get rid of those labels due to the lack of votes (probably a problem as the dataset grows), we could also think about a moderator concept, e.g. a moderator vote == xx (e.g. 5) normal user votes.
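roughly what I have in mind (Python sketch, illustrative numbers only - nothing is decided yet):

```python
# Rough sketch (illustrative numbers): weight moderator votes more heavily
# and hide a label once the weighted "no" votes clearly outweigh the "yes" votes.

MODERATOR_WEIGHT = 5      # one moderator vote counts like 5 normal user votes
HIDE_THRESHOLD = 5        # weighted 'no' surplus needed before hiding a label

def weighted_votes(votes):
    """votes: list of (vote, is_moderator) tuples, vote is 'yes' or 'no'."""
    score = {"yes": 0, "no": 0}
    for vote, is_moderator in votes:
        score[vote] += MODERATOR_WEIGHT if is_moderator else 1
    return score

def should_hide_label(votes):
    score = weighted_votes(votes)
    return score["no"] - score["yes"] >= HIDE_THRESHOLD

votes = [("no", False), ("no", False), ("yes", False), ("no", True)]
print(weighted_votes(votes), should_hide_label(votes))   # {'yes': 1, 'no': 7} True
```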
With a default random label selection .. would it be worth considering making this the default 'add labels' behaviour? (if it gave you a default, you'd just have one more click to do something - and you could still change it to further refine. Even with the difficult labels, the browse mode means you can find the easier cases)
Great idea! I'll create a separate ticket for that, so that I don't forget.
I also see the 'separate select tool' in progress, this is nice. Regarding hitting 'delete': would it be ok to delete the last thing you did if you click this outside of selection mode (I suppose it could highlight it first to make it clearer) .. kind of like a simple undo
Hahaha, I fixed that bug two minutes before I pushed it to production :D - it was basically working like that before. Just thought that it might be a good idea to enforce a selection, so that users don't accidentally remove polygons that were time-intensive to create.
As you mentioned undo: Does it make sense to implement a general undo functionality? Or is there barely any need for that? (I don't want to bloat the UI with useless stuff).
should be available again. I think it should be way faster now.
confirmed, it's really fluid now, even more pleasant to use
lots of great tweaks in the past few days .. I think this nicely reflects how much better the annotation workflow is now:-
Just wanted to list these in one place to put things in perspective:- fixing any one of these would increase the usability of the tool and get more data accumulated..
choice of label vs site-driven tasks:
I always use this site from a laptop, which means a trackpad.. the rectangle tool is more pleasant than trying to click polygonal boundaries; as such, when you see an image, you want to choose the labels that are easy to draw boxes around.. rather than having to click through until you find one that's doable.
But just generally: you can make many judgements when you see an image and blast out rectangles quickly.. whereas with the current workflow, it shows you one image, most of the time you have to pass.. then wait for it to re-load. It's holding the user's attention but only allowing limited input/expression.
in 'label-me' I was able to do hundreds of annotations in a sitting.. whereas this rigid workflow burns me out after only about 10.
This is why I'm after labels for 'head', 'hand', 'wheels', 'tree trunk' etc.. tracing a whole person's outline is really hard, but you can easily draw boxes around the parts. However the existing site-driven parts are also sub-optimal, because it doesn't know which parts are visible, or when they're too small.
Some of the most common objects are the most difficult (e.g. trees).
possible fixes:-
Conflation of selection and draw tool - prevents some annotations
Currently the tool seems to permanently be in a mixed state of draw/select - the problem with that is you can't annotate nearby/overlapping objects. It's even a problem with the rectangle tool (what seems to happen is as you overlap them.. they generate a combined bounding box which means you can no longer draw new rectangles in the same region), but it's crippling for the polygon tool because it selects polygons by their bounding box
possible fixes
Rectangle outline thickness
This prevents the annotation of small objects, and small nearby objects; in natural scenes objects are shown at a huge range of scales (e.g. you could have a zoom in on a single face, or a person could be a few pixels in the distance.. but a human viewer can still tell it's a person from context). This happens a lot in street scenes. It IS possible to do pixel-precise annotation with a trackpad or mouse.
zooming doesn't help - the outlines are themselves magnified :(
Fixing this would eliminate many of the 'un-annotatable' cases
possible fixes
make the outline transparent, e.g. 50%? and draw a second single pixel boundary at 100% opacity: this would give something very close to the existing look whilst keeping small objects annotatable
or scale the thickness dynamically:
outline thickness = min(current object size/4, default_thickness)
... such that small rectangles will be drawn with single pixel boundaries. Compute that thickness as you're drawing: you can usually place the top left precisely, but currently when you start to drag the outline appears and you have to guess where the other corner is :(
Single/plural/part ambiguity
From the wording and presentation of the label, it makes it sound like it demands singular labels (e.g. a box around each ('all') individual 'occurrence' of car, tree, person etc.); this places pressure/uncertainty/stress on the user in the more complex scenes.. e.g. scenes might have 100's of 'occurrences'/instances (which are also impossible to annotate with the thick outlines, but the whole task is itself also too much for one sitting)
possible fixes
change the wording to "cover the area containing: car(s)"; consider all existing annotations indeterminate. Wait till there is dedicated UI to switch between singular/plural mode for a precise meaning. Refine the existing annotations through a quiz.
dedicated singular/plural switch - "annotate all car/cars" (one is greyed out, click to switch)
There's actually 3 possible meanings for a bounding box - a whole object, an area containing multiple objects, or just part of an object (and you draw several to cover it - e.g. the tree trunk plus the foliage/canopy as separate rectangles)
label text entry
The best part of this tool compared to labelme is the ability to assign labels without annotation: that is in the spirit of progressive refinement. They are still potentially usable training signals, e.g. you can ask a neural net to distinguish 'images that contain trees' from 'images that don't contain trees' even without the individual annotations
However there's usually about 10 possible labels in an image and currently you must move your hand between the keyboard and 'add' button to do them.. again this is highly fatiguing.
possible fixes
Something else: the CPU load of text entry can sometimes skyrocket. I'm guessing this is the autocomplete - it causes a laptop to overheat and slow down further, e.g. individual keypresses can take a second to register.. if this is the autocomplete feature, maybe a delay before activating it would help - or letting the user type unobstructed, then asking "did you mean..." on submission