Open dobkeratops opened 7 years ago
http://labelme2.csail.mit.edu/Release3.0/browserTools/php/browse_collections.php?public=true&username=arandomlabeller&folder=/seq_submitted_alessandro_perina_sense_cam_aug_2011 interesting example - these photos are extremely low quality, and heavily distorted by a fish-eye lens... however, a human looking at them can tell from the context that it's a dining room, hence those vague splodges and streaks are dinner plates, forks, hands, heads etc.
I really like the idea of presenting the user small crops with yes/no questions - haven't thought about that, but that's actually a really cool approach.
The only thing I am unsure about is this:
[2] completely change the 'annotation' mode: just say "annotate all the pieces of this image" (or objects, or components?)
The following is purely subjective and only my personal feeling:
I always feel more productive when I have clear specifications. So when there is "Annotate all apples in this picture" I can quickly scan the image, annotate all objects and move on to the next image. Due to the clear specifications I feel that I get more work done. Every time I click the "done" button and move on to the next picture I have the feeling that I accomplished something.
With the "Annotate all objects in the image" I often start thinking: "Hmm...I think I have labeled everything now. Oh no, wait....there is a small tower in the distance which can barely be seen. Should I also annotate that? And what about the cat's shadow in the sun? Is that also an object? Should I label that?" As the labeling of all objects in an image usually takes longer (and I am usually the guy that wants to see instant gratification to continue), I often end up labeling only a few of those.
But this is only my personal impression - so not sure if that also holds true for the majority of users. It would be really great to have a way to validate which approach is the most promising one.
Some ideas that came to my mind:
Write a blog post describing the two approaches and let the users vote.
  ++ relatively easy to set up
  -- we probably only get the opinion from a specific user group (e.g. people with a technical background)
Present one subset of the website's visitors with the 1st approach and the rest with the 2nd approach, and measure how long they spend annotating images.
  -- we need many test users, ideally also some without much technical background (e.g. kids)
  -- needs some serious implementation effort
  ++ we could get some actual data and usage statistics
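To make the second idea a bit more concrete, here is a minimal sketch of how visitors could be split between the two annotation modes and how the time spent annotating could be compared. Everything in it is an assumption for illustration: the variant names, the `visitor_id` strings and the `(visitor_id, seconds)` session records are hypothetical and not part of the existing site.

```python
import hashlib
from statistics import mean

# Hypothetical names for the two annotation modes being compared.
VARIANTS = ("single_label", "annotate_everything")


def assign_variant(visitor_id: str) -> str:
    """Deterministically map a visitor to one variant via a hash of their id.

    The same visitor always lands in the same variant, so no assignment
    table has to be stored.
    """
    digest = hashlib.sha256(visitor_id.encode("utf-8")).digest()
    return VARIANTS[digest[0] % len(VARIANTS)]


def mean_session_seconds(sessions):
    """Average annotation time per variant.

    `sessions` is an iterable of (visitor_id, seconds) pairs (assumed format).
    Returns a dict variant -> mean seconds, or None if a variant has no data.
    """
    by_variant = {variant: [] for variant in VARIANTS}
    for visitor_id, seconds in sessions:
        by_variant[assign_variant(visitor_id)].append(seconds)
    return {
        variant: (mean(durations) if durations else None)
        for variant, durations in by_variant.items()
    }
```

Comparing raw means is only a first look; with few test users the difference could easily be noise, which is the "we need many test users" concern from the list above.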
you could look at it as subjective - 'annotate everything you find interesting in this image'; ... or you could think of it as a test of observational skill: can you spot details other people missed
[1] keep the single label on upload (but with broad labels): straight away, you have useful training data.
[2] completely change the 'annotation' mode: just say "annotate all the pieces of this image" (or objects, or components?)
[3] the crops from annotation in the 'is this a ...' mode - let people decide what these are through yes/no questioning. even highlighting things like eyes would be meaningful (an eye can give a big hint if it's a human, dog, or cat ..). 'wheels' .. bicycles vs cars
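A rough sketch of how the yes/no answers from [3] could be turned into a verdict per crop: count the votes for each (crop, label) question and only accept an answer once enough people agree. The `(crop_id, label, is_yes)` record format and the `min_votes` / `min_agreement` thresholds are assumptions picked for illustration, not anything that exists yet.

```python
from collections import defaultdict


def tally_answers(answers, min_votes=3, min_agreement=0.8):
    """Aggregate yes/no answers into a verdict per (crop_id, label) question.

    `answers` is an iterable of (crop_id, label, is_yes) tuples (assumed format).
    Returns a dict mapping (crop_id, label) to True (accepted), False
    (rejected) or None (too few votes, or no clear agreement yet).
    """
    counts = defaultdict(lambda: [0, 0])  # [yes_votes, no_votes]
    for crop_id, label, is_yes in answers:
        counts[(crop_id, label)][0 if is_yes else 1] += 1

    verdicts = {}
    for key, (yes, no) in counts.items():
        total = yes + no
        if total < min_votes:
            verdicts[key] = None  # keep asking more users
        elif yes / total >= min_agreement:
            verdicts[key] = True
        elif no / total >= min_agreement:
            verdicts[key] = False
        else:
            verdicts[key] = None  # users disagree, leave it undecided
    return verdicts
```

Undecided crops would simply stay in the question pool, so ambiguous splodges never get forced into a label.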
I still think you're going to get a far more valuable dataset starting with scenes, rather than these 'staged' photos.. because those are nothing like what real machine vision applications are going to be dealing with.
I think this free image set: https://www.cs.toronto.edu/~kriz/cifar.html is already going to be better than what you can accumulate with the current approach;
Just a few broad labels will force a network to internally learn neurons representing the key objects that distinguish scenes... 'street vs forest vs garden' - it will be counting trees, cars etc. 'bedroom vs living room vs kitchen vs office vs workshop' - it will be learning objects and furniture that are more commonly found in one or the other
We could debate, say, the few best labels that would distinguish the broadest range of images (e.g. 2 distinct outdoor: urban vs rural; 2 types of indoor: home vs workplace) ..
but if you get the labels arranged into some sort of hypernym net .. you can always change your mind on that later; and you could always add more information by quizzing users on what they see
I bet you could come up with 5-10 labels that (especially in combination) would capture a lot of variety
scenes:- rural urban domestic office industrial natural
objects:- people vehicles animals plants tools furniture
components:- wheels eyes mouths hands feet windows seats
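The "hypernym net" point above could look something like this: store each label with its broader parent, so a fine label can always be collapsed to a coarser one if you change your mind later. This is only a sketch; the root name `scene` and the helper names are made up here, and the outdoor/indoor split is just the example from the earlier comment. The scene / object / component lists above could be slotted in the same way.

```python
# Broader label -> narrower labels (illustrative, using the outdoor/indoor example).
HYPERNYMS = {
    "scene": ["outdoor", "indoor"],  # "scene" is a hypothetical root name
    "outdoor": ["urban", "rural"],
    "indoor": ["home", "workplace"],
}


def parent_index(tree):
    """Invert the tree into a child -> parent lookup."""
    return {child: parent for parent, children in tree.items() for child in children}


PARENTS = parent_index(HYPERNYMS)


def ancestors(label, parents=PARENTS):
    """List the broader labels of `label`, nearest first."""
    chain = []
    while label in parents:
        label = parents[label]
        chain.append(label)
    return chain


def coarsen(label, keep, parents=PARENTS):
    """Return `label` if it is in `keep`, else its nearest ancestor in `keep`.

    Returns None when neither the label nor any ancestor is in `keep`.
    """
    for candidate in [label, *ancestors(label, parents)]:
        if candidate in keep:
            return candidate
    return None
```

So images uploaded as 'urban' or 'rural' remain usable if the label set is later reduced to just outdoor vs indoor: `coarsen("urban", {"outdoor", "indoor"})` gives back `"outdoor"`.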