ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

"kitchen" - is it a scene label? #195

Open dobkeratops opened 6 years ago

dobkeratops commented 6 years ago

could the existing {"annotatable":"false"} mechanism effectively turn the kitchen label into a 'scene label'? some of my uploads mistakenly omitted this flag, and it's ended up as a regular annotatable label. however you might intend 'kitchenette' as well, which might make sense as an area of a room (cooking stuff in the corner)
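
(to make the idea concrete, a minimal sketch assuming each label carries a per-label flag like the one above; the dict layout and helper name here are made up for illustration, not ImageMonkey's actual schema:)

```python
# Hypothetical per-label metadata; only the "annotatable" flag mirrors the
# mechanism mentioned above, everything else is made up for illustration.
LABELS = {
    "kitchen": {"annotatable": False},  # effectively a whole-image 'scene label'
    "dog":     {"annotatable": True},   # regular, region-annotatable label
}

def is_scene_label(label: str) -> bool:
    """Treat any non-annotatable label as a scene label."""
    return not LABELS.get(label, {}).get("annotatable", True)

print(is_scene_label("kitchen"))  # True
print(is_scene_label("dog"))      # False
```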

If you did convert it, perhaps you could add a few others in a similar vein - single words to describe some of the common scenes you have:- (imagine if you set the task of picking 1 best word for the whole image..)

street (most of my uploads.. the urban scenes), or maybe more general: urban
rural or rural area (outside the city, but still inhabited; also farm?)
wilderness - uninhabited areas
forest - an image that is mostly trees, beyond simply containing a tree
park - grass/trees/path but within a city etc.
garden - a lot of the stock photos of pets seem to be in gardens. similar to park but smaller, behind a house etc.
office - some of the stock photos are office scenes
bar/restaurant - some of the stock photos
shopping mall - some of the stock photos
industrial area - (to complement street, office)
living room, bedroom, bathroom, dining room - to complement kitchen
airport - as you have some photos of these
stadium - sports scenes
town, village, city - there's one elevated image I put town/village for (that would almost be annotatable)

these might give nice hints for co-occurring object labels, or be useful for searching, and give yet another more accurate option for describing a donation ("upload an image that represents..")

One question is: should these actually be un-annotatable? 99% of the time they would be, but if you ended up getting drone footage or other aerial/satellite photos (ripped from google maps?), then you could start marking these as areas

dobkeratops commented 6 years ago

(ok there's a scene where forest might be better as an actual annotation, not sure what single word is best for the whole image..)

[screenshot: screen shot 2018-08-30 at 10 51 54]
bbernhard commented 6 years ago

could the existing {"annotatable":"false"} mechanism effectively turn the kitchen label into a 'scene label'? some of my uploads mistakenly omitted this flag

I can change those directly in the database..that shouldn't be a problem :)

One question is: should these actually be un-annotatable? 99% of the time they would be, but if you ended up getting drone footage or other aerial/satellite photos (ripped from google maps?), then you could start marking these as areas

yeah, you are right, there are definitely certain cases where it might make sense to annotate them. The question now is, how should we proceed here? I am not a big fan of restrictions, but in case we realize that 99% of the time the desired option is annotatable: false, I think it might make sense to enforce that. In my opinion, the annotatable: true/false setting definitely makes sense for "normal" labels, but for scene labels I am not sure. I think it's more "destructive" than it actually helps. "Destructive" is probably the wrong word; it just creates some unnecessary (sometimes impossible) annotation tasks. That's probably not that much of a problem with the browse based annotation mode, but with the random mode, it could "spam" the user with those tasks. But I have no strong feelings on the enforcement thing, so it's definitely open for discussion ;)

Many thanks for the scene label suggestions, really appreciated! I'll add those as soon as we've figured out how to proceed :)

dobkeratops commented 6 years ago

I suppose you could just rely on drawing a box around the whole image, which is consistent - and there could be some other way of shortcutting that (when assigning the label?)

bbernhard commented 6 years ago

I suppose you could just rely on drawing a box around the whole image, which is consistent - and there could be some other way of shortcutting that (when assigning the label?)

I guess we could really do that when assigning the label, as you proposed. So whenever someone assigns a label, we check whether it is a scene label. If it is, then a bounding box is drawn around the whole image. I think that could work. But we then have to be really careful about what we pick as a scene label.
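
(a rough sketch of that check, just to make the idea concrete; the set of scene labels and the helper names are invented, not actual ImageMonkey code:)

```python
from dataclasses import dataclass

SCENE_LABELS = {"kitchen", "forest", "airport", "park"}  # illustrative subset

@dataclass
class Annotation:
    label: str
    box: tuple  # (x, y, width, height) in pixels

def auto_annotation(label: str, img_width: int, img_height: int):
    """Scene labels get an automatic full-image bounding box;
    other labels are left for the normal manual annotation task."""
    if label in SCENE_LABELS:
        return Annotation(label, (0, 0, img_width, img_height))
    return None  # fall through to the regular annotation workflow

print(auto_annotation("kitchen", 640, 480))  # full-image box
print(auto_annotation("dog", 640, 480))      # None -> manual annotation
```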

dobkeratops commented 6 years ago

perhaps we have a sliding scale for primary labels:-

scene labels - airport, kitchen, forest, park, garden, riverside, port, workshop, alleyway, shopping mall, suburban street, residential street, town centre, cityscape, wilderness, countryside, farm, living room, bedroom, hotel room, restaurant, bar, concert, office, shop interior, library, beach, harbour, seaside, classroom, auditorium, atrium, museum, ... - default assumption is that the annotation covers the whole image..

single-object focussed (like the original intent, and the ImageNet dataset) - still needs annotating, but there's a reasonable probability that the pixels in a centred blob are of that label - safe to do some training without annotations.

I wonder if this could be stated at upload, because that would fit the intent of the current wording ("an image that represents..")

even in the case of single object images, I wonder if you could do something like say 'dog in garden', 'car in street' etc as a single description of the entire image ... and you fall back to the single scene word if it's too complex to describe like that.

What I imagine is training a branched network, with the 'whole image label' as one output and the regional annotations as another: this means you could train on the entire image set, even without annotations..


             image
               |
               V
             layer0 (conv)
               |
               V
             layer1 (conv)
               |
               V
              ...
               |
               V
           layerN (conv) 
    /                   \   
   |                     |  
   V                     V
whole                    |
image                final per pixel conv
layer(FC)           /    |  (1x1 feature remapping)
 |                 /     |
 V             acc       V
whole         V       perpixel
image      image      labels
label      label      (train on annotations)
[1]        list       [3]
           [2]

'acc' = 'accumulate' = a convolution with weights of 1 to sum all the pixel labels into image-wide labels, to train on the image-wide label list (regardless of annotations)

outputs:-
[1] = train on the 'main scene label'
[2] = train on the entire label list, even without annotations
[3] = per-pixel training using annotations

All 3 outputs feed back into training the same underlying image-wide features
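
(a rough PyTorch-style sketch of that branched setup, purely illustrative; layer sizes and names are invented, and the 'acc' branch is written as a spatial sum over the per-pixel logits:)

```python
import torch
import torch.nn as nn

class BranchedNet(nn.Module):
    """Shared conv trunk with three heads:
    [1] whole-image scene label, [2] image-wide label list
    (spatial sum of per-pixel logits), [3] per-pixel labels."""
    def __init__(self, num_scene_labels: int, num_object_labels: int):
        super().__init__()
        self.trunk = nn.Sequential(               # layer0..layerN (conv)
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.scene_head = nn.Sequential(          # whole-image layer (FC)
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, num_scene_labels),
        )
        self.pixel_head = nn.Conv2d(64, num_object_labels, 1)  # 1x1 feature remapping

    def forward(self, x):
        feats = self.trunk(x)
        scene_logits = self.scene_head(feats)        # output [1]
        pixel_logits = self.pixel_head(feats)        # output [3]
        image_logits = pixel_logits.sum(dim=(2, 3))  # 'acc' -> output [2]
        return scene_logits, image_logits, pixel_logits

net = BranchedNet(num_scene_labels=20, num_object_labels=100)
scene, image_wide, per_pixel = net(torch.randn(1, 3, 224, 224))
```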

bbernhard commented 6 years ago

Interesting idea!

I think what we should avoid is that the "Donate Image" page gets too complicated. For me, that page is still the one that should get people hooked. Ideally, one should just select an identifier (that can be a label, scene label, ...) and drag & drop the image into the dropzone to upload it. In my opinion we should avoid requiring people to know about the internals/specialities of ImageMonkey (trending labels, (scene) labels, task based approach, attributes system..) when donating an image - they will learn about that later anyhow.

I think if we overload the "Donate Image" page with too many options (represented as dropdowns, checkboxes, radio buttons,..), users could easily lose interest; they would no longer feel instant gratification. So no matter what we do here, I think we should limit the number of UI controls to one, max two.

I wonder if this could be stated at upload, because that would fit the intent of the current wording ("an image that represents..")

Do you have a suggestion for how that could look UI-wise?

Would you add the scene labels to the existing label list? (In that case, it would be completely transparent to the user. The user just selects a label from the dropdown; internally we know that the selected label is a scene label and can treat that differently.)

dobkeratops commented 6 years ago

ok in retrospect I agree keeping the donate page simple is preferable (even the dropdown 'represents/contains' idea is too much)

better to enhance it with Refinement/Attributes, or something in 'add labels' (I had a simple suggestion to use an asterisk prefix to highlight a single 'main label' perhaps.)

Just adding the scene labels to the list should be enough, and we can worry about exactly what to do about annotatability as well. I'm guessing most of the time 'kitchen', 'forest' etc. will be safe whole-image assumptions, which can be cancelled in the rare case by validation/refinement. You could even make those labels more specific, e.g. "kitchen (indoor scene)", "office (indoor scene)", "forest scene" vs "distant forest", "hillside forest", "overhead - forest area" etc... and even just having the raw labels like 'kitchen', 'forest' will be fine for the whole-image branch

I would re-iterate the idea of changing "annotate X -> un-annotatable/done/blacklist" to "skip/done". If you're unsure about what you're supposed to do when told to annotate 'kitchen', a 'skip' page could present that as an option ("it's a whole kitchen.."). Also, does the Browse mode reduce the need for blacklisting? - I think this was to allow a user to tune their preferences for tasks, but with browse, they can be explicit. There are more ideas to enhance that, like starting with a graphical label browser.. (a page with one example image of each label and the label written below; clicking there spawns browse/annotate with the search box filled in)

bbernhard commented 6 years ago

Great ideas! I have a few other things on my list first, but as soon as they are done, I'll look into the scene labels stuff. I have some ideas in mind, but I'm not sure if they'll work out as expected - I think they pretty much align with the ideas you have :)

Also, does the Browse mode reduce the need for blacklisting? - I think this was to allow a user to tune their preferences for tasks, but with browse, they can be explicit.

yeah, right, the blacklisting gets more or less obsolete with the browse mode. I think it can however still be useful in case you only want to work on specific types of images (e.g. maybe someone is interested in annotating person images, but only portraits and not crowded scenes, so they could use the blacklist functionality to hide unwanted pictures from their search).