Multiple labels - PoC - Githubissues

ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.

https://imagemonkey.io

Multiple labels - PoC #43

Open bbernhard opened 7 years ago

bbernhard commented 7 years ago

The last few days I was evaluating how much effort it would be to allow multiple labels per image and make it possible to upload donations without a label.

The easiest way (in terms of implementation effort) would be to create another tab called "Labeling" (or something similar - I am usually not good with finding names ;-)) where you can add labels to images.

Here is a GIF which shows that:

[GIF: label_all_objects]

When you are done with adding all the labels you press the "Done" button and the next image gets shown. After you are done with adding labels, each label needs to be verified individually. That means for our example the image with the pizza would show up in the "validation tab" with the question: "Do you see a champignon?" If the user verifies that, the image would be made available in the "annotation tab" where the user can mark all champignons.
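A minimal sketch of the label, validate, annotate flow described above. The class and state names (`Image`, `LabelState`, `annotation_tasks`) are hypothetical and only illustrate the idea; this is not how imagemonkey-core implements it.

```python
from dataclasses import dataclass, field
from enum import Enum


class LabelState(Enum):
    PENDING_VALIDATION = "pending_validation"      # waiting for "Do you see a X?"
    READY_FOR_ANNOTATION = "ready_for_annotation"  # confirmed, can be drawn
    REJECTED = "rejected"                          # validator did not see the object


@dataclass
class Image:
    image_id: str
    labels: dict[str, LabelState] = field(default_factory=dict)

    def add_label(self, name: str) -> None:
        # Labels entered in the "Labeling" tab always start unverified.
        self.labels.setdefault(name, LabelState.PENDING_VALIDATION)

    def validate(self, name: str, seen: bool) -> None:
        # Each label is verified individually.
        self.labels[name] = (
            LabelState.READY_FOR_ANNOTATION if seen else LabelState.REJECTED
        )

    def annotation_tasks(self) -> list[str]:
        # One task per confirmed label: "annotate all <label>s".
        return [name for name, state in self.labels.items()
                if state is LabelState.READY_FOR_ANNOTATION]


pizza = Image("pizza-01")
for name in ("pizza", "champignon", "tomato"):
    pizza.add_label(name)
pizza.validate("champignon", seen=True)
pizza.validate("tomato", seen=False)
print(pizza.annotation_tasks())  # ['champignon']
```

Because every label carries its own state, one image can sit in the validation queue for one label while already being annotatable for another.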

The advantage of this approach is that it's relatively easy to integrate into the existing environment (only small database changes and small workflow changes are needed). Furthermore, complex annotation tasks would automatically be split up into smaller chunks (you only annotate a single type of object each time).

The disadvantage is that you would probably work on a different image/scene most of the time the page reloads (but I think with a little bit of logic it should be possible to change that, so that you stay on the scene and just the "annotation task" changes).

Would that be a workflow that would make sense? @dobkeratops what do you think?

dobkeratops commented 7 years ago

looks good

let me get this straight: In the "Labelling" tab/mode, you just enter textual labels, without actually clicking the areas; "this image contains: [tomato, pizza, mushrooms,..]" ( much like search-tags in traditional image sharing sites). Then you go to the "annotation" mode to actually draw.

I think that would be ok, and it even has some advantages over LabelMe, as you point out :-
(i) The 'yet to be marked labels' are still useable for training (because the NN can judge the whole scene), and search.

(ii) It would streamline input device issues (as we discussed), e.g. in LabelMe if you have 3 cars, you draw a box, specify "car"; draw a box, specify "car"; draw a box, specify "car" (6 separate steps, moving between keyboard/mouse or using fiddly dialogs) ... vs... "annotate all the cars in this image"; <draw 3 boxes...> (3 steps, hands on the mouse over the image all the time). It would even allow the separation of optimal work between devices: 'a laptop user prefers to do more keyboard text entry, a phone user prefers to draw bounding boxes or tap yes/no' ... perfect!

that you would probably work on a different image/scene most of the time the page reloads

Even that would be ok, because it's variety (going through one specific theme can get tiresome); so long as different visitors can build on what each other did, it should be ok

Downsides I can think of:-

(i) Someone could specify some useful, but obscure detailed names, which most other people don't know, e.g. the names of some specific components, or botanical jargon for specific plants. Other users won't be able to identify these when asked to "annotate all the .." (fix this with an 'unsure' option in validation?)

(ii) overlapping labels: Let's say you have some humans in a scene. You could give 3 labels: "person", "man", "woman". Some might not be clear in the distance, so you want to highlight them as just "person". But when then asked "annotate all the persons in this scene.." you might end up repeating work, which will later be covered by "annotate all the men", "annotate all the women". The other example is "vehicle", "truck", "bus" etc; you can put 'vehicle' as a catch-all if you're not sure exactly what the right term is (where's the boundary between a 'minibus' and a 'people carrier', etc, but you still know for sure they're both 'vehicles'). I suppose there's an argument for allowing people to label multiple times, as it will give confirmation.. if something gets marked as both 'person', 'man', or just part of a larger 'crowd', that's ok if it's all consistent.

I'm all for implementing whatever is easiest first (even if it has downsides), and your proposed system actually has some advantages, so it's worth doing, and fixing these issues another way. If I've understood right, you've suggested doing it this way as it fits neatly over your existing modes and UIs.

Perhaps eventually the separate label entry mode could become more elaborate, e.g. showing you a few random visual examples of the words you're typing just to clarify that you've got the right names in your head.

dobkeratops commented 7 years ago

create another tab called "Labeling" (or something similar - I am usually not good with finding names

I agree this is hard to name, because 'LabelMe' already gives people the idea that 'Labelling' is both highlighting and text. But I can't think of a good alternative. It also reminds me of 'tagging', but "tagging" in Facebook also involves drawing rectangles.

'name objects' .. 'add object names..' ... 'add labels'... ? vs 'draw labels' .. 'highlight objects' .. 'annotation'?

or perhaps a little icon in the UI showing what you do could make the purpose of the tab clearer.. 'annotate' has a bounding box icon , 'label' has a keyboard icon, or a little representation of a tag list with a '+'

anyway that's not so important. you could always eventually add a direct shortcut between the two modes, so people can always find what they need

dobkeratops commented 7 years ago

Something else to think about .. I'm not sure how LabelMe's hierarchical object parts would fit into this; but it's not important right now. Maybe you could show the annotated crops as new images, and you can still use the same UI, e.g. show a car crop, and you'd give the labels: 'headlight', 'taillight', 'wheel', 'windscreen', 'license plate'.

I was thinking it would be really useful to have some presets for all that anyway, because of the amount of work in typing component names

bbernhard commented 7 years ago

Thanks a lot for the feedback!

let me get this straight: In the "Labelling" tab/mode, you just enter textual labels, without actually clicking the areas; "this image contains: [tomato, pizza, mushrooms,..]" ( much like search-tags in traditional image sharing sites). Then you go to the "annotation" mode to actually draw.

Jep, right.

e.g. in LabelMe if you have 3 cars, you draw a box, specify "car"; draw a box, specify "car"; draw a box, specify "car" (6 separate steps, moving between keyboard/mouse or using fiddly dialogs) ... vs... "annotate all the cars in this image"; <draw 3 boxes...> (3 steps, hands on the mouse over the image all the time).

Definitely. Another advantage that I am seeing is: there is maybe no need for a "hide annotations" functionality (or at least it would be something that you would need very rarely). I could imagine adding a "show all annotations" function which shows all the annotations that have already been made on the picture (could be interesting to see what's already there).

Someone could specify some useful, but obscure detailed names, which most other people don't know, e.g. the names of some specific components, or botanical jargon for specific plants. Other users won't be able to identify these when asked to "annotate all the .." (fix this with an 'unsure' option in validation?)

yeah right, that could be a bit of a problem. The "unsure option" would definitely be something we could consider.

Another option I could see is to add the possibility to go directly from the labeling to the annotation (could be another option of the pro mode). I am imagining a checkbox "Annotate me" (or something similar) which then directly takes you to the annotation tool where you can draw the bounding boxes for each label. We would "lose" the label validation step in that case, but I think that's no problem as there is still the "verification of the annotation" step. If the annotation is correct then we can also assume that the labels are correct.

(ii) overlapping labels: Let's say you have some humans in a scene. You could give 3 labels: "person", "man", "woman". Some might not be clear in the distance, so you want to highlight them as just "person". But when then asked "annotate all the persons in this scene.." you might end up repeating work, which will later be covered by "annotate all the men", "annotate all the women". The other example is "vehicle", "truck", "bus" etc; you can put 'vehicle' as a catch-all if you're not sure exactly what the right term is (where's the boundary between a 'minibus' and a 'people carrier', etc, but you still know for sure they're both 'vehicles'). I suppose there's an argument for allowing people to label multiple times, as it will give confirmation.. if something gets marked as both 'person', 'man', or just part of a larger 'crowd', that's ok if it's all consistent.

good point - haven't thought about that.

One option would be to add a descriptive field which adds some additional information to that specific label, e.g. label: truck, description: only trucks, no other vehicles (like buses)

But I am afraid that this might get too complicated? Looking at the dataset statistics (https://imagemonkey.io/explore) we got contributions from all over the world - which is really awesome. I don't know anything about the background of the people that contributed, but I could imagine that even some people without any ML/DL background tried it out. My concern is that if we add too many options and things that users need to take care of, we scare off these users. Because now it doesn't feel like "fun" anymore, it feels like "work".

I'm all for implementing whatever is easiest first (even if it has downsides), and your proposed system actually has some advantages, so it's worth doing , and fixing these issues another way. If i've understood right, you've suggested doing this way as it fits neatly over your existing modes and UIs.

yeah, right. :)

or perhaps a little icon in the UI showing what you do could make the purpose of the tab clearer.. 'annotate' has a bounding box icon , 'label' has a keyboard icon, or a little representation of a tag list with a '+'

really like the idea with the icons!

I was thinking it would be really useful to have some presets for all that anyway, because of the amount of work in typing component names

great idea!

I am not sure if it's better to create those lists manually (maybe some kind of structured JSON with the main object ('car') on top, followed by all the child objects ('headlight', 'taillight', 'wheel', 'windscreen', 'license plate')) or if the system could learn from existing scenes (i.e. if other car pictures contain the labels 'headlight', 'taillight', 'wheel', 'windscreen', 'license plate', chances are good that this car picture also contains those, so suggest them to the user).

Another possible problem that came to my mind: Imagine that you are annotating some pictures. The first 4 annotations (e.g. 'dog', 'banana', 'cat') require you to annotate objects which are the main objects in the scene (i.e. they are pretty big and easy to spot). Now the fifth annotation requires you to annotate the 'wheel' in a picture with the main object 'car'. I could imagine that users tend to annotate the 'car' instead of the wheel, because they got used to annotating the main objects. :/
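The "learn from existing scenes" option could be sketched roughly as follows. This is only an illustration under assumptions: `suggest_sublabels`, the `min_share` threshold and the sample data are all made up, and a real system would read the label sets from the database.

```python
from collections import Counter


def suggest_sublabels(main_label, labeled_images, min_share=0.5):
    """Suggest labels that co-occur with main_label in at least
    min_share of the images that already carry main_label."""
    matching = [set(labels) for labels in labeled_images if main_label in labels]
    if not matching:
        return []
    counts = Counter(
        label for labels in matching for label in labels if label != main_label
    )
    return sorted(
        label for label, n in counts.items() if n / len(matching) >= min_share
    )


# Hypothetical label sets of already labeled images.
existing = [
    ["car", "wheel", "headlight", "windscreen"],
    ["car", "wheel", "license plate"],
    ["car", "wheel", "headlight"],
    ["dog", "eye"],
]
print(suggest_sublabels("car", existing))  # ['headlight', 'wheel']
```

Compared with a hand-written preset list, this needs no maintenance, but it can only suggest labels that somebody has already used.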

dobkeratops commented 7 years ago

Now the fifth annotation requires you to annotate the 'wheel' in a picture with the main object 'car'. I could imagine that users tend to annotate the 'car' instead of the wheel

perhaps annotating pieces of an object could be done by focussing the view on the current object. LabelMe has the concept of a 'current object' for adding parts (it is shown in bold in the list)

Looking at the dataset statistics (https://imagemonkey.io/explore) we got contributions from all over the world

the dedicated 'label' tab could eventually handle multi-lingual issues perhaps.

Eventually we could have the database of 'is-a' relations to help out with the overlapping labels. This would also be really useful for training, because you might want to train a net to recognise 'vehicles vs people', but it doesn't care about specific types .

tangential thought r.e. components, there's some interesting papers coming out about 'capsules' which vaguely described seem to involve recognising components then reasoning about their geometric relation ... I wonder if the parts-based labelling is of any relevance to all that (that's what's going through my mind when being able to mark headlights vs tail lights, the data has an instant indication of the car's orientation)

dobkeratops commented 7 years ago

Because now it doesn't feel like "fun" anymore, it feels like "work".

right. I'm not sure that labelling can really be fun, but a reasonable goal I see is that it's a moderately constructive way of killing time / relaxing. (I speculate that visual feedback might help, e.g. seeing the AI/database's view of each label growing as you contribute, and perhaps helping to expand vocabulary.. show variations in the autocomplete: you type 'vehicle' and it could show a palette of vehicle types with jargon names as further suggestions for what might be in the scene.) "users all over the world" .. you could show the labels in 2 languages, convince people they're getting some assistance in learning a language..

dobkeratops commented 7 years ago

another suggestion: group vs individuals - would it be worth the UI understanding this. in a scene with lots of people, do you mark the general area as 'crowd', or try to label each 'person'; in a street scene, do you label a clump of cars as 'traffic', or label each car.

What if the UI understood the concept of groups, and gave you an official option to mark the general area containing many, or individuals (like the mushrooms in the image above.. there's a clump of them but also 4 clear individuals) In LabelMe, we can sort of do this with parts, e.g. you can mark the area of 'traffic', then add each car as a 'part' if you have time.. but imagine if you didn't need two separate labels.

bbernhard commented 7 years ago

perhaps annotating pieces of an object could be done by focussing the view on the current object. LabelMe has the concept of a 'current object' for adding parts (it is shown in bold in the list)

I think in order to do that we would need a hierarchy concept (as already suggested by you), right?

I am not sure if it is really a problem, but I could imagine that when you are in the "flow" and have already done quite a few annotations (where you annotated the main (i.e. biggest) object), and then along comes a picture with a big dog (similar to the attached one) where you have to annotate all occurrences of 'eye', you end up annotating the 'dog' instead.

[image: dog1]

I think that people might end up annotating the dog instead of the eye, because they are in the "flow", maybe aren't paying too much attention and got used to annotating big objects. So we might end up with quite a few wrong annotations. Of course those annotations would probably be downvoted by the people in the "validate annotations" phase, but it would really be cool if we could avoid that in the first place (maybe by making it more visually clear to the user that he is most probably annotating some small detail?).

I am not sure if I am the typical labeling person, but during the testing phase of the new feature I accidentally annotated the wrong object more than once (it usually happened after annotating some big objects in a row).

The cleanest solution to that problem would probably be to introduce some sort of hierarchy concept. But if possible I would like to avoid that for now. Can you think of any solution that would mitigate the risk of accidentally labeling the wrong object? Or do you think that won't be a problem at all?

The only thing that came to my mind would be to (internally) mark the first label as "main label" (I would assume that people start with the most prominent object). If people are annotating objects other than the main object there will be a small visual difference (either a different label color, an info text...) to grab people's attention.

dobkeratops commented 7 years ago

Can you think of any solution that would mitigate the risk of accidentally labeling the wrong object. Or do you think that won't be a problem at all?

hmm.. so if showing the dog zoomed in, it might not be clear that it wants details.. perhaps you could zoom in a little, show the fact the dog is already annotated, and draw the 'component annotations' in a different colour

if there was a presets concept, a schematic illustrating which components it's asking for could also help.. or you could just show other random examples of labelled "dog.eye" etc from the existing database

bbernhard commented 7 years ago

perhaps you could zoom in a little, show the fact the dog is already annotated, and draw the 'component annotations' in a different colour

but how do we know that the user is annotating an object that is part of another object? I mean with a hierarchy concept it would be more clear, because in that case one could make the 'eye' a child of the 'dog'. And with some backend logic we could ensure that the 'eye' is only shown for annotation when the parent object ('dog') is already annotated. But without any label hierarchy I guess it's pretty hard to do so.
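The backend rule mentioned here (only offer 'eye' once 'dog' is annotated) could look roughly like this. It is a sketch under assumptions: the `PARENT` table and `annotatable_labels` are hypothetical names, and it assumes each child label has exactly one parent, which a real hierarchy would not guarantee ('eye' also belongs to 'cat').

```python
# Hypothetical parent relation: child label -> parent label.
PARENT = {"eye": "dog", "ear": "dog"}


def annotatable_labels(image_labels, annotated):
    """Return the labels that may be shown for annotation right now.

    A label that has a parent is held back until the parent object
    has been annotated on this image.
    """
    ready = []
    for label in image_labels:
        if label in annotated:
            continue  # already done, nothing to ask for
        parent = PARENT.get(label)
        if parent is None or parent in annotated:
            ready.append(label)
    return ready


labels = ["dog", "eye", "ball"]
print(annotatable_labels(labels, annotated=set()))    # ['dog', 'ball']
print(annotatable_labels(labels, annotated={"dog"}))  # ['eye', 'ball']
```

Note that in this sketch a child label whose parent was never added to the image would be held back forever, so a real implementation would need a fallback for that case.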

That's currently also preventing me from going live with the multi-label support. I am a little bit worried that introducing such a thing causes quite a lot of wrongly annotated data. As the diff for the multi-label support is already huge (to be honest I wasn't expecting it to be so much work), I am now wondering whether we should already introduce a hierarchy concept at this point?

Usually I am a big fan of small and incremental changes (also to get some actual user feedback), but in that case I am not sure whether it makes sense to leave it like that and go live? I am afraid that we end up pretty fast at a point where we would need a hierarchy concept. And in that case it's probably easier to implement that now.

Would really love to hear a second opinion on that. :)

if there was a presets concept, a schematic illustrating which components it's asking for could also help.. or you could just show other random examples of labelled "dog.eye" etc from the existing database

I really like that! The second approach of showing a random example from the existing database should already work at that point (so no hierarchy concept would be needed for that). edit: I think I was too fast on that one.. even for that we would probably need a hierarchy concept.

A schematic illustration is also something we could consider. But I think in order to make that work we would need to restrict the labels to a list that's created by us. That could be problematic for image donations that were uploaded without any label. In that case it's not guaranteed that there will be a label in the label list that matches the picture's content.

dobkeratops commented 7 years ago

but how do we know that the user is annotating an object that is part of another object?

what i have in mind is showing the existing 'parent' object outline: so you know the database already 'knows' that and isn't asking for that again.

we end up pretty fast at a point where we would need a hierarchy concept.

quite possibly.. I do think it's a great feature, so if you can add this .. so much the better

But I think in order to make that work we would need to restrict the labels to a list that's created by us.

perhaps with the project hosted on GitHub all along, there's a middle ground .. because anyone can make a pull request with new presets. That would save creating a load more UI for label management by just relying on JSON etc for that

bbernhard commented 7 years ago

Your last comment inspired me to try something slightly different

[GIF: label_all_objects_2]

The basic principle is:

labels are defined in a labels.json file and can be extended with pull requests
labels can have 'sublabels' (e.g. 'eye' is a sublabel of 'dog')
when you label something then you usually operate on the 'label' level (sublabels are hidden)
when selecting a label, it automatically adds its sublabels (which you can de-select if not needed)
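The selection behaviour in this list could be sketched as below. Everything here is assumed for illustration: `SUBLABELS` stands in for whatever gets loaded from labels.json, and `select_label` is a hypothetical helper, not existing project code.

```python
# Hypothetical in-memory view of labels.json: label -> its sublabels.
SUBLABELS = {"dog": ["ear", "eye", "mouth"], "pizza": []}


def select_label(name, deselected=()):
    """Return the (label, sublabel) pairs added when the user picks `name`.

    All sublabels are added automatically; the user only has to
    de-select the ones that are not visible in the picture.
    """
    if name not in SUBLABELS:
        # Only labels defined in labels.json can be used.
        raise KeyError(f"unknown label {name!r}; add it to labels.json first")
    skip = set(deselected)
    kept = [sub for sub in SUBLABELS[name] if sub not in skip]
    return [(name, None)] + [(name, sub) for sub in kept]


print(select_label("dog", deselected=["mouth"]))
# [('dog', None), ('dog', 'ear'), ('dog', 'eye')]
```

Storing the sublabel together with its parent keeps a dog's 'eye' distinguishable from a cat's 'eye' later on.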

Advantages:

mobile friendly (you don't have to type a lot of labels)
chances are high that the automatically selected 'sublabels' match (de-selecting is always easier than manually adding the label)
labels are controlled via GitHub - that hopefully prevents us from ending up with a lot of different labels (e.g. 'dog', 'Dog', 'dogs', 'dg') for the same object.

Disadvantages:

not as fine-grained as 'free labeling'
you need to add the label to the labels.json file first

What do you think?

dobkeratops commented 7 years ago

labels can have 'sublabels' (e.g. 'eye' is a sublabel of 'dog'); when selecting a label, it automatically adds its sublabels (which you can de-select if not needed); labels are controlled via GitHub

I think this would be ok.

would your system handle arbitrary nested hierarchy depth? e.g. {"dog":{ "tail":{}, "body":{}, "head":{ "eye":{}, "mouth":{"teeth":{},"tongue":{}},"nose":{},"ear":{}}, "leg":{"paw":{}} } } in JSON format, describing a hierarchy of dog parts
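For illustration, a nested structure of this kind can be walked recursively to any depth. The sketch below assumes every part is written as a JSON object (leaves as empty objects, so the text parses as valid JSON) and produces dotted paths in the "dog.eye" style mentioned earlier in the thread; `flatten` is a made-up helper name.

```python
import json

# Shortened version of the dog hierarchy, with leaves as empty objects.
TREE_JSON = """
{"dog": {"tail": {},
         "head": {"eye": {}, "mouth": {"teeth": {}, "tongue": {}}},
         "leg": {"paw": {}}}}
"""


def flatten(tree, prefix=""):
    """Return every label in the tree as a dotted path, parents first."""
    paths = []
    for name, children in tree.items():
        path = f"{prefix}.{name}" if prefix else name
        paths.append(path)
        paths.extend(flatten(children, path))  # recurse into the parts
    return paths


for path in flatten(json.loads(TREE_JSON)):
    print(path)
# dog
# dog.tail
# dog.head
# dog.head.eye
# dog.head.mouth
# dog.head.mouth.teeth
# dog.head.mouth.tongue
# dog.leg
# dog.leg.paw
```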

you could also have labels that refer to scenes, to get potential objects as the sub labels ("street" - sub labels = car, person, road, ... "kitchen" sublabels=sink,cupboard,etc), although that would get back to needing to distinguish 'scene/object' labels (maybe you could specify that in the json somehow)

again without the burden of label-management UI, maybe you could store 'is-a' relations in the label database as well (to allow people to explore, type 'vehicle' and there's a dropdown with vehicle types etc)

not that fine granular as 'free labeling'

free labelling could still be added later.

bbernhard commented 7 years ago

Thanks for your suggestions @dobkeratops, very much appreciated!

Proposed format of the labels.json file:

{
    "metalabels": [{
            "description": "optional",
            "name": "animal"

        },
        {
            "description": "optional",
            "name": "vehicle"
        },
        {
            "description": "optional",
            "name": "food"
        }
    ],

    "labels": [{
            "description": "optional",
            "name": "dog",
            "metalabels": ["animal"],
            "labels": [{
                    "description": "optional",
                    "name": "ear"
                },
                {
                    "description": "optional",
                    "name": "eye"
                },
                {
                    "description": "optional",
                    "name": "mouth"
                }
            ]
        },
        {
            "description": "optional",
            "name": "cat",
            "metalabels": ["animal"],
            "labels": [{
                    "description": "optional",
                    "name": "ear"
                },
                {
                    "description": "optional",
                    "name": "eye"
                },
                {
                    "description": "optional",
                    "name": "mouth"
                }
            ]
        },
        {
            "description": "optional",
            "name": "pizza",
            "metalabels": ["food"]
        },
        {
            "description": "optional",
            "name": "banana",
            "metalabels": ["food"]

        },
        {
            "description": "optional",
            "name": "car",
            "metalabels": ["vehicle"]
        }

    ]

}

I think your proposed "is-a" relationship could maybe be covered with something I have currently named metalabels (not sure if the name is good though). Each individual label can have a bunch of metalabels. I personally wouldn't expose those labels to the user in the labeling process. Where the metalabels kick in is when you want to export/explore the dataset (e.g. give me all data with 'food' in it). Is that what you imagined or did I miss something important here?
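To make the export use case concrete, here is a small sketch that reads a trimmed-down copy of the proposed format and answers "which labels count as 'food'?". The function name `labels_with_metalabel` is hypothetical; only the JSON keys ("metalabels", "labels", "name") come from the proposal above.

```python
import json

# Trimmed-down copy of the proposed labels.json (descriptions omitted).
LABELS_JSON = """
{
  "metalabels": [{"name": "animal"}, {"name": "vehicle"}, {"name": "food"}],
  "labels": [
    {"name": "dog", "metalabels": ["animal"],
     "labels": [{"name": "ear"}, {"name": "eye"}, {"name": "mouth"}]},
    {"name": "pizza", "metalabels": ["food"]},
    {"name": "banana", "metalabels": ["food"]},
    {"name": "car", "metalabels": ["vehicle"]}
  ]
}
"""


def labels_with_metalabel(config, metalabel):
    """Return the names of all top-level labels tagged with `metalabel`."""
    known = {entry["name"] for entry in config["metalabels"]}
    if metalabel not in known:
        raise ValueError(f"metalabel {metalabel!r} is not defined")
    return [
        label["name"]
        for label in config["labels"]
        if metalabel in label.get("metalabels", [])
    ]


config = json.loads(LABELS_JSON)
print(labels_with_metalabel(config, "food"))  # ['pizza', 'banana']
```

An export for 'food' would then select all images carrying any of the returned labels.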

you could also have labels that refer to scenes, to get potential objects as the sub labels ("street" - sub labels = car, person, road, ... "kitchen" sublabels=sink,cupboard,etc), although that would get back to needing to distinguish 'scene/object' labels (maybe you could specify that in the json somehow)

I am currently struggling a little bit to understand the difference between 'scene' and 'object'. When you refer to a 'scene' do you just mean a 'collection of objects' or is it much more?

dobkeratops commented 7 years ago

I think your proposed "is-a" relationship could maybe be covered with something I have currently named metalabels

ok I can see this handles 'many metalabels' (if there is no clear tree organization), that's great

I personally wouldn't expose those labels to the user in the labeling process.

what I would hope is this could be generalised, and the metalabels could even be used (and later refined by other users with a 'what type of car is this' mode), e.g. you could label something 'animal' even if you don't know what type of animal it is.

imagine being able to browse all the labels with a miller-columns view

I am currently struggling a little bit to understand the difference between 'scene' and 'object'.

To me, the 'scene' is just the whole image; a scene label would be any descriptive word that's applicable to the whole. Examples could be broad words like 'raining',(it might be impossible to label 'raindrops' .. ) or broad words like "rural" / "urban".. "farm" .. "airport" etc

probably easier by giving more examples ...

Scene labels: 
-----
indoor,outdoor,garden, urban , residential area, suburbs,town centre, museum, art gallery, parliament, theatre, church, hospital,
domestic, kitchen, dining room, rural, wilderness, desert , cave, party, industrial, 
garage, airport, retail, office, school, coastal, seaside, harbour, beach, ..

Objects:  
----
tree, lawn, house, car, painting, lectern, stage, crucifix, TV, building,
sink, dinner table,
lathe, spanner, control tower, shelf, PC, desk, boat, stalagmite,..

'scene labels' would probably have their own 'isa' relations, e.g. "farm", "forest" could both be "rural"; "kitchen", "living room" could both be "domestic"; "board room", "cubicles","open plan" could all be "office"; etc.

I guess if you were looking down from the air, some of these scene labels could become object labels? (e.g. viewed from an aircraft up high, you could certainly highlight an "airport" as an object to aim at..) ..if you're looking at the plan view of a house, you've certainly got an object "kitchen" (the space within specific walls).. then again sometimes I'm talking about adjectives rather than verbs.

is a scene just an "object" that contains the image? airport example.. maybe .. ? but what about examples like 'dog' or 'screwdriver' which would never (IMO) constitute a scene. is it the fact the camera can go inside? (would endoscopy screenshots constitute 'person' scenes i.e. the view inside a person..)

maybe we want a distinction for image-wide adjectives?

do you just mean a 'collection of objects'

A lot of the time, specific objects would be enough to identify a 'scene-label', but if you make the scene label first, it gives you a training signal for very little user effort (e.g. you could train a neural net to distinguish 'kitchen'/'livingroom'/'bedroom', and it would develop neurons for 'sink','cooker','kettle','tv','sofa','bed' even if you hadn't labelled them), and further guidance for potential object labels.

"farm" vs "countryside" would be an example of a subtle difference. You might see a tractor driving through a country lane, even some animals being moved but this isn't a farm.. whereas a tractor driving on a field would be a Farm. It would be the placement of the tractor that matters..

I think these might slot in alongside your 'metalabels'?

the 'LabelMe' dataset does seem to include these 'scene labels' in the XML but I can't find any way of setting it in the interface (maybe it's done on upload)

dobkeratops commented 7 years ago

"I personally wouldn't expose those labels to the user in the labeling process."

could it be completely generalised, i.e. no distinction between a 'meta label' and label.. although you have the right idea with the list of multiple metalabels (so it needn't be a simple tree organisation) (You'd have to be careful about cycles.. I guess your metalabel idea precludes that, and could certainly be a starting point to create a cycle-free database. ) e.g.

one possible path to locate 'labrador' in the label explorer:

organism -> animal -> vertebrate -> quadruped -> dog -> labrador

possible paths to locate 'dog' in the label explorer

animal -> carnivore -> dog

animal -> domesticated animal -> pet -> dog

animal -> furry -> dog

animal -> mammal -> dog

you might not know it's a labrador at the point you choose to label it a 'dog', but someone else could later refine it.

Might a rich nested metalabelling forgo the need for 'description'? i.e. from the above we know a dog is quadruped, domesticated, vertebrate, animal, organism, furry, carnivore, mammal ... trace the graph back and dump all the words
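"Trace the graph back and dump all the words" could be sketched like this. The `IS_A` table is a hypothetical, simplified encoding of the paths listed above (label -> directly more general labels), and `ancestors` is a made-up name; the cycle check reflects the concern about cycles raised earlier in this comment.

```python
# Hypothetical is-a edges: label -> its direct, more general labels.
IS_A = {
    "labrador": ["dog"],
    "dog": ["carnivore", "pet", "furry", "mammal"],
    "carnivore": ["animal"],
    "pet": ["domesticated animal"],
    "domesticated animal": ["animal"],
    "furry": ["animal"],
    "mammal": ["animal"],
    "animal": ["organism"],
}


def ancestors(label, graph, _path=()):
    """Collect every more general label reachable from `label`.

    Raises ValueError if the graph contains a cycle on the way up.
    """
    if label in _path:
        raise ValueError("cycle in is-a graph: " + " -> ".join(_path + (label,)))
    found = set()
    for parent in graph.get(label, []):
        found.add(parent)
        found |= ancestors(parent, graph, _path + (label,))
    return found


print(sorted(ancestors("labrador", IS_A)))
# ['animal', 'carnivore', 'dog', 'domesticated animal', 'furry',
#  'mammal', 'organism', 'pet']
```

The resulting word set could serve as an automatically generated description, and the same traversal would answer "is this labrador an animal?" for training or search.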

Imagine this in scenes with plants, for the use-case of agri-bots (what is a weed, vs a crop?, and could you identify specific types of each? What about sub-types for 'ripe' ready for harvesting). That's probably a case to think more about.. are certain plants only "weeds" in the context of a farm? How would we go about that.. maybe the label 'weed' simply doesn't matter in a forest, but it can still exist in the graph of labels

that's something for another post :) .. I was going to suggest looking at the tool from specific use cases (self-driving cars/delivery, agri-bots, cleaning robots, meal calorie estimator ...). Could we get example scenes that relate to use cases to figure out issues... what will make a good labelling tool for practical, usable data?

dobkeratops commented 7 years ago

r.e. scene labels, I found some examples of scenes where it's quite hard to identify individual objects, but a scene label (e.g. "forest" vs "park" vs "garden") could still differentiate one image from another:

http://labelme2.csail.mit.edu/Release3.0/browserTools/php/browse_collections.php?public=true&username=arandomlabeller&folder=/web_static_park_garden_outdoor

there was another example: "shop" - the individual items are all found in homes (people browsing shelves of common household objects) but it's the context of them being lined up on shelves that identifies it as a shop instead of a home

dobkeratops commented 7 years ago

Something else to consider might be building in a concept of plural/groups e.g.

person : plural= people or crowd
bird : plural=flock
car : plural=traffic
tree : plural=trees, or forest
leaf : plural=leaves, or foliage

thinking of a crowd scene, if you're asked to 'annotate every person' .. it gets impossible in the distance; yet you can mark the area that contains people. A human can still tell from context that the mess of pixels in the top-middle-right is people's heads, even though those blobs in isolation wouldn't trigger a face detector;

[image: _55280494_dsc_0022]

the other example is traffic scenes, where a row of cars recedes into the distance

dobkeratops commented 7 years ago

I hope I'm not confusing things with too many ideas and complications ... anything with multi-object labels will be good... and that remains the highest priority :)

.. but hopefully the tangible examples of 'self-driving car' and 'agri-bot' labelling would bring things into focus for other possibilities..

example applicability:- a 'scene' judgement alters driving strategy for cars ('<30mph in residential areas', 'drive more slowly in fog/icy conditions' 'switch on the windscreen wipers in rain' etc), or you could use the output of 1 net to select a detailed net for particular contexts

dobkeratops commented 7 years ago

r.e. scene labels: these could be used as context for object labels, e.g.

"park" :: "bench" = park bench
"industrial", "workshop" :: "bench" = workbench
"gym" :: "bench" = exercise equipment
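A minimal sketch of that idea, assuming a simple lookup keyed on (scene label, object label). The table name `CONTEXT_REFINEMENTS` and the function `refine_label` are made up for illustration and are not part of ImageMonkey; only the entries themselves come from the examples above.

```python
# Hypothetical lookup table: (scene label, object label) -> more specific label.
# The entries mirror the examples in the thread; the structure is an assumption.
CONTEXT_REFINEMENTS = {
    ("park", "bench"): "park bench",
    ("industrial", "bench"): "workbench",
    ("workshop", "bench"): "workbench",
    ("gym", "bench"): "exercise equipment",
}


def refine_label(scene: str, label: str) -> str:
    """Return a context-specific label, or the original label if none is known."""
    return CONTEXT_REFINEMENTS.get((scene, label), label)
```

Falling back to the unrefined label means a scene without an entry (say "kitchen") simply leaves "bench" as it is.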

bbernhard commented 7 years ago

what I would hope is this could be generalised, and the metalabels could even be used (and later refined by other users with a 'what type of car is this' mode), e.g. you could label something 'animal' even if you don't know what type of animal it is.

good idea! :) I think we might be able to adapt the labels.json file to add such a "question like" mode. So that you can define questions and some possible answers for every label.

'scene labels' would probably have their own 'isa' relations, e.g. "farm", "forest" could both be "rural"; "kitchen", "living room" could both be "domestic"; "board room", "cubicles","open plan" could all be "office"; etc.
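As a rough sketch of such 'isa' relations, assuming each scene label has at most one parent: the names `SCENE_ISA`, `ancestors` and `is_a` are hypothetical, and the entries are just the examples from the quote above.

```python
# Hypothetical child -> parent links, taken from the examples in the quote above.
SCENE_ISA = {
    "farm": "rural",
    "forest": "rural",
    "kitchen": "domestic",
    "living room": "domestic",
    "board room": "office",
    "cubicles": "office",
    "open plan": "office",
}


def ancestors(label, isa=SCENE_ISA):
    """Walk the 'isa' chain upwards and return all more general labels."""
    chain = []
    seen = {label}
    while label in isa:
        label = isa[label]
        if label in seen:  # guard against accidental cycles in the table
            break
        seen.add(label)
        chain.append(label)
    return chain


def is_a(label, general, isa=SCENE_ISA):
    """True if `label` equals `general` or has it somewhere up its 'isa' chain."""
    return label == general or general in ancestors(label, isa)
```

With this, an image labelled "farm" would also match a query for "rural" without anyone having to add that label by hand.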

I guess if you were looking down from the air, some of these scene labels could become object labels? (e.g. viewed from an aircraft up high, you could certainly highlight an "airport" as an object to aim at..) ..if you're looking at the plan view of a house, you've certainly got an object "kitchen" (the space within specific walls).. then again sometimes I'm talking about adjectives rather than verbs.

is a scene just an "object" that contains the image? airport example.. maybe .. ? but what about examples like 'dog' or 'screwdriver' which would never (IMO) constitute a scene. is it the fact the camera can go inside? (would endoscopy screenshots constitute 'person' scenes i.e. the view inside a person..)

maybe we want a distinction for image-wide adjectives?

first of all, thanks for the clarification - it's now much clearer to me. You raised some very interesting questions, which definitely got me thinking. One thought that's currently wafting around in my head is: Is it necessary to give scenes their own representation (in the UI and/or database), or can we treat them as objects as well? I am thinking about Unix's "everything is a file" philosophy. Maybe we can use the same principle here, i.e. "everything is an object"?

A lot of the time, specific objects would be enough to identify a 'scene-label', but if you make the scene label first, it gives you a training signal for very little user effort (e.g. you could train a neural net to distinguish 'kitchen'/'livingroom'/'bedroom', and it would develop neurons for 'sink','cooker','kettle','tv','sofa','bed' even if you hadn't labelled them), and further guidance for potential object labels.

good point. Maybe we can implement a "describe this picture with one sentence" mode (e.g. "woman standing in the rain and waiting for the bus") and gather the scene labels/metalabels that way?

you might not know it's a labrador at the point you choose to label it a 'dog', but someone else could later refine it.

I really like your idea with asking questions to refine labels and improve the dataset's quality. I hope that we can delegate a lot of these refinements to that mode, as I think it could be the more fun way :)

that's something for another post :) .. I was going to suggest looking at the tool from specific use cases (self-driving cars/delivery, agri-bots, cleaning robots, meal calorie estimator ...): could we get example scenes that relate to those use cases to figure out issues, i.e. what will make a good labelling tool for practical, usable data?

awesome idea! I saw that you already created a new ticket for this - so let's discuss that there :)

Something else to consider might be building in a concept of plural/groups

good idea! Now the question is, should that be a separate type of label or more like an option that you can tick? e.g. I could imagine a small icon/button which turns a label into its plural form. So you would still select "person" as a label, but after clicking a small toggle button, it turns itself into "crowd".

edit: The new labeling mode is now live: https://imagemonkey.io/label I am currently not 100% satisfied with the result (mobile version needs some improvements, label search needs some improvements, 'Help me' isn't up to date anymore,...) but I think for the first version it's not that bad.

Currently, only 'dog' and 'cat' have sublabels ('eye', 'ear', 'mouth') defined. If a picture has the label 'dog' and you want to add some sublabels ('eye/dog', 'ear/dog', 'mouth/dog') to it, just add the base label 'dog' again to merge the remaining sublabels into it.
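To illustrate the 'sublabel/base' naming, here is a small sketch. It is not the actual ImageMonkey implementation: the `SUBLABELS` table and both helper functions are assumptions made only to show how re-adding a base label could yield the sublabels that are still missing.

```python
# Assumed in-memory form of the sublabel definitions; only 'dog' and 'cat'
# are mentioned above as having sublabels.
SUBLABELS = {
    "dog": ["eye", "ear", "mouth"],
    "cat": ["eye", "ear", "mouth"],
}


def expand_label(base):
    """Return the base label followed by its 'sublabel/base' entries."""
    return [base] + [f"{sub}/{base}" for sub in SUBLABELS.get(base, [])]


def missing_labels(existing, base):
    """Labels that re-adding `base` would still have to merge into the image."""
    return [label for label in expand_label(base) if label not in existing]
```

For an image that already carries 'dog' and 'eye/dog', `missing_labels({"dog", "eye/dog"}, "dog")` leaves only 'ear/dog' and 'mouth/dog' to be merged.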

dobkeratops commented 7 years ago

"should that be a separate type of label or more like an option that you can tick? e.g. I could imagine a small icon/button which turns a label into its plural form."

The best is an option, but you can get to it both ways (e.g. if you select "crowd", it knows the singular form is 'person', and vice versa) .. the key thing is that it knows the relations person.plural(crowd) and crowd.composed_of(person).
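A possible sketch of that two-way relation, assuming the singular-to-group table is stored once and the reverse direction is derived from it. `PLURAL_OF`, `COMPOSED_OF` and `toggle_plural` are hypothetical names; the pairs are the ones listed earlier in this thread.

```python
# Hypothetical relation table; the pairs are taken from the list earlier in the thread.
PLURAL_OF = {
    "person": ["people", "crowd"],
    "bird": ["flock"],
    "car": ["traffic"],
    "tree": ["trees", "forest"],
    "leaf": ["leaves", "foliage"],
}

# Derive the reverse direction so that either form can be looked up.
COMPOSED_OF = {
    group: single for single, groups in PLURAL_OF.items() for group in groups
}


def toggle_plural(label):
    """Switch between singular and group form; unknown labels are returned unchanged."""
    if label in PLURAL_OF:
        # Assumption: the last entry is the preferred 'group' noun (e.g. "crowd").
        return PLURAL_OF[label][-1]
    return COMPOSED_OF.get(label, label)
```

Selecting "person" and pressing the toggle would give "crowd", and starting from "crowd" or "people" leads back to "person".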

I'd just made an example JSON file for a label list with the 'nested isa' idea, https://github.com/dobkeratops/imagemonkey-core/blob/master/wordlists/en/labels.json and it got me thinking it would be interesting if the system understood more about label/part relations, e.g. counts. "tricycle" has "wheel", "bicycle" has "wheel", but what if we could say "tricycle" has wheels:3, "bicycle" has wheels:2; or actually specify a range, because some parts are optional ("0-1"), etc. "crowd" has "person":"2-...". Maybe you could also say things like "usually_found_in" ("kettle":{..usually_found_in:["kitchen","kitchenette"]}), and likewise "in"/"on", e.g. car:{..usually_on:["road"]..}
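One way the count/range idea could be handled, as a sketch only: the accepted spellings (a plain integer, "0-1", "2-...") follow the examples above, while `parse_count_range`, `count_allowed` and the `PART_COUNTS` table are hypothetical and not taken from the linked labels.json.

```python
def parse_count_range(spec):
    """Parse 3, "3", "0-1" or "2-..." into (minimum, maximum); None means unbounded."""
    if isinstance(spec, int):
        return spec, spec
    low, sep, high = spec.partition("-")
    if not sep:  # a single number written as a string, e.g. "3"
        return int(low), int(low)
    high = high.strip()
    return int(low), (None if high in ("", "...") else int(high))


def count_allowed(spec, n):
    """Check whether n parts satisfy the given count specification."""
    low, high = parse_count_range(spec)
    return n >= low and (high is None or n <= high)


# Hypothetical part-count table built from the examples in the comment above.
PART_COUNTS = {
    "tricycle": {"wheel": 3},
    "bicycle": {"wheel": 2},
    "crowd": {"person": "2-..."},
}
```

A check such as `count_allowed(PART_COUNTS["crowd"]["person"], 1)` is false, which matches the point that a single person is not a crowd.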

The new labeling mode is now live: https://imagemonkey.io/label

awesome, I'll take a look (EDIT: actually my file is a mutant JSON with non-quoted keys. Do many parsers accept that? I recall how LISP has a separate concept for a 'symbol' and a 'string' .. if not, I can always fix it)
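On the non-quoted keys: strict JSON requires object keys to be double-quoted strings, so a standards-following parser will reject such a file, although some lenient parsers (JSON5-style ones, for example) do accept bare keys. A quick check with Python's standard `json` module, where the two sample strings and the helper `is_strict_json` are made up for the demonstration:

```python
import json


def is_strict_json(text):
    """Return True if the text parses with Python's strict standard-library parser."""
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False


# Made-up snippets in the spirit of the label file discussed above.
QUOTED = '{"kettle": {"usually_found_in": ["kitchen"]}}'
UNQUOTED = '{kettle: {usually_found_in: ["kitchen"]}}'
```

Here `is_strict_json(QUOTED)` succeeds while `is_strict_json(UNQUOTED)` fails, so quoting the keys is the safer choice if the file is meant to be read by arbitrary tools.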

(that's another subtle point.. do you want a 'crowd' detector to light up for any image with more than one person? perhaps crowd is slightly more specific: many overlapping people, a certain density.)

dobkeratops commented 7 years ago

scenes their own representation (in the UI and/or database), or can we treat them as objects as well ... Maybe we can use the same principle here, i.e. "everything is an object"?

yes, in retrospect, thinking about it from the perspective "a farm is an object in a landscape", "a kitchen is an object in a house" etc, perhaps it does make sense to hammer these into objects - and just have the ability to apply a label to the entire image? ('this photo is taken inside a kitchen object' .. all other labels are parts of kitchen).

it might make sense to say it's the "background"? (e.g. if you have a photo of a dog in a park, you label the scene 'park', and annotate the 'dog' with its bounding box)
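A sketch of how "everything is an object" could look in data terms, assuming a scene label is simply an annotation without a box that is treated as covering the whole image. The `Annotation` class and its fields are hypothetical and do not reflect the ImageMonkey database schema.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height) in pixels


@dataclass
class Annotation:
    label: str
    box: Optional[Box] = None  # None = the label applies to the entire image

    def resolved_box(self, image_width: int, image_height: int) -> Box:
        """Return the explicit box, or the full image rectangle for scene labels."""
        if self.box is not None:
            return self.box
        return (0, 0, image_width, image_height)


# The dog-in-a-park example: one whole-image label and one boxed object.
park = Annotation("park")
dog = Annotation("dog", (10, 20, 100, 80))
```

Both kinds of label then go through the same code path, and only the missing box distinguishes the 'background' from a regular object.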

The other cases I mention for 'scene labels' might be adjectives or states - weather etc. ('this scene is windy' .. 'this scene is night-time' etc).

When I went through trying to make such a label list, I tried "rural area", "urban area" , which could make sense as objects, as opposed to just the words "rural", "urban".

there's still an issue that some of these can be a bit fuzzy (as you move from the town centre to the countryside, at what point is it "city", "suburb","town", "village", "countryside", "wilderness").. but for a scene label, the user can pick the word that dominates when seeing the image. If you were literally labelling a map, you might have greyscale masks for each