ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

current dataset labels and label format #47

Open dobkeratops opened 6 years ago

dobkeratops commented 6 years ago

I made my way through the current images in 'add labels' mode; these are suggestions for additional labels based on what I saw there (a few extras where I added related items I know from the 'LabelMe' images, but 70% of this list are directly in those images). That's 266 labels: https://gist.github.com/dobkeratops/4c78501d4f82f51f6aa3410fe02515db

What I wanted to discuss was the JSON format again. I note the list includes the label name as a property. Is this absolutely necessary (e.g. does it make internationalisation easier, or does it translate more directly into the in-memory format)? My suggestion for a format would be a map with the labels as keys and additional detail as a value object (e.g. "dog" : { isa:["animal"]} etc.). That puts the label 'up-front', making it easier to read. Furthermore, my suggestion would include a shortcut for listing many subtypes directly 'inline', e.g.

```"car":{ "isa":["vehicle"], "examples":["convertible","prestige","sports","SUV","hatchback","coupe","sedan","grand tourer","taxi",...] }```

The system would have to expand those out on loading, i.e. create "convertible (car)":{isa:["car"]} etc. So you could build the graph 2 ways (B<-A, C<-A or A->[B,C]), with the shortcut for making many 'leaves'. Maybe we could also do that for the 'has..' relation too, i.e. add a 'part-of'.
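A minimal Go sketch of the expansion step described above, assuming a `Label` struct with `Isa` and `Examples` fields (both the struct and the function name `expandExamples` are hypothetical, not taken from imagemonkey-core):

```go
package main

import (
	"fmt"
	"sort"
)

// Label mirrors the proposed JSON value object; the field names are assumptions.
type Label struct {
	Isa      []string
	Examples []string
}

// expandExamples returns a copy of labels in which every entry of an
// "examples" list also exists as its own leaf label, named
// "example (parent)" and pointing back to the parent via isa.
func expandExamples(labels map[string]Label) map[string]Label {
	out := make(map[string]Label, len(labels))
	for name, l := range labels {
		out[name] = l
	}
	for name, l := range labels {
		for _, ex := range l.Examples {
			leaf := fmt.Sprintf("%s (%s)", ex, name)
			// Do not overwrite a leaf that was declared explicitly.
			if _, exists := out[leaf]; !exists {
				out[leaf] = Label{Isa: []string{name}}
			}
		}
	}
	return out
}

func main() {
	labels := map[string]Label{
		"car": {Isa: []string{"vehicle"}, Examples: []string{"convertible", "taxi"}},
	}
	expanded := expandExamples(labels)
	names := make([]string, 0, len(expanded))
	for name := range expanded {
		names = append(names, name)
	}
	sort.Strings(names) // map iteration order is random, so sort for stable output
	for _, name := range names {
		fmt.Println(name, expanded[name].Isa)
	}
}
```

The same loop could be reused for a 'has' list if parts are meant to become labels of their own.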

I think something like this would make maintenance more user-friendly.

The vocabulary could get huge. I imagine 1000's of labels once you really start getting urban and domestic scenes (this database still predominantly has 'staged' photos focussing on one primary subject; these are just the suggestions collected for surrounding objects).

Anyway, I just wanted to mention this before I write a little script to manually expand this list into the current format.

bbernhard commented 6 years ago

Good points, Thanks for bringing those up!

I mainly chose the list format because I couldn't figure out how to parse arbitrary JSON key/value objects with golang (golang is really special when it comes to parsing JSON and so much different than other programming languages). After parsing the JSON I am internally building a map again for faster access.

But I think I finally figured out how to parse arbitrary JSON key/value objects. So I think your proposed format should be doable.
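For reference, one way to do this with the standard library is to unmarshal into a map keyed by string. This is only a sketch: the `Label` struct and `parseLabels` are hypothetical names, and it assumes strict JSON with quoted keys.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Label is a hypothetical value object for one label entry.
type Label struct {
	Description string   `json:"description"`
	Isa         []string `json:"isa"`
}

// parseLabels decodes an object whose keys are arbitrary label names.
// encoding/json allocates the map and fills one entry per key.
func parseLabels(data []byte) (map[string]Label, error) {
	var labels map[string]Label
	if err := json.Unmarshal(data, &labels); err != nil {
		return nil, err
	}
	return labels, nil
}

func main() {
	input := []byte(`{"dog": {"isa": ["animal"]}, "car": {"isa": ["vehicle"]}}`)
	labels, err := parseLabels(input)
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(labels["dog"].Isa, labels["car"].Isa)
}
```

Because the result is already a map, no second pass is needed to build a lookup structure.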

I'll start right away with the changes and get back to you once I am done. :)

dobkeratops commented 6 years ago

What I was going to do was write something to mess with the graph anyway, so it's not like you must change it. I realise the method I suggest will need some validation, i.e. "no cycles", and there might be redundancy, so it might take a tool to reduce those. The merit of your current setup is that all of that is precluded, whilst still having them groupable, so it's much better than a flat list, for not so much effort.

(I did do a big label list under my proposal. I also took the liberty of using 'mutant JSON' with unquoted keys, but fixing that is just a search and replace. I kind of wish JSON allowed those. I always allow them in my own parsers, but I realise it's strictly not part of real JSON.)

I did wonder about internationalisation, e.g. making the English label a key sort of 'bakes the English language' into the system; in my proposal you'll need to treat translated labels differently, e.g. a map stored inside, maybe "car":{ translations:{french:"voiture",}...}. I suppose you could take the attitude that translation is a separate layer. Is it worth thinking about translations early, e.g. to encourage people to pick English names that are easily translated without ambiguity? A multilingual database could be a 'selling point' (get people labelling to learn English even if they don't already speak it).

'car' usually = automobile, but there's "passenger cars" on railways too. https://en.wikipedia.org/wiki/Car_(disambiguation) . I wonder if we'll run into subtle cases

bbernhard commented 6 years ago

> what I was going to do was write something to mess with the graph anyway, so it's not like you must change it.

No worries, I wanted to change the format of the JSON anyway, so why not do it now :)

Internationalization is indeed something we should think about early in the process. Currently I am planning to keep the labels in the database English only, so your proposal of making the English label a key wouldn't be a problem. At the moment I think it would be enough to do a lookup in some sort of in-memory translation table before the database query gets built. If that for some reason doesn't scale anymore with tens of thousands of labels, we could consider adding the translations to the database schema.

I am not sure if it's better to add the translations to the general labels.json file or if we should create separate translation files (.german, .french...) which contain the translations. I think with a lot of labels it could get cumbersome to manage the translations. But that's just a minor detail and can be refactored later if necessary.
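A rough sketch of such an in-memory lookup, assuming one table per language that maps a translated label to the English database key. The table contents and the name `toEnglish` are illustrative only:

```go
package main

import "fmt"

// translations maps language -> translated label -> English database label.
// The entries are made up for illustration; in practice they might be
// loaded from labels.json or from per-language files.
var translations = map[string]map[string]string{
	"french": {"voiture": "car", "chien": "dog"},
	"german": {"Auto": "car", "Hund": "dog"},
}

// toEnglish resolves a user-supplied label to the English key that the
// database query would use. The bool reports whether a mapping was found.
func toEnglish(lang, label string) (string, bool) {
	if lang == "english" {
		return label, true
	}
	// Indexing a missing language yields a nil map, which is safe to read.
	english, ok := translations[lang][label]
	return english, ok
}

func main() {
	if key, ok := toEnglish("french", "voiture"); ok {
		fmt.Println("query label:", key)
	}
	if _, ok := toEnglish("french", "inconnu"); !ok {
		fmt.Println("no translation found")
	}
}
```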

> 'car' usually = automobile, but there's "passenger cars" on railways too. https://en.wikipedia.org/wiki/Car_(disambiguation) . I wonder if we'll run into subtle cases

I could imagine that the optional description can be used to clarify the meaning of a label. So if you type "car" in the search field, both the "railway car" and the "car" would pop up. Both of these labels could be part of the "vehicle" label (i.e. isa: ["vehicle"]).

Another proposal of labels.json:

{
    "metalabels": {
        "animal": {
            "description": "optional"

        },
        "vehicle": {
            "description": "optional"
        },
        "food": {
            "description": "optional"
        },
        "electronics": {
            "description": "optional"
        },
        "nature": {
            "description": "optional"
        },
        "sports": {
            "description": "optional"
        }
    },

    "labels": {
        "dog": {
            "description": "optional",
            "isa": ["animal"],
            "has": {
                "eye": {
                    "description": "optional"
                },
                "ear": {
                    "description": "optional"
                },
                "mouth": {
                    "description": "optional"
                }
            }
        },
        "cat": {
            "description": "optional",
            "isa": ["animal"],
            "has": {
                "eye": {
                    "description": "optional"
                },
                "ear": {
                    "description": "optional"
                },
                "mouth": {
                    "description": "optional"
                }
            }
        },
        "pizza": {
            "description": "optional",
            "isa": ["food"]
        },
        "orange": {
            "description": "optional",
            "isa": ["food"]
        },
        "apple": {
            "description": "optional",
            "isa": ["food"]
        },
        "banana": {
            "description": "optional",
            "isa": ["food"]
        },
        "car": {
            "description": "optional",
            "isa": ["vehicle"]
        },
        "TV": {
            "description": "optional",
            "isa": ["electronics"]
        },
        "smartphone": {
            "description": "optional",
            "isa": ["electronics"]
        },
        "cup": {
            "description": "optional",
            "isa": []
        },
        "glass": {
            "description": "optional",
            "isa": []
        },
        "spoon": {
            "description": "optional",
            "isa": []
        },
        "egg": {
            "description": "optional",
            "isa": ["food"]
        },
        "tennis ball": {
            "description": "optional",
            "metalabels": ["sports"]
        },
        "bullet": {
            "description": "optional",
            "isa": []
        },
        "tree": {
            "description": "optional",
            "isa": []
        }
    }
}

I think it would be possible to make the ear, eye and mouth labels top-level labels to remove some duplication, but I kind of like the fact that they are child labels of the main label. (you can see the hierarchy by just looking at the JSON file).

With this format we could also add your proposed examples and translations attributes.

I am wondering if we need both the metalabels and the isa attributes?

What do you think about the new format?

dobkeratops commented 6 years ago

Looks good. I understand the separate metalabels help right now; hopefully in future we can generalise. I've yet to experiment myself, as I described, with something to trace a general graph. I could even try to cluster a general graph to find the best set of metalabels, given 100's of examples. So long as the door is open for it to be truly general eventually..

I was just going to say: I had that huge 'potential label list', but one example jumps out right now (worth adding manually): person (or man, woman). There are quite a few photos with a 'person holding a smartphone', 'person and a dog' etc. That's the one label that, added to your existing set, will enhance your existing dataset the most :)

> I think it would be possible to make the ear, eye and mouth labels top-level labels to remove some duplication

Yeah, this takes some thought ... I'm not sure what the answer is. The thing is, dog::eye, human::eye, insect::eye really want to be different items (e.g. 'show me insect eyes', 'show me human eyes'), although all would have 'isa:["eye"]'. Perhaps the system can expand those out on initialisation or something ("insect":{has:["eye"]} generates an item "insect::eye":{isa:["eye"]}).

I imagine that isa is a bit like 'inheritance' , so you say dog, cat 'isa' quadruped, and 'quadruped' has 'legs, tail, head' etc such that each quadruped doesn't have to say that again ... again I haven't even tried this yet, this takes experimenting (those would still have to make 'dog arm' etc). The need gets clearer with breeds of dog etc, although if those are just listed as 'examples' (a single level) it's much easier.

We could view this as a 'fixed format hierarchy' metalabels -> labels -> examples, which should be enough initially, but eventually that could be converted to something completely general. So if you put the parts in the 'label', the 'examples' inherit. That's a smart mid-term solution. ~10 metalabels x ~10 labels x ~10 examples & parts = ~1000 easily managed labels. The full general graph will make pushing beyond that easier, and eliminate ambiguous decisions (everything the same rather than having to decide which level to place it at..)

Another thing I was going to discuss was the specific representation of 'disambiguation'/namespacing/components; currently your UI shows mouth/dog etc. That makes logical sense from the perspective that those pixels should trigger both mouth and dog. Some alternatives: above I've mimicked C++ namespacing, although that would probably be rather arcane for the casual user ('what the hell does double colon mean?'), so that's probably a bad idea. The other reference point is Wikipedia's use of brackets for context, e.g. https://en.wikipedia.org/wiki/Pointer_(user_interface) https://en.wikipedia.org/wiki/Functionality_(chemistry) https://en.wikipedia.org/wiki/Tail_(horse) https://en.wikipedia.org/wiki/Intersection_(road)#Fork ... You might find it possible to get direct Wikipedia links in some cases, e.g. they have redirects for mouse (computer) and keyboard (computer), and tail (horse) is already a Wikipedia page.

anyway that again is a minor point, I don't imagine it's a deep issue right now

bbernhard commented 6 years ago

> I was just going to say - I had that huge 'potential label list' but one example jumps out right now (worth adding manually): person (or man, woman)

cool, will add that one manually!

Do you have any preference regarding person vs man/woman? I mean we could add the more general label person to the "Add labels" tab and do the refinement with a quiz question: "Female or male?"

> Looks good, I understand the separate metalabels help right now, hopefully in future we can generalise. I'm still yet to experiment myself as I described with something to trace a general graph. I could even try to cluster a general graph to find the best set of metalabels, given 100's of examples. So long as the door is open for it to be truly general eventually..

awesome! yeah, right...I think we might need to iterate over the whole label structure quite a few times anyway, until we are completely satisfied. I am expecting that there will be some "lessons learned" :D

One potential use case I can see for separating metalabels and normal labels is the possibility to use metalabels for quizzes (or other gamification), where we can't annotate something, e.g. "Which hair color does this woman have?", "Is it raining?". If we use normal labels for that, the picture would end up in the annotation tab with the request to label raining. I am not sure if it's the best idea, but I am currently thinking about storing metalabels in the same PostgreSQL table, but with a flag (is_meta?) to indicate that it is a special label where nothing can be annotated.

> We could view this as a 'fixed format hierarchy' metalabels -> labels -> examples which should be enough initially, but eventually that could be converted to something completely general.

good idea!

> another thing I was going to discuss was the specific representation of 'disambiguation'/namespacing/components; currently your UI shows mouth/dog etc. That makes logical sense from the perspective that those pixels should trigger both mouth and dog. Some alternatives .. above I've mimicked C++ namespacing.. although that would probably be rather arcane for the casual user ('what the hell does double colon mean?') .. so that's probably a bad idea; the other reference point is Wikipedia's use of brackets for context, e.g. https://en.wikipedia.org/wiki/Pointer_(user_interface) https://en.wikipedia.org/wiki/Functionality_(chemistry) https://en.wikipedia.org/wiki/Tail_(horse) https://en.wikipedia.org/wiki/Intersection_(road)#Fork ... you might find it possible to get direct Wikipedia links in some cases, e.g. they have a redirect for mouse (computer) keyboard (computer) tail (horse) is already a Wikipedia page. etc

Yeah, the UI representation is something I am not really happy with at the moment. I experimented with a few different visualization methods (tables, treeviews, buckets) but all of those have their advantages and disadvantages.

But you are totally right, using the Wikipedia syntax would probably be better (as it is known), but ultimately it would be great to get rid of the "duplication" (appending dog to every sublabel adds a lot of unnecessary information that takes away space). But I haven't found any visual representation yet, which looks good on mobile and desktop :)

dobkeratops commented 6 years ago

"Do you have any preference regarding person vs man/woman?"

Not sure , but this is worth thinking about - how will the process look with metalabels and future expansion.

My gut reaction is to ask for man, woman, boy, girl, and later add a metalabel human. But another option is to start with person, then later add examples: man, woman etc. Sometimes it's actually hard to tell the gender (a person in the distance, a person in gender-neutral clothing, a man with long hair facing away from the camera). OK, so how about man, woman, boy, girl, child, person, and a metalabel human covering all.

Elsewhere in the database I saw "soldier" (in turn, more specifically, "US marine"). in LabelMe, I see many examples of "workman". I think "person" would want to expand out to a very rich structure of sub-types. Would it actually help to have multiple overlapping labels (most soldiers are "men" but some are female.)

How about this: person:{ examples:[man,woman,boy,girl,soldier,nurse,workman,policeman,policewoman,..]}. But you'd need a separate way of saying workman isa man, nurse isa woman. You want to be able to train a neural net at any chosen reduced level of granularity ('a net that cares about identifying any people' vs 'a net that cares about distinguishing combatants from civilians'). What I imagine is you could later add labels that repeat what's shown in 'examples', and the system just fills in any links (A:{isa:[X]}, B:{isa:[X]} generates X:{examples:[A,B]} and vice versa).
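The link filling described above could look roughly like this in Go. It is a sketch under assumptions: `Label`, `addUnique` and `reciprocate` are hypothetical names, and labels that are only mentioned (never declared) are created on the fly.

```go
package main

import (
	"fmt"
	"sort"
)

// Label holds both directions of the relation; field names are assumptions.
type Label struct {
	Isa      []string
	Examples []string
}

// addUnique appends x to xs unless it is already present.
func addUnique(xs []string, x string) []string {
	for _, v := range xs {
		if v == x {
			return xs
		}
	}
	return append(xs, x)
}

// reciprocate fills in missing back-links so that "A isa X" and
// "X examples A" always appear together. The input map is not modified.
func reciprocate(labels map[string]Label) map[string]Label {
	out := make(map[string]Label, len(labels))
	for name, l := range labels {
		out[name] = Label{
			Isa:      append([]string(nil), l.Isa...),
			Examples: append([]string(nil), l.Examples...),
		}
	}
	for name, l := range labels {
		for _, parent := range l.Isa {
			p := out[parent] // zero value if the parent was never declared
			p.Examples = addUnique(p.Examples, name)
			out[parent] = p
		}
		for _, child := range l.Examples {
			c := out[child]
			c.Isa = addUnique(c.Isa, name)
			out[child] = c
		}
	}
	for _, l := range out {
		sort.Strings(l.Isa) // sort in place for deterministic output
		sort.Strings(l.Examples)
	}
	return out
}

func main() {
	labels := map[string]Label{
		"person":  {Examples: []string{"man", "woman"}},
		"workman": {Isa: []string{"man"}},
	}
	out := reciprocate(labels)
	fmt.Println("man isa", out["man"].Isa, "examples", out["man"].Examples)
}
```

Here "man" was never declared on its own, yet it ends up with both a parent (person) and a child (workman).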

Could types of human be covered by equipment (e.g. 'soldier' identified by carrying a rifle or wearing a combat helmet, a workman by wearing a hardhat)?

Is it even worth adding a completely different structure for attributes such as gender?

Let's say we start with man, woman, boy, girl; it's reasonably likely we can adapt the data to anything else later (retrofit the overlaps).

> but ultimately it would be great to get rid of the "duplication"

I don't think it's a big problem now, you could even just say "mouth of dog". later it could get smarter and eliminate 'dog' if there's a dog and nothing else with a mouth; but if you see a man and a dog (quite common in scenes) , then you need to distinguish.
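As a small sketch of that idea in Go: show the bare part name when only one kind of owner is present in the image, and the qualified form otherwise. The function name `displayName` and the "part of owner" wording are assumptions for illustration.

```go
package main

import "fmt"

// displayName returns the text shown for a part label such as "mouth"
// belonging to owner "dog". ownersInImage lists the labels already present
// in the image that can have this part.
func displayName(part, owner string, ownersInImage []string) string {
	distinct := make(map[string]bool)
	for _, o := range ownersInImage {
		distinct[o] = true
	}
	// With a single kind of owner there is nothing to disambiguate.
	if len(distinct) <= 1 {
		return part
	}
	return fmt.Sprintf("%s of %s", part, owner)
}

func main() {
	fmt.Println(displayName("mouth", "dog", []string{"dog"}))
	fmt.Println(displayName("mouth", "dog", []string{"dog", "man"}))
}
```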

I imagine that those labels really will exist separately in the database, e.g. "mouth (dog)" isa "mouth" etc.

bbernhard commented 6 years ago

Some discussion points that came to my mind today:

Imagine that we have a few hundred labels with a lot of refinements in our label set, e.g. person, man, woman, nurse, doctor, US marine, soldier, policeman, ... (plus all the female counterparts). Now imagine there is a picture showing a US navy SEAL that you want to label. So you would start typing US and the label US marine would show up. As there is no US navy SEAL available in the label set, you start thinking: "Should I use US marine instead or is person better? I mean it's not really a US marine, but it's more specific than person. Okay, so let's just take the US marine.".

I think that's an interesting case, because by introducing more fine-grained labels we also increase the probability of making mistakes. While the person label is less precise, it would still be a correct label, whereas US marine would be wrong by definition.

Another example: in some languages (e.g. German) it's pretty common to leave out the gender for occupational categories. So you would just say cop or officer even if it is a female officer. So if we had both female and male versions in the database we might end up with wrong labels.

Just a thought experiment:

Would it be possible to cut the number of labels down to just "base labels" and do the refinement completely via gamification (e.g. a quiz)? I am thinking about the famous "I spy with my little eye ..." children's game where you have to guess what object the spy saw.

So if you know that it's dog, you could ask a few questions to collect the following attributes:

I am really not sure, but I could imagine that the quiz thing is also a little bit more mobile phone friendly, as you don't have to type that much information. Picking up your idea of educating people on the way: we could maybe show some "Did you know" information gathered from the dataset, e.g. "Some other short haired dog breeds are: ...".

With questions we would also have the possibility to show an Unknown or Other option in case our predefined list doesn't contain the item (e.g. a special dog breed).

What do you think about that? I mean it would be a really different approach compared to other services (like LabelMe), with the quiz as an integral component of the labeling.

> I don't think it's a big problem now, you could even just say "mouth of dog". later it could get smarter and eliminate 'dog' if there's a dog and nothing else with a mouth; but if you see a man and a dog (quite common in scenes) , then you need to distinguish.

that's a good idea, haven't thought about that!

dobkeratops commented 6 years ago

Does the discoverability suffer if we add a lot of fine granular labels?

Ideally no: the tags/subtypes (metalabels->labels->examples) should assist discoverability. What I envisage is that this structure (and eventually completely generalised labels->labels->..) lets anyone pick the granularity they want. If you want a net that just distinguishes people, cars, bicycles, trees... you don't care about the fine grain. Any labeller can submit a 'coarse' label (and someone else can refine it), or vice versa you can submit a precise label (exact car model) and when asked, the system can simplify it to just say "car".
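One way the 'simplify to a chosen granularity' step could work is to walk the isa links upward until a label from the wanted set is reached. A sketch in Go, where `coarsen`, the `isa` map layout and the example labels are all assumptions:

```go
package main

import "fmt"

// coarsen walks isa links upward (breadth-first) from label until it
// reaches a label in the wanted set, and returns that label. The bool is
// false when no ancestor (or the label itself) is wanted.
func coarsen(isa map[string][]string, wanted map[string]bool, label string) (string, bool) {
	seen := map[string]bool{label: true}
	queue := []string{label}
	for len(queue) > 0 {
		cur := queue[0]
		queue = queue[1:]
		if wanted[cur] {
			return cur, true
		}
		for _, parent := range isa[cur] {
			if !seen[parent] { // the seen set also guards against cycles
				seen[parent] = true
				queue = append(queue, parent)
			}
		}
	}
	return "", false
}

func main() {
	isa := map[string][]string{
		"hatchback (car)": {"car"},
		"car":             {"vehicle"},
		"workman":         {"man"},
		"man":             {"person"},
	}
	wanted := map[string]bool{"car": true, "person": true}
	for _, l := range []string{"hatchback (car)", "workman", "tree"} {
		coarse, ok := coarsen(isa, wanted, l)
		fmt.Println(l, "->", coarse, ok)
	}
}
```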

> So you would start typing US and the label US marine would show up. As there is no US navy SEAL available in the labels set, you start thinking: "Should I use US marine instead or is person better?

So we have to be smart about the curated JSON list. What you've identified is 'US marine' is a bad label unless we first say "soldier", "american soldier". I think there will be a lot of trial and error. Eventually a generalised graph will make it easier, but that has the risk of cycles - as such I think it's worth seeing how far we can get with your meta-labels idea first.. that guarantees it is acyclic.

> Another example: In some languages (e.g. German) it's pretty common to leave out the gender for occupational categories. So you would just say cop or officer even if it is a female officer

Good point (and that's a great motivation to consider internationalisation early) ... and people might even try to do this for gender equality elsewhere. How about police officer (man), police officer (woman), just like we will have mouth (dog), mouth (human)?

> I am really not sure, but I could imagine that the quiz thing is also a little bit more mobile phone friendly

Absolutely, but I hope we can still let laptop/keyboard users submit more comprehensive, thoughtful input. The games can verify what they said.

> With questions we would also have the possibility to show an Unknown or Other option in case our predefined list doesn't contain the item (e.g. a special dog breed).

strongly agree :)

dobkeratops commented 6 years ago

In parallel I was just running through some more examples in my head, seeing how it could work (based on things I've seen here, bits in LabelMe and some photos I might submit soon).

metalabels:- // will be 10s
    vehicle,       animal,      human,       flying,
    military,      aquatic,     building,    weapon,
    tool,          machine,     furry,       feathered,
    quadruped,     metal,       wood,        stone,
    plastic,       plant,       cutlery,     furniture,
    domestic,      religious,   body part,   mechanical component,
    ornament,      industrial

labels   //will be 100's
    "attack helicopter"    :{isa:[vehicle, military,flying,metal,rotorcraft], examples:[apache,..]}
    "helicopter" :{isa:[vehicle,flying]}    //! hmm I want to make a sideways 'isa' , 'attack helicopter'
    "soldier"    :{isa:[human, military]}
    "bird"    :{isa:[animal, flying,feathered]}
    "dog"   :{isa:[animal, quadruped, furry]}
    "rifle" :{isa:[weapon,military,firearm],examples:[m4,kalashnikov,g36]}
    "elephant" :{isa:[animal, quadruped]}
    "submersible"    :{isa:[vehicle, aquatic]}
    "table knife":{isa:[cutlery,tool,domestic]}               //!   knife, hunting knife, scalpel, carving knife, pen-knife?
    "fork":{isa:[cutlery,tool,domestic]}
    "destroyer"    :{isa:[vehicle, military, aquatic,metal]}
    "house" :{isa:[building]}
    "tower block" :{isa:[building]}
    "church" :{isa:[building,religious]}
    "shop" :{isa:[building]}
    "mouth (dog)":{isa:[bodypart]}
    "mouth (human)":{isa:[bodypart]}
    "eye (human)":{isa:[bodypart]}
    "castor wheel":{isa:[mechanical component]}
    "table":{isa:[furniture]}
    "chair":{isa:[furniture]}

Question: here the metalabels mix nouns and adjectives. Is that OK? Or should we be saying 'military entity', 'military object'.. or do we want yet another idea for adjectives?

If we did start doing markup this way, it might still be OK, because if we have a word that's an adjective we can just search and replace it in the JSON.

I began to think about yet more structure... "isa", "adjectives:[..]", "surface materials:[furry,metallic,stone,brick,feathers,skin,leather,plastic,]", "purpose:[military,residential,commerce,law enforcement,industry]", "abilities:[flying,swimming,floating,ground motion,cutting,firing]", but a general graph could just have all of those as root words.. hmmm. "flying" isa "ability", "residential" isa "purpose", "furry" isa "surface material".

(I kind of like the idea that you could label something and just say "wooden": "sorry, I don't know what it is, but I can tell you it's made of wood; that's better than knowing nothing..". Related to another game (with a practical purpose of user-hinted SIFT features..), 'surface texture' labels could be useful (hence 'furry', 'feathered' in there). The idea is to show 16x16 pixel crops and ask 'what is this', then record a heat map of what people see. The areas that get the most accurate matches to the full object labels are prime candidates for visual words / SIFT features.)

Eventually I'm sure having the richest structure with a general graph will help ('destroyer' isa 'warship' isa 'military,ship'; 'ship' isa 'aquatic','vehicle' ..) but I think that's also going to need extra tools to manage (verify lack of cycles). 'X isa Y' ... if 'X' is something more specific than Y, then there will never be cycles in the graph.
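A tool to 'verify lack of cycles' could be a plain depth-first search over the isa links. The following Go sketch assumes the graph is available as a map from label to its isa list; `findCycle` is a hypothetical name.

```go
package main

import (
	"fmt"
	"sort"
)

// findCycle returns one isa cycle as a path (first and last element are
// the same label), or nil if the graph is acyclic.
func findCycle(isa map[string][]string) []string {
	const (
		unvisited = iota
		inProgress
		done
	)
	state := make(map[string]int)
	var path []string
	var visit func(n string) []string
	visit = func(n string) []string {
		state[n] = inProgress
		path = append(path, n)
		for _, parent := range isa[n] {
			switch state[parent] {
			case inProgress:
				// parent is on the current path: report the loop.
				for i, v := range path {
					if v == parent {
						return append(append([]string(nil), path[i:]...), parent)
					}
				}
			case unvisited:
				if cycle := visit(parent); cycle != nil {
					return cycle
				}
			}
		}
		path = path[:len(path)-1]
		state[n] = done
		return nil
	}
	names := make([]string, 0, len(isa))
	for n := range isa {
		names = append(names, n)
	}
	sort.Strings(names) // deterministic start order
	for _, n := range names {
		if state[n] == unvisited {
			if cycle := visit(n); cycle != nil {
				return cycle
			}
		}
	}
	return nil
}

func main() {
	isa := map[string][]string{
		"destroyer": {"warship"},
		"warship":   {"military", "ship"},
		"ship":      {"aquatic", "vehicle"},
	}
	fmt.Println("cycle:", findCycle(isa))
	isa["vehicle"] = []string{"destroyer"} // deliberately wrong link
	fmt.Println("cycle:", findCycle(isa))
}
```

Running such a check when the label file is loaded would let a general graph be edited freely while still rejecting a bad link early.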

I do keep running into ambiguity (which level to place something at .. label or meta label)

maybe I could write an offline tool to read the LabelMe database, manually organise as above, and spit out your json format (i.e. if we cover their labels with a structure that would be awesome.)

dobkeratops commented 6 years ago

Right, I started messing with Go, and had a 'go' at tracing a 'label graph' from the input I describe. This is just a quick hack that uses literals in Go source which mimic the proposed structure, and spits out an expanded JSON equivalent with all the backlinks reciprocated (X isa Y also produces Y:{examples:[..X..]}) etc.

I could play around with that and see if it's possible to algorithmically pick a boundary between 'label' / 'meta-label' /'examples' (each leaf is an 'example', one back is labels, the rest 'meta'?) , but if we can just make this verify "no cycles", the need for that distinction goes away?

There are things yet to do, like context (car wheel vs bicycle wheel), and tracing 'isa' to accumulate 'parts' (quadruped has parts head, leg, etc.; dog, cat isa quadruped; dog and cat don't need to list 'parts: head, leg, tail..', they just inherit). But the graph is there to actually trace.
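The 'inherit parts through isa' step might be sketched like this. It is not the code from the linked repository; `Label` and `allParts` are hypothetical names.

```go
package main

import (
	"fmt"
	"sort"
)

// Label carries the two relations needed here; field names are assumptions.
type Label struct {
	Isa []string
	Has []string
}

// allParts collects the parts of a label plus the parts of everything it
// inherits from via isa, returned in sorted order without duplicates.
func allParts(labels map[string]Label, name string) []string {
	seen := make(map[string]bool)  // labels already visited
	parts := make(map[string]bool) // accumulated parts
	var walk func(n string)
	walk = func(n string) {
		if seen[n] {
			return
		}
		seen[n] = true
		l := labels[n]
		for _, p := range l.Has {
			parts[p] = true
		}
		for _, parent := range l.Isa {
			walk(parent)
		}
	}
	walk(name)
	out := make([]string, 0, len(parts))
	for p := range parts {
		out = append(out, p)
	}
	sort.Strings(out)
	return out
}

func main() {
	labels := map[string]Label{
		"quadruped": {Has: []string{"head", "leg", "tail"}},
		"dog":       {Isa: []string{"quadruped"}},
		"elephant":  {Isa: []string{"quadruped"}, Has: []string{"trunk"}},
	}
	fmt.Println("dog:", allParts(labels, "dog"))
	fmt.Println("elephant:", allParts(labels, "elephant"))
}
```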

(I did have a brief look at Go a while back but I don't know my way around its ecosystem yet, and I'm still finding the best ways to do things in it. Let me know if you see any better ways of doing things in the language itself, e.g. it seems you're supposed to write xs=append(xs,x); is there a decent workaround? Write a helper function taking ptr-to-ptr? Is there some rationale why they don't have xs.append(x) out of the box?)
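For what it's worth, two possible workarounds are sketched below: a helper taking a pointer to the slice, and a named slice type with a pointer-receiver method. Both names (`push`, `Labels`) are made up here, and the generic helper assumes Go 1.18 or newer. The reassignment is needed because append may allocate a new backing array, so the caller has to store the returned slice header.

```go
package main

import "fmt"

// push appends x to the slice that xs points to, hiding the
// "xs = append(xs, x)" reassignment from the caller.
func push[T any](xs *[]T, x T) {
	*xs = append(*xs, x)
}

// Labels is a named slice type, which allows method syntax.
type Labels []string

// Append adds a label in place; the pointer receiver lets the method
// replace the slice header when append reallocates.
func (l *Labels) Append(x string) {
	*l = append(*l, x)
}

func main() {
	var xs []string
	push(&xs, "dog")
	push(&xs, "cat")
	fmt.Println(xs)

	var ls Labels
	ls.Append("tree")
	fmt.Println(ls)
}
```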

https://github.com/dobkeratops/label_list/blob/master/labelgraph.go

sample of generated output

"TV":{
    "isa":["consumer electronics"],
    "examples":["flatscreen TV","LCD TV","plasma TV","LED TV","OLED TV","curved TV","CRT TV"],
},
"tiles":{
    "isa":["surface material"],
},
"head":{
    "isa":["bodypart"],
    "has":["eye","ear","nose","mouth"],
    "part_of":["human","quadruped"]
},
"nuts":{
    "isa":["food"],
},
"webcam":{
    "isa":["computer perhipheral"],
},
"OLED TV":{
    "isa":["TV"],
},
"combat knife":{
    "isa":["weapon"],
},
"cauliflower":{
    "isa":["vegtable"],
},
"crop duster":{
    "isa":["agricultural equipment"],
},
"tank":{
    "isa":["vehicle","military object"],
    "has":["turret","gun","catepillar tracks"],
},
"dog":{
    "isa":["quadruped"],
},

bbernhard commented 6 years ago

> here the metalabels mix nouns and adjectives. is that ok? or should we be saying 'military entity' 'military object'.. or do we want yet another idea for adjectives.

really good question. I think it also depends how we want to operate on the dataset (i.e export data).

What I am really missing in most datasets is the ability to export only a fraction of the dataset I am interested in. Most of the time you have to download the whole dataset first and then manually collect all the data you are interested in.

I could imagine that it would be really great if there were the possibility to export data based on a logical expression. Maybe something like this:

(dog & dog.mouth & dog.eye & dog.ear) | cat

this one is interesting as we could either do it like that: dog.color = 'brown'

or that: dog.brown

If we want to do it with a color attribute we probably also need to add that information somehow to the json representation and the database schema. If the second way is sufficient we could just add the color brown as a sublabel of dog.
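To make the export idea concrete, here is a small Go sketch that evaluates such an expression against the label set of each image. The expression is built directly as a tree rather than parsed from a string, sublabels are written as "dog.mouth", and all type and function names are assumptions. With the sublabel variant, dog.brown would simply be one more `Has("dog.brown")` leaf.

```go
package main

import (
	"fmt"
	"sort"
)

// Expr is a boolean query over the set of labels attached to one image.
type Expr interface {
	Eval(labels map[string]bool) bool
}

// Has matches when the image carries the given label, e.g. "dog.mouth".
type Has string

// And matches when every sub-expression matches.
type And []Expr

// Or matches when at least one sub-expression matches.
type Or []Expr

func (h Has) Eval(labels map[string]bool) bool { return labels[string(h)] }

func (a And) Eval(labels map[string]bool) bool {
	for _, e := range a {
		if !e.Eval(labels) {
			return false
		}
	}
	return true
}

func (o Or) Eval(labels map[string]bool) bool {
	for _, e := range o {
		if e.Eval(labels) {
			return true
		}
	}
	return false
}

// selectImages returns the sorted IDs of all images whose labels satisfy q.
func selectImages(images map[string]map[string]bool, q Expr) []string {
	var ids []string
	for id, labels := range images {
		if q.Eval(labels) {
			ids = append(ids, id)
		}
	}
	sort.Strings(ids)
	return ids
}

func main() {
	// (dog & dog.mouth & dog.eye & dog.ear) | cat
	query := Or{And{Has("dog"), Has("dog.mouth"), Has("dog.eye"), Has("dog.ear")}, Has("cat")}
	images := map[string]map[string]bool{
		"img1": {"dog": true, "dog.mouth": true, "dog.eye": true, "dog.ear": true},
		"img2": {"dog": true},
		"img3": {"cat": true},
	}
	fmt.Println(selectImages(images, query)) // prints [img1 img3]
}
```

In a real export the same tree would more likely be translated into a database query than evaluated in memory.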

> How about police officer (man) , police officer (woman) just like we will have mouth (dog) mouth (human)

I like the fact that the gender is put in brackets. So if you start typing polic both strings would show up and the user could choose the most suitable one :)
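A minimal sketch of that prefix lookup in Go, assuming the labels are held in a simple slice (the function name `suggest` is hypothetical):

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// suggest returns all labels that start with the typed prefix,
// compared case-insensitively and returned in sorted order.
func suggest(labels []string, prefix string) []string {
	p := strings.ToLower(prefix)
	var out []string
	for _, l := range labels {
		if strings.HasPrefix(strings.ToLower(l), p) {
			out = append(out, l)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	labels := []string{"police officer (woman)", "pizza", "police officer (man)", "person"}
	for _, s := range suggest(labels, "polic") {
		fmt.Println(s)
	}
}
```

Because the context sits in brackets at the end, both variants share the same prefix and sort next to each other.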

> maybe I could write an offline tool to read the LabelMe database, manually organise as above, and spit out your json format (i.e. if we cover their labels with a structure that would be awesome.)

That would be totally awesome. We could definitely learn a lot from the way the dataset is organized/structured.

It would be interesting to know what percentage of the LabelMe dataset was created by the LabelMe creators or power users (I could imagine that those parts are better structured/organized than the data that was contributed by occasional users).

> Eventually I'm sure having the richest structure with a general graph will help ('destroyer' isa 'warship' isa 'military,ship'; 'ship' isa 'aquatic','vehicle' ..) but I think that's also going to need extra tools to manage (verify lack of cycles). 'X isa Y' ... if 'X' is something more specific than Y, then there will never be cycles in the graph.

totally agree.

> I could play around with that and see if it's possible to algorithmically pick a boundary between 'label' / 'meta-label' /'examples' (each leaf is an 'example', one back is labels, the rest 'meta'?) , but if we can just make this verify "no cycles", the need for that distinction goes away?

I am not really sure. I have always seen metalabels as labels that add some information, but aren't really annotat-able.

E.g. let's assume we have a picture where users have already annotated each appearance of a dog. In another refinement step we could serve the image with the annotation again to the user, with the request to describe each already annotated dog in more detail. Now the user has the ability to add more attributes/metalabels (or whatever we call them) to describe the appearance of the dog, e.g. short haired, mid-sized, brown hair, ... As we already know that the user is further describing the dog with these attributes, we wouldn't need to let them annotate the area of interest again.

> Related to another game (with a practical purpose of user-hinted SIFT features..) , 'surface texture' labels could be useful (hence 'furry','feathered' in there). The idea of showing 16x16 pixel crops and asking 'what is this'. record a heat map of what people see. The areas that get the most accurate matches to the full object labels are prime candidates for visual words / SIFT features.)

cool idea! :)

right, I started messing with Go, and had a 'go' at tracing a 'label graph' from the input I describe. This is just a quick hack that uses literals in go source which mimics the proposed structure, and spits out an expanded out JSON equivalent with all the backlinks reciprocated (X isa Y also produces Y examples:[..X..]}) etc.

that looks great!
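For readers following along, here is a rough sketch of the "reciprocate the backlinks" step, not the actual hack mentioned above. The `Label` struct, its field names and `Expand` are assumptions, and the expanded names are kept plain rather than using the "convertible (car)" form.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// Label is a hypothetical node: Isa lists more general labels,
// Examples lists more specific ones.
type Label struct {
	Isa      []string `json:"isa,omitempty"`
	Examples []string `json:"examples,omitempty"`
}

// addUnique appends s to xs unless it is already present.
func addUnique(xs []string, s string) []string {
	for _, x := range xs {
		if x == s {
			return xs
		}
	}
	return append(xs, s)
}

// Expand returns a new graph in which every link exists in both directions:
// "X isa Y" implies Y lists X as an example, and vice versa.
func Expand(in map[string]Label) map[string]*Label {
	out := map[string]*Label{}
	get := func(name string) *Label {
		if out[name] == nil {
			out[name] = &Label{}
		}
		return out[name]
	}
	for name, l := range in {
		get(name) // keep labels that have no links at all
		for _, parent := range l.Isa {
			get(name).Isa = addUnique(get(name).Isa, parent)
			get(parent).Examples = addUnique(get(parent).Examples, name)
		}
		for _, child := range l.Examples {
			get(name).Examples = addUnique(get(name).Examples, child)
			get(child).Isa = addUnique(get(child).Isa, name)
		}
	}
	// Sort so the generated JSON is stable between runs.
	for _, l := range out {
		sort.Strings(l.Isa)
		sort.Strings(l.Examples)
	}
	return out
}

func main() {
	graph := Expand(map[string]Label{
		"car": {Isa: []string{"vehicle"}, Examples: []string{"taxi", "SUV"}},
		"dog": {Isa: []string{"animal"}},
	})
	b, _ := json.MarshalIndent(graph, "", "  ")
	fmt.Println(string(b))
}
```

The output contains entries for "vehicle", "taxi", "SUV" and "animal" even though the input never listed them as keys, which is the point of the shortcut.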

(I did have a brief look at Go a while back but I don't know my way around its ecosystem yet, and I'm still finding the best ways to do things in it. Let me know if you see any better ways of doing things in the language itself, e.g. it seems you're supposed to write xs=append(xs,x), is there a decent workaround? write a helper function taking ptr-to-ptr.. is there some rationale why they don't have xs.append(x) out of the box..)

yeah, Go is sometimes a little bit "different". ;-) I asked myself the same about the append method a while ago and found this blog post here [1] which makes the whole thing a little bit more logical.
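A small sketch of what is going on, in case it helps: a slice is a small value (pointer, length, capacity) that gets copied when passed around, so `append` has to hand back the possibly updated value. The `Labels` type and `Add` method below are illustrative names, one possible workaround rather than an official idiom.

```go
package main

import "fmt"

// Labels is a named slice type so that a method can be attached to it.
type Labels []string

// Add appends through a pointer receiver, hiding the reassignment.
func (ls *Labels) Add(s string) {
	*ls = append(*ls, s)
}

// appendByValue shows why the reassignment matters: the callee gets a copy
// of the slice header, so the caller never sees the new length.
func appendByValue(xs []string, s string) {
	xs = append(xs, s)
}

func main() {
	var xs []string
	appendByValue(xs, "dog")
	fmt.Println(len(xs)) // still 0: only the callee's copy grew

	xs = append(xs, "dog") // the usual idiom
	fmt.Println(len(xs))

	var ls Labels
	ls.Add("cat") // Go takes &ls automatically for pointer receivers
	ls.Add("gecko")
	fmt.Println(ls)
}
```

So a `xs.Add(x)` style is available if you define your own slice type, at the cost of one small method per type.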

It's really great that you are investing some time in the label organization. Many thanks for that - really appreciated! :)

[1] https://criticalindirection.com/2016/02/17/slice-with-a-pinch-of-salt/

dobkeratops commented 6 years ago

I have always seen metalabels as labels that add some information, but aren't really annotat-able.

ahh fair enough. so 'Military' (not 'Military object') would actually be a prime candidate for a Metalabel. similarly 'urban','domestic','agricultural' etc. Ok that makes sense. So it's almost like the adjectives. Hence the 'weirdness' of saying soldier isa military .. (no) .. soldier relates to military (yes). actually 'soldier is military' kind of works. Seems it's just the specific word 'isA' being a bit too specific.

you could just throw things like color in there. Gender{examples[male, female]} . man {isa [male]}. 'Gender' itself becomes part of the metalabel graph.

I could still mark an object 'Military' though, if I spotted it's something soldiers have but I have no idea what it is. but 'Gender' makes no sense itself (just a means of finding the potential male/female/hermaphrodite labels).
An 'abstract' flag? (default 'no')

another type of link? or a flag on the word (and still use the same graph structure, as the metalabels could still exist in the same graph format). if y is metalabel print 'x relates to y' else print 'x isa y'.
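As a sketch of the flag variant (same graph, one extra boolean per word), assuming a hypothetical `Label` struct with an `Abstract` field:

```go
package main

import "fmt"

// Label is a hypothetical node; Abstract marks metalabels and is false
// by default, matching the "default 'no'" suggestion.
type Label struct {
	Name     string
	Abstract bool
}

// Describe words the link from x to y depending on whether y is abstract.
func Describe(x string, y Label) string {
	if y.Abstract {
		return fmt.Sprintf("%s relates to %s", x, y.Name)
	}
	return fmt.Sprintf("%s isa %s", x, y.Name)
}

func main() {
	fmt.Println(Describe("soldier", Label{Name: "military", Abstract: true}))
	fmt.Println(Describe("destroyer", Label{Name: "warship"}))
}
```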

I had a spurious whim to add bigger_than,smaller_than, thinking if you had enough description (like metalabels) you could forgo the 'description'. the 'description' of a dog is 'quadruped','pet','carnivorous','domesticated', etc.. All things that show up in the graph. the description of 'carbine' is 'everything that applies to rifle, but it's smaller, but still bigger than a pistol'..

then I figured instead of a simple struct (which, in Go without pointer-to-member was creating a lot of cut-paste) I might want general purpose graph links in the structure - e.g. an enum of directed edge-types. I guess that could all be abstracted under a Label interface such that you make the query (iterate all of this type of link from this label) and it doesn't matter if the link was a struct pointer or part of the edge-graph
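Roughly what that could look like, as a sketch only. The edge-type names, the `Graph` type and the `Links` query are all invented here; `Links` stands in for the "iterate all links of this type from this label" call that a Label interface would expose.

```go
package main

import "fmt"

// EdgeType enumerates the kinds of directed link; the names are illustrative.
type EdgeType int

const (
	IsA EdgeType = iota
	PartOf
	BiggerThan
	SmallerThan
)

// Edge is one directed link between two label names.
type Edge struct {
	From string
	Type EdgeType
	To   string
}

// Graph is a flat edge list; a real version would likely index it.
type Graph struct {
	edges []Edge
}

// Add records one directed edge.
func (g *Graph) Add(from string, t EdgeType, to string) {
	g.edges = append(g.edges, Edge{From: from, Type: t, To: to})
}

// Links returns every target reachable from label over one edge of type t.
func (g *Graph) Links(label string, t EdgeType) []string {
	var out []string
	for _, e := range g.edges {
		if e.From == label && e.Type == t {
			out = append(out, e.To)
		}
	}
	return out
}

func main() {
	var g Graph
	g.Add("carbine", IsA, "rifle")
	g.Add("carbine", SmallerThan, "rifle")
	g.Add("carbine", BiggerThan, "pistol")
	fmt.Println(g.Links("carbine", IsA))
	fmt.Println(g.Links("carbine", BiggerThan))
}
```

Adding a new relation is then one more constant rather than one more struct field plus its copy-pasted handling.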

This could feature creep into a 'knowledge-graph' (semantic network) at which point it might be better to try and translate an existing one. (wasn't 'ImageNet' based on the existing 'WordNet'?)

yeah, Go is sometimes a little bit "different". ;-)

mixed opinions about the language itself but I can certainly see the merit of its simplicity - the friction in C++ can come from having so many ways of doing something that it creates difficulties between contributors, and I agree with the rationale of a GC being a good tradeoff for a UI heavy system like this. I definitely like the way you just 'bolt methods on' outside of any 'class' (I desperately want UFCS in C++ for that reason).

would be interesting to know what percentage of the LabelMe dataset was created by the LabelMe creators or power users

... There's an exponential falloff ; one of the larger contributors is indeed admin, but the bulk seems to come from a surge of paid labelling (I think ).

my hope comes back to the reason for wikipedia contribution: actually making your way through that can teach you something you didn't know. That's partly why I'm keen on a really rich label set: what if the system can introduce you to classifications you weren't familiar with (.. but then you can still use your human perception to match one example with another).
I also hope it could become of value as an artist's reference (hence ideas like surface texture labels). An orientation label perhaps..('indicate the C-of-G, up & forward directions of this object / overlay an oriented cube').

But the biggest motivation for me is the need for a truly open labelled dataset - instead of AI being centrally controlled. That's a message we need to get across to people.

If you leave all the data & organisation to google/microsoft/apple/facebook (nothing wrong with them pioneering), they'll end up with control of pretty much the entire world (transport, agriculture,medicine..).

It's really the whole opensource idea, and with the far-reaching implications of AI this emerges from the relatively narrow domain of computing into every aspect of people's lives.

Ultimately if people don't get that, maybe there's no hope for them..

bbernhard commented 6 years ago

An 'abstract' flag? (default 'no')

like that one!

This could feature creep into a 'knowledge-graph' (semantic network)

I really like the idea, my only concern at the moment is, that I don't really know how well PostgreSQL is suited for modelling a graph like structure (with a deep hierarchy).

In the past, I was working on a project where I was using a dedicated graph database (neo4j) to model a graph like structure. While neo4j has a really cool and powerful way of traversing the graph, it's still a relatively new player in the database market and isn't as mature as other databases (like PostgreSQL). So I think it's better to stick with a database that's mature (like PostgreSQL), even if it might not be the best one when it comes to traversing graphs.

But I think we shouldn't worry about that too much for now. If we really run into some performance problems with our labels hierarchy, we can always rework the database schema or add some additional caching layers. :)

But the biggest motivation for me is the need for a truly open labelled dataset - instead of AI being centrally controlled. That's a message we need to get across to people.

If you leave all the data & organisation to google/microsoft/apple/facebook (nothing wrong with them pioneering), they'll end up with control of pretty much the entire world (transport, agriculture,medicine..).

It's really the whole opensource idea, and with the far-reaching implications of AI this emerges from the relatively narrow domain of computing into every aspect of people's lives.

my hope comes back to the reason for wikipedia contribution: actually making your way through that can teach you something you didn't know. That's partly why I'm keen on a really rich label set: what if the system can introduce you to classifications you weren't familiar with (.. but then you can still use your human perception to match one example with another). I also hope it could become of value as an artist's reference (hence ideas like surface texture labels). An orientation label perhaps..('indicate the C-of-G, up & forward directions of this object / overlay an oriented cube').

totally agree :)

I am currently working on some query improvements which could speed up some operations. In parallel I am also trying to create a small prototype to see if the database schema would support that. (I am expecting that there are some changes needed in order to support that and as we currently do not have that much data in there it's probably a good idea to do the migration now).

I'll probably need a few more days for that, but maybe we can agree on the new structure for the labels JSON file in the meanwhile.

I think we don't necessarily need to implement every aspect of the new structure in one go, but maybe can break down the implementation into smaller work packages and do incremental changes. If it helps us reach our goal faster, we could also define some constraints (e.g. attributes are not annotatable, but can only be used to refine an existing annotation) and remove them in another iteration.

Edit: Something else we could consider adding is synonyms. E.g.: Let's assume we have the labels police officer (man) and police officer (woman) defined. If someone types cop in the search field we could show them police officer (man) and police officer (woman) instead. Maybe it also makes sense to add words from the urban dictionary to the synonyms list?
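A tiny sketch of how the search field lookup could work, assuming synonyms live in a simple map from search term to preferred labels. The `synonyms` table and `Resolve` function are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// synonyms maps a lower-case search term to the preferred labels.
// Hypothetical data; in practice it would come from the labels file.
var synonyms = map[string][]string{
	"cop": {"police officer (man)", "police officer (woman)"},
}

// Resolve returns the preferred labels for a search term, or the
// normalised term itself when no synonym is registered.
func Resolve(term string) []string {
	key := strings.ToLower(strings.TrimSpace(term))
	if labels, ok := synonyms[key]; ok {
		return labels
	}
	return []string{key}
}

func main() {
	fmt.Println(Resolve("Cop"))
	fmt.Println(Resolve("dog"))
}
```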

dobkeratops commented 6 years ago

I'll probably need a few more days for that, but maybe we can agree on the new structure for the labels JSON file in the meanwhile.

would it be prohibitive to even just compile them in? 10,000 labels x (20 links (64-bit ptr) each + a 64-byte string) = roughly 2.1 MB? could be less if you linked with IDs.. (would the way the app runs on a server just load that in memory once and reuse the embedded data per instance?) ... and generalise it for configurability/further scale as it matures
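Spelling that estimate out, as a back-of-the-envelope sketch: it ignores slice and string header overhead, and `EstimateBytes` is just a helper made up for the arithmetic.

```go
package main

import "fmt"

// EstimateBytes returns labels * (linksPerLabel*linkBytes + nameBytes).
func EstimateBytes(labels, linksPerLabel, linkBytes, nameBytes int) int {
	return labels * (linksPerLabel*linkBytes + nameBytes)
}

func main() {
	ptr := EstimateBytes(10000, 20, 8, 64) // 64-bit pointers
	ids := EstimateBytes(10000, 20, 4, 64) // 32-bit IDs instead
	fmt.Printf("pointers: %d bytes = %.2f MiB\n", ptr, float64(ptr)/(1024*1024))
	fmt.Printf("ids:      %d bytes = %.2f MiB\n", ids, float64(ids)/(1024*1024))
}
```

That is 10,000 x (160 + 64) = 2,240,000 bytes with pointers and 1,440,000 bytes with 32-bit IDs, so either way it is small by server standards.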

I'm not too familiar with all the constraints (how apps are hosted on servers etc), was just taking a read around google app engine etc.

If it helps us reaching our goal faster, we could also define some constraints

I was starting to think there might be a way of a label saying 'this can use another label for clarification', i.e. a pair of labels, to save the DB holding 'red car', 'green car' .. 'man standing', 'man walking' etc. (as I see a lot in LabelMe). For example, an "auxiliary_label_of<...>" field (possibly itself an array, but we could start with one), where it specifies the root graph node of a series of attributes (examples - paint jobs for cars, postures for a human, etc). That would let those things be specified and navigated in the same graph-like manner (just thinking about the permutations).

If I get around to reading the LabelMe data i could look at mapping that to something as I describe perhaps.

Something else we could consider adding is synonyms

yes that's absolutely worth having IMO. the graph can be used as a search index, and as such, synonyms pointing at the system's preferred label would be great (.. that's where cycles might start happening accidentally .. "cop" {isa "police officer"}, "police officer" {isa "cop"} ..). i wondered if i could just detect those cases and convert to synonyms
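A sketch of that detection for the simplest case, two labels that each claim to be the other's parent. It assumes isa-links in a plain map, and `SplitSynonyms` is an invented name; which side becomes the preferred label is left undecided, and longer loops would still need a general cycle check.

```go
package main

import "fmt"

// SplitSynonyms removes every pair of labels that claim to be each other's
// parent and reports them as synonym pairs instead. Other links are kept.
func SplitSynonyms(isa map[string][]string) (clean map[string][]string, pairs [][2]string) {
	has := func(child, parent string) bool {
		for _, p := range isa[child] {
			if p == parent {
				return true
			}
		}
		return false
	}
	clean = map[string][]string{}
	for child, parents := range isa {
		for _, parent := range parents {
			if has(parent, child) {
				// Mutual link: record the pair once, drop both isa-edges.
				if child < parent {
					pairs = append(pairs, [2]string{child, parent})
				}
				continue
			}
			clean[child] = append(clean[child], parent)
		}
	}
	return clean, pairs
}

func main() {
	clean, pairs := SplitSynonyms(map[string][]string{
		"cop":            {"police officer"},
		"police officer": {"cop", "person"},
	})
	fmt.Println(pairs)
	fmt.Println(clean)
}
```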

Maybe it makes also sense to add words from the urban dictionary to the synonyms list?

interesting idea (i.e. people's slang), another source is wikipedia's 'redirects' (they have a page covering 'rocks' , 'stone' is a redirect to it, etc) - as they've been through a process of people linking through synonyms

dobkeratops commented 6 years ago

could the database represent a graph by storing the edges (like a 'triplestore')? records 'X isa Y' e.g. '{dog,isa,quadruped},{cat,isa,quadruped},{gecko,isa,quadruped},{dog,isa,mammal} ..' etc. to find the graph of 'dog', 'find all the records with {?,isa,dog} and {dog,isa,?}'
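To make the two queries concrete, here is an in-memory sketch of the same idea. The `Triple` type and `Match` function are illustrative, "?" acts as the wildcard, and the poodle row is added only so the second query returns something.

```go
package main

import "fmt"

// Triple is one stored edge: subject, predicate, object.
type Triple struct{ Subject, Predicate, Object string }

// Match returns all triples matching the pattern; "?" is a wildcard.
func Match(store []Triple, s, p, o string) []Triple {
	ok := func(pattern, value string) bool {
		return pattern == "?" || pattern == value
	}
	var out []Triple
	for _, t := range store {
		if ok(s, t.Subject) && ok(p, t.Predicate) && ok(o, t.Object) {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	store := []Triple{
		{"dog", "isa", "quadruped"},
		{"cat", "isa", "quadruped"},
		{"gecko", "isa", "quadruped"},
		{"dog", "isa", "mammal"},
		{"poodle", "isa", "dog"}, // extra row for illustration
	}
	fmt.Println(Match(store, "dog", "isa", "?")) // what a dog is
	fmt.Println(Match(store, "?", "isa", "dog")) // what counts as a dog
}
```

In a database the same pattern would presumably be one three-column table with the two lookups served by indexes on subject and object.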

bbernhard commented 6 years ago

would it be prohibitive to even just compile them in? 10,000 labels x (20 links (64-bit ptr) each + a 64-byte string) = roughly 2.1 MB? could be less if you linked with IDs.. (would the way the app runs on a server just load that in memory once and reuse the embedded data per instance?)

you are right, that could be worth a try :). But I still hope that we can use the database's power as long as possible, before there is a need to do things differently in order to improve performance.

i wondered if i could just detect those cases and convert to synonyms

that's a really cool idea and definitely worth trying out :)

What also would be really great is, if we have some sort of visual representation of the labels graph. I think it's not necessarily something we need to expose via web (although it would be really cool and enables other great things like clicking on a label and seeing all the images that are attached to that label) in a first step. But having a visual representation could help us catch cycles and get a better understanding of all the labels in the dataset.

I think Python has quite a few plotting libraries which produce nice results. So if we don't want to expose that via web in a first step, we could give Python a try.

could the database represent a graph by storing the edges (like a 'triplestore')? records 'X isa Y' e.g. '{dog,isa,quadruped},{cat,isa,quadruped},{gecko,isa,quadruped},{dog,isa,mammal} ..' etc. to find the graph of 'dog', 'find all the records with {?,isa,dog} and {dog,isa,?}'

Currently the schema of the labels table is pretty simple:

CREATE TABLE public.label
(
  id bigint NOT NULL DEFAULT nextval('name_id_seq'::regclass),
  name text,
  parent_id bigint,
  CONSTRAINT label_id_pkey PRIMARY KEY (id),
  CONSTRAINT label_parent_id_fkey FOREIGN KEY (parent_id)
      REFERENCES public.label (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION
)

So each entry has a parent_id which points to its parent entry. Up to now that worked great, but I think we have to change that in order to support the new, more powerful JSON representation. I did some quick research and found the ltree extension [1], which looks really promising and similar to the approach you suggested.
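For context, ltree stores a dot-separated label path per row (for example animal.mammal.dog). A sketch of how such paths could be derived from the existing parent_id links during a migration, assuming the rows are already loaded into memory; the `Row` type and `Path` function are hypothetical and there is no database access here.

```go
package main

import (
	"fmt"
	"strings"
)

// Row mirrors the id, name and parent_id columns; ParentID 0 stands for NULL.
type Row struct {
	ID       int64
	Name     string
	ParentID int64
}

// Path walks parent_id links up to the root and joins the names with dots,
// root first. It returns an error on a missing row or a parent loop.
func Path(rows map[int64]Row, id int64) (string, error) {
	var parts []string
	seen := map[int64]bool{}
	for id != 0 {
		if seen[id] {
			return "", fmt.Errorf("parent loop at id %d", id)
		}
		seen[id] = true
		r, ok := rows[id]
		if !ok {
			return "", fmt.Errorf("no row with id %d", id)
		}
		parts = append([]string{r.Name}, parts...)
		id = r.ParentID
	}
	return strings.Join(parts, "."), nil
}

func main() {
	rows := map[int64]Row{
		1: {ID: 1, Name: "animal"},
		2: {ID: 2, Name: "mammal", ParentID: 1},
		3: {ID: 3, Name: "dog", ParentID: 2},
	}
	fmt.Println(Path(rows, 3))
}
```

Two caveats: ltree labels are limited to alphanumeric characters and underscores, so names with spaces would need mapping first, and a single path per row still describes a tree, so a label with several parents would need more than one row or a separate edge table.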

interesting idea (i.e. people's slang), another source is wikipedia's 'redirects' (they have a page covering 'rocks' , 'stone' is a redirect to it, etc) - as they've been through a process of people linking through synonyms

that's an interesting idea - really like that. We could definitely give that a try!

btw: I am nearly finished with the quiz prototype. I hope that we can add the possibility to define quiz questions to the new JSON representation.

I am thinking about something like this:

 "dog": {
   "quiz": [
      {
        "question": "What is the size of the dog?",
        "answers": ["big", "middle", "small"],
        "control": "radio"
      }
   ]
}

The above example adds a quiz question to the label dog and makes it possible to refine the label dog further.

So whenever someone annotates a dog in a picture, the annotated dog pops up in the quiz tab and the user has the possibility to answer a question about it. If there are multiple dogs annotated in one picture, only one annotated dog at a time is shown in the quiz tab (to make it possible to refine each annotation individually).

The control field specifies how the answers are visually represented (e.g.: radio for radio buttons, dropdown for a dropdown list and checkbox for checkboxes). If there are a lot of possible choices, a dropdown is probably a better choice than a lot of radio buttons.
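A sketch of how the loader could read and validate such an entry, assuming "quiz" is a plain JSON array of question objects and that radio, dropdown and checkbox are the only allowed controls. The struct and function names are made up.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Question is one quiz entry attached to a label.
type Question struct {
	Question string   `json:"question"`
	Answers  []string `json:"answers"`
	Control  string   `json:"control"`
}

// Label holds only the quiz part of a label entry in this sketch.
type Label struct {
	Quiz []Question `json:"quiz"`
}

// ParseLabels decodes the labels JSON and rejects unknown control types.
func ParseLabels(data []byte) (map[string]Label, error) {
	var labels map[string]Label
	if err := json.Unmarshal(data, &labels); err != nil {
		return nil, err
	}
	valid := map[string]bool{"radio": true, "dropdown": true, "checkbox": true}
	for name, l := range labels {
		for _, q := range l.Quiz {
			if !valid[q.Control] {
				return nil, fmt.Errorf("label %q: unknown control %q", name, q.Control)
			}
		}
	}
	return labels, nil
}

func main() {
	data := []byte(`{
	  "dog": {
	    "quiz": [
	      {
	        "question": "What is the size of the dog?",
	        "answers": ["big", "middle", "small"],
	        "control": "radio"
	      }
	    ]
	  }
	}`)
	labels, err := ParseLabels(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(labels["dog"].Quiz[0].Answers)
}
```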

[image: quiz]

What do you think about it?

[1] https://www.postgresql.org/docs/9.1/static/ltree.html

dobkeratops commented 6 years ago

But I still hope that we can use the database's power as long as possible, before there is a need to do things differently in order to improve performance.

I was thinking more as a dev-time shortcut, i.e. it's easier to just use literals/structs in a program etc. rather than deal with interfacing to a DB (.. and moving to the DB engine would be a performance boost if it's better at handling complex queries on large structures). ..but I still need to investigate all the different ways of doing this more myself.

Obviously if you end up with a label creation UI, a DB would be much better, so perhaps it's future-proofing.

What also would be really great is, if we have some sort of visual representation of the labels graph.

yes absolutely, even if it's just for debugging. It's nice to have a sense of how bunched/sparse it is, where you can improve with more 'sublabels'. I know of graph-viz, you mention python libraries, is there anything directly in go (that might make it easier to use it for UI later, if you go that way..)

Also one important use is splitting nets (e.g. instead of one net deciding 100 labels, you have one that decides 10 labels, which picks another of 10 refined nets that decides between another 10 labels.. i've heard people talk about this, does the technique have a name.. kind of a cross between NN's and Decision Trees, I guess.).. and for that it would be nice to have some visualisation of how it's split up. you could colour the graph or whatever.

like clicking on a label and seeing all the images that are attached to that label) in a first step

That sort of thing is very interesting IMO, it would just make the site more engaging to use. That actually interests me much more than gamification. So a visualised graph could indeed have end-user value beyond debug.

I hope that we can add the possibility to define quiz questions to the new JSON representation.

interesting idea, embedding sensible questions in the graph itself? It would make it read better. (what I envisaged is it's completely general ... kind of like the "miller-columns" browsing idea, any label will have subtypes and you just drill down as specifically as you can.. but if you can make it more deliberate and well presented - great)

r.e. total number of labels , I gather ImageNet is actually 1000, not 10000 - i.e. not at all hazardous to compile in - . Nonetheless, having potentially more would let you store multiple 'domain specific' datasets in one place (plant identification, calorie app, identifying obscure components..). The label graph in my current go experiment is at about 1600.

Still a little hazy on the 'metalabels' vs scenes etc but I stuffed an "abstract" flag in there, thinking you can use graph nodes for things like "surface pattern" (examples "striped","spotted" ..).

I still think there might be value to a separate label for the whole image (and this is the best label to start with for collecting many interesting photos), and we might want to say 'certain labels can only apply to the image'. but if you zoom in on a piece of dog fur, it's perfectly valid to say the image is "Dog".

bbernhard commented 6 years ago

yes absolutely, even if it's just for debugging. It's nice to have a sense of how bunched/sparse it is, where you can improve with more 'sublabels'. I know of graph-viz, you mention python libraries, is there anything directly in go (that might make it easier to use it for UI later, if you go that way..)

That sort of thing is very interesting IMO, it would just make the site more engaging to use. That actually interests me much more than gamification. So a visualised graph could indeed have end-user value beyond debug.

totally agree. I think there are a few golang data visualization libraries out there (it looks like even a GraphViz *dot file interface [1]), but I think there are more powerful javascript & python libraries out there. Once we have a labels representation in place, I think it shouldn't be that hard to use d3 (https://d3js.org/) or any other javascript library to expose the labels structure via Web.
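As a stopgap before any web view, a sketch that writes the isa-links out as GraphViz DOT text using only the standard library. `ToDOT` is a made-up helper and the quoting is only meant for plain label names.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// ToDOT renders isa-links as a GraphViz digraph, one edge per link,
// with children sorted so the output is stable.
func ToDOT(isa map[string][]string) string {
	names := make([]string, 0, len(isa))
	for n := range isa {
		names = append(names, n)
	}
	sort.Strings(names)

	var b strings.Builder
	b.WriteString("digraph labels {\n")
	for _, child := range names {
		for _, parent := range isa[child] {
			fmt.Fprintf(&b, "  %q -> %q;\n", child, parent)
		}
	}
	b.WriteString("}\n")
	return b.String()
}

func main() {
	fmt.Print(ToDOT(map[string][]string{
		"dog":     {"animal", "pet"},
		"cat":     {"animal", "pet"},
		"warship": {"military", "ship"},
	}))
}
```

Saving the output as labels.dot and running `dot -Tsvg labels.dot -o labels.svg` should give a first picture of how bunched or sparse the graph is.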

btw: I really like this type of visual representation: http://www.dkriesel.com/static/spon5_graph_keywords/ It's from David Kriesel. He analyzed all Spiegel Online (German news site) articles and ran some analyses on them. One of the things he did was extract all the keywords/tags from each article and place them in a visual map. Keywords/tags that are often used in the same article (e.g. "US election" and "Trump") are placed within just a few pixels of each other, whereas the distance between other keywords (like "US election" and "tennis") is a lot bigger.

I still think there might be value to a separate label for the whole image (and this is the best label to start with for collecting many interesting photos), and we might want to say 'certain labels can only apply to the image'. but if you zoom in on a piece of dog fur, it's perfectly valid to say the image is "Dog".

totally agree :)

r.e. total number of labels , I gather ImageNet is actually 1000, not 10000 - i.e. not at all hazardous to compile in - . Nonetheless, having potentially more would let you store multiple 'domain specific' datasets in one place (plant identification, calorie app, identifying obscure components..). The label graph in my current go experiment is at about 1600.

Definitely!

1600? Wow, that's really impressive! I wonder what the visual representation of the label landscape looks like.

interesting idea, embedding sensible questions in the graph itself? It would make it read better. (what I envisaged is it's completely general ... kind of like the "miller-columns" browsing idea, any label will have subtypes and you just drill down as specifically as you can.. but if you can make it more deliberate and well presented - great)

yep, right. Miller columns are also a great way to explore data. I think if we find a good label hierarchy which we can query fast, then we can create all kinds of cool stuff for visualization.

[1] https://github.com/awalterschulze/gographviz