ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

Important labels - picture, mural, toy, model, graffiti, photograph, poster, painting, drawing, cardboard cutout, billboard, reflection, ... #248

Open dobkeratops opened 5 years ago

dobkeratops commented 5 years ago

This is NOT "fish": 8714DC61-A905-4653-9A96-A3314D041DDD E3413462-0DE4-4F18-BE5C-9CC1B1AF4B13 None of these are "rabbit": 02D3240D-3688-4534-9659-4C45C20AB6B5 2CF12296-D295-4252-84BA-2AC084D271EC

It is extremely important for a vision system to distinguish 'images'/'fakes' from the real thing. E.g. there were some pathological examples of pictures of people and cars on the sides of trucks which caused confusion for self-driving car vision systems, and there's the obvious example of face-recognition unlocking, which you don't want to be fooled by a printed image. I guess in training we also want objects to give scale hints, and vice versa (the scale of an environment could tell you whether a very realistic model is really just a model or the real thing). That's where separate labels for trees and bushes were a very useful addition: they often look very similar but are just a different size.

The current dataset contains a few examples of things like toy cars, cuddly toy rabbits etc. Without these labels (toy, model, poster, photograph etc.) there is a risk that people will label them as the "real thing" - in fact many of these examples are already tagged as the real thing, and that's really dangerous.

The first priority would be to at least get labels like "toy", but maybe there'd be a general solution with prefixes: "toy ...", "picture of ...", etc.

As I write this, I wonder if your plan is for this to be a property? (Toy vs real, default is ...?) But I think that's quite dangerous - people might not go back. I think it would be safer to at least have literal labels like "toy". (Better to mark that it's a toy of unspecified type than that it's a car without specifying whether it's real or a toy.)

Arbitrary label blends or prefixes might help? In principle you could have a toy, model, picture etc. of absolutely anything. (Even a scene label: "picture of airport", etc.)

Statue, sculpture, monument might fit this idea too: "statue ...", etc.

dobkeratops commented 5 years ago

(Some more labels for that: "figurine", "ornament", "doll", "action figure", "minifig" (minifig is the collective term introduced for the hugely popular Lego people). Also "gargoyle", "decoration", "carving".)

There are a few unusual examples labelled as "ant" which are really figurines or ornaments, I guess. I have this example of an ornament/decoration on a door which is a lion's head.

I just submitted a few images from a museum - lots of examples of figurines and ornaments. (Which also reminds me to ask for scene labels: "museum" and "display case". Museums seem a useful way to get a wide variety of objects.)

Notice how quickly the label list grows - that's nearly 20 examples of 'fake things'. Each description carries contextual hints; each is slightly different, and if you don't have them all, you won't be able to mark the right description. It's not accurate to call a small 3D fake a 'sculpture' or 'statue' or 'toy' when it's really a 'figurine', and so on. If you don't have all these labels I fear people will make awkward approximations. If you wanted to cover all of these with some broad terms they would be unusual technical descriptions, but it might be nice to have these as graph nodes:

2D fake/representation of object -> {painting, drawing, printed photograph, ...}
3D fake/representation of object -> {sculpture, model, toy, representation of figure -> {figurine, action figure, statue, minifig, cuddly toy, sculpture/figure, carving/figure}}
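As a rough sketch, the proposed nodes could be represented as a plain adjacency map; the node and leaf names are taken from the lists above, and the helper function is purely illustrative, not ImageMonkey's actual schema:

```python
# Hypothetical adjacency map for the proposed "fake/representation" nodes.
representation_graph = {
    "2d representation of object": ["painting", "drawing", "printed photograph"],
    "3d representation of object": ["sculpture", "model", "toy",
                                    "representation of figure"],
    "representation of figure": ["figurine", "action figure", "statue",
                                 "minifig", "cuddly toy", "carving"],
}

def leaf_labels(node, graph):
    """Expand a node down to its most specific labels (the leaves)."""
    children = graph.get(node)
    if children is None:
        return [node]  # already a leaf label
    leaves = []
    for child in children:
        leaves.extend(leaf_labels(child, graph))
    return leaves
```

With this shape, asking for the leaves of "3d representation of object" expands through the nested "representation of figure" node, which is the kind of hint-narrowing the graph idea is after.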

This reminds me of the problem with cars etc. - there isn't actually a good single term to cover the broad category "rickshaw, car, van, truck, bus, coach, pickup truck, minivan, minibus, ...". It seems like "enclosed powered road vehicle with at least 4 wheels" should be a graph node; the term "vehicle" isn't quite enough because it includes aircraft, boats, motorbikes, and even bicycles. If you just want one label, it's going to be something awkward that people will have trouble finding.

bbernhard commented 5 years ago

None of these are "rabbit"...

Not sure if it's as easy as that. I guess it often depends whom you ask. If you ask a little child what that is, I am pretty sure it would say "that's a rabbit". Only if you asked more detailed questions would it finally say that it's not a real rabbit, but a rabbit toy figure. But of course, strictly speaking, that's not a rabbit.

As I write this, I wonder if your plan is for this to be a property? (Toy vs real, default is ...?) But I think that's quite dangerous - people might not go back. I think it would be safer to at least have literal labels like "toy". (Better to mark that it's a toy of unspecified type than that it's a car without specifying whether it's real or a toy.)

You are right, that could indeed be dangerous, although I think that the properties approach would be the right one (at least from the organizational/structuring point of view). I think the underlying problem is that people tend to be lazy when it comes to describing things. To be honest, the first label that came to my mind was also "rabbit". So even if there had been a "toy" label, I am not sure whether that would have been my first choice. I think I would have reacted like a child and said "that's a rabbit!!" ;-)

What we could easily do is introduce a bunch of metalabels/scene labels for all the labels that are not bound to a specific polygon, but rather apply to the whole image. E.g: cartoon, black/white, fisheye lens (e.g. to tag your street images), etc.

The current dataset contains a few examples of things like toy cars, cuddly toy rabbits etc. Without these labels (toy, model, poster, photograph etc.) there is a risk that people will label them as the "real thing" - in fact many of these examples are already tagged as the real thing, and that's really dangerous.

I guess the question is: are those labels really wrong, or are they just missing additional information (like "that's a toy")? If it's the latter, then it's bad in the sense that we have to go over all the images to fix that, but at least it's fixable. (E.g. we could query the dataset for all rabbit images that lack the toy property and add it where it's missing.) Another question is: if we want users to tag those types of images with toy, how can we make sure they will really use the toy label for that? (My gut feeling is that people will first think of a rabbit - see the child analogy above.)

dobkeratops commented 5 years ago

To my mind it's really clear those are just wrong. There's a really important issue with scale hints and context. A child might tell you that's a rabbit, but if your open source robot/personal assistant also thinks it's a rabbit, that's a massive failure, unfortunately.

If you're relying on a qualifier to distinguish that it's a toy, you kind of can't rely on the existing car labels to actually be cars until someone has gone back over all of them.

People may well be lazy describing things, but in an image dataset we need precision/accuracy. Precision can also mean giving an approximate answer where the level of approximation is known. E.g. if you see some water and can't tell if it's a lake, river or sea because it's zoomed in, then the more precise answer is ironically "water" or "lake/river/sea" - those terms have less potential error compared to the range of possibilities.

If you give a more specific term that is incorrect, it's a bigger error in the data. I would think of this like specifying quantities with tolerances. "Toy" is a better approximation than "rabbit" for a toy rabbit, because a rabbit is not a toy, but a toy rabbit is. You've basically broken the set boundaries if you start with rabbit then refine with a property - unless you're going to decouple those later. Basically every label would have to include in its set "toys/models of itself" as well as the real thing.

how can we make sure they will really use the toy label for that? (My gut feeling is that people will first think of a rabbit - see the child analogy above.)

The easiest way, to my mind, would be to allow toy etc. as prefixes; then in one 'breath' you could naturally say "toy car", "picture of fish", etc. You might be able to detect this sort of thing through overlap, e.g. if someone outlines something as a fish and another person outlines it as a mural - but the fuzziness of boundaries would worry me. I'd really prefer that people are guided toward giving accurate, specific descriptions.

Perhaps being encouraged to actually use a prefix will give more hints. "It's a car..." "What kind of car?" (Autocomplete suggests various options.)

... the second moment, where you might pick 'hatchback', 'saloon' etc., is a moment of reflection giving you the opportunity to correct yourself and say 'ah, it's a toy...'.

To put it another way: when encouraged to give one word, neither toy nor car might be satisfactory - hence the temptation to say "car", as you correctly note a child might.

But if you MUST give 2 words, "toy car" could be more obvious than "car, hatchback, ...". E.g. user 1 thinks "toy, car, hatchback, red...", user 2 might think "car, toy, red, hatchback". If you capture a bit more of the description stream from their head, there is more chance of getting it right.

Once your hands are on the keyboard it's not much hassle to give 2 words instead of one; the most significant qualifier (property) will add a lot of value, I suspect. I find myself always wanting to tell it "metal gate", "brick building", "wild grass" etc. If you could do that in one step it would be awesome.

dobkeratops commented 5 years ago

Oops, accidentally closed.

bbernhard commented 5 years ago

totally agreed, for a dataset precision is key.

But I think no matter what we do, (the broad mass of) people will always do what comes naturally to them. In some cases their choice of a specific label might even be based on their use case.

E.g. imagine someone wants to train an image classifier for a child's game. For that use case it's probably irrelevant whether the image shows a real rabbit or a toy rabbit. So if that person uses ImageMonkey to label and annotate their images, they will most probably do that with their specific use case in mind.

To me, that's one of the core problems/challenges of an open dataset. There are a lot of people, each with a different background when it comes to properly labeling a dataset, and all of them with (slightly) different use cases.

I think this will become an even greater challenge once we allow free labeling+annotation. At the moment we (kind of) know what's going on in the dataset (at least we know all the "stable"/productive labels in the dataset). But that will most probably change as soon as we give up that safety net.

At that point it will be even harder to find wrongly labeled images. There will be many more (slightly different) label combinations, probably a few misspelled labels and (hopefully) only a little spam.

I really hope that moderation and validation can help here to keep the structure of the dataset intact (at least to a certain degree). But it's always hard to estimate what will happen once we hit "mainstream".

You are right, prefixing labels could really help here. Nevertheless, there is still the problem that people need to know how to properly label stuff (so that it can be used for different use cases). I think that's the biggest problem here. People will only try to solve their own use case without looking at the bigger picture.

So if it's irrelevant for their use case whether it's a wooden fence or a metal fence, they will most probably label it fence. I was hoping that the properties system could solve that to a certain degree. If we keep the labels "basic", users can easily add additional properties and extend the dataset so that it also fulfills their use case (e.g. a classifier for toy rabbits).

dobkeratops commented 5 years ago

In some cases their choice of a specific label might even be based on their use case.

True, but I personally believe there is value in a universal 'master' visual database, because use cases overlap. You can walk down a street and enter a school or garage, or walk past a park with a playground, and have a sequence of images that spans several domains; you should be able to look at those images side by side and give unambiguous universal descriptions. A home has a garden which will have overlapping labels with agriculture, and objects in homes will overlap with workplaces.

One suggestion I have for free labelling is to require that free labels are a specialisation of a broad (existing, curated) category or label, e.g. you would always have to say "pet->pooch", "building->hangar", "tool->pickaxe". At least then, at the moment of having written the broad label, autocomplete could show you a narrowed set of refinements.

I hope that the open-ended graph would help you with this: "tool" -> {"hand tool", "power tool", ...}. Whatever description you started with, the graph could give you hints to find the most specific existing label, and then you know for sure whether you need to go further with a free label.

My suggestion would be to add some very broad additional labels (things like container, tool, vehicle, animal etc.) such that you've got a 'complete' base vocabulary that can say something about anything. ("Animal, vegetable, mineral" is an example of a 3-word vocabulary that is at least complete, although it isn't very descriptive.)

Example images are the other idea that could help, regarding hints.

And also "thematic context", a bit like the scene labels (there's a blur in my head between general-purpose tagging, scene labels, and 'thematic labels').

E.g. the focused images on a plate: I've been adding the 'tag' "food and drink". The focused closeups on a plant: "botany". Pictures of malls, shopping bags, shop counters: "retail". Pictures of soldiers, tanks, jet fighters, warships: "military". Art and craft would be other good themes. You might find that with the right 'broad theme labels' it's easier to guess the right thing. The toy rabbit? What if other pictures of animals required a qualifier: "livestock", "wildlife", "pets". Other ideas for thematic qualifiers: "toys/games", "amusements", "entertainment", "education", "street art", "industry", "academia", "transport", "recreation", "sport", "exercise", etc.

I wonder if it would again be possible to have a complete range of such 'thematic/scene labels', which let you say something about any image, further hinting at what is in there (good for narrowing common label suggestions?).

bbernhard commented 5 years ago

True, but I personally believe there is value in a universal 'master' visual database, because use cases overlap. You can walk down a street and enter a school or garage, or walk past a park with a playground, and have a sequence of images that spans several domains; you should be able to look at those images side by side and give unambiguous universal descriptions. A home has a garden which will have overlapping labels with agriculture, and objects in rooms will overlap with workplaces.

Totally agreed. I think what's important is that we somehow "force" users in the right direction. That can either be done with some clever UI design hacks (e.g. as you already mentioned, showing some "label examples"), with a good FAQ/best-practices section, or with some additional validation/moderation. I think most of the stuff is (meanwhile) pretty clear to us, as we've talked a lot about the pros and cons of various decisions, but the consequences...

My suggestion would be to add some very broad additional labels (things like container, tool, vehicle, animal etc.) such that you've got a 'complete' base vocabulary that can say something about anything. ("Animal, vegetable, mineral" is an example of a 3-word vocabulary that is at least complete, although it isn't very descriptive.)

sounds good to me. (I've tried to increase the frequency with which I make labels productive a bit in the past few weeks in order to close the gap a bit; hopefully that helps)

Another thought: what if we try to parse the label and split it up into base label + properties? If I remember correctly, you mentioned that idea in another ticket a while ago.

e.g: let's suppose we create a hardcoded list of common properties:

material: metal, wood(en), iron, ceramic, acrylic, carbon, bronze, cement, gold(en), silver, ..
color: red, white, black, blue, green, violet..
size: big, small, medium, large
weight: heavy, light
other properties: toy, ..
...

If someone now adds the label small red toy car, we could split that label up into the actual label (car) and its properties (red, toy, small). In the unified mode labels list the full label (i.e. small red toy car) would be shown. If you then fulfill the annotation task by drawing a rectangle around the "small red toy car", it would automatically attach the properties red, toy, small to that rectangle (so you would immediately see the properties show up in the properties list on the right side in unified mode).

That also means that we won't store the actual label "small red toy car" as a string in the database (because that would be painful to parse when someone wants to query the database for the substring "red toy car"), but rather store it in the split-up version (i.e. label: car; properties: red, toy, small), which is easier to query.
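A minimal sketch of that split, assuming hypothetical static lists (the property categories mirror the hardcoded list above; none of these names are ImageMonkey's actual schema):

```python
# Invented static lists for illustration only.
PROPERTIES = {
    "material": {"metal", "wooden", "iron", "ceramic", "bronze"},
    "color": {"red", "white", "black", "blue", "green"},
    "size": {"big", "small", "medium", "large"},
    "other": {"toy"},
}
LABELS = {"car", "rabbit", "dog", "fence"}

def split_label(text):
    """Split e.g. 'small red toy car' into (label, {property: [values]})."""
    props, label = {}, None
    for word in text.split():
        for prop_name, values in PROPERTIES.items():
            if word in values:
                props.setdefault(prop_name, []).append(word)
                break
        else:
            if word in LABELS:
                label = word  # last unmatched known word wins as the label
    return label, props
```

So `split_label("small red toy car")` gives `("car", {"size": ["small"], "color": ["red"], "other": ["toy"]})` - exactly the shape that is easy to store and query, as described above.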

What I personally like about that approach is that it allows us to store the labels + properties in a way that we can query again in a performant way. What do you think about that? Would that be a useful improvement, or isn't it worth the effort?

dobkeratops commented 5 years ago

Being able to parse properties from a label string ("small toy red car") would be perfect, but might be difficult because of natural language issues (e.g. 'glass' being both an object and a material property, 'orange' being a color property and a type of fruit). I figured it might be OK to parse 1 prefix?

Would you need a table of viable properties somewhere? E.g. 'gate' -> which material prefixes make sense (metal, wooden... not plastic, cloth, wicker). That might let you generate safely parseable examples?

I wondered if graph nodes could function as documentation of viable combinations, but that would only be practical for one prefix. (Could we do one prefix through a label, then more through clickable properties or explicit syntax?)

Maybe you can look into delegating to a natural language parsing library or service (text labelling in some parallel project?).

Maybe there's some markup we could use. I think you had this idea of explicit "color=red", "size=small" etc. - that might be doable? I suppose there could be a separate tool or mode for ensuring the text parses OK ("small toy red car"... "Can you verify this means color=red, size=small, label=car?").
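The explicit-markup variant trades naturalness for unambiguous parsing; a minimal hypothetical sketch:

```python
def parse_markup(text):
    """Parse explicit 'key=value' markup like 'color=red,size=small,label=car'."""
    result = {}
    for part in text.split(","):
        key, _, value = part.strip().partition("=")
        result[key] = value
    return result
```

Here `parse_markup("color=red,size=small,label=car")` yields `{"color": "red", "size": "small", "label": "car"}` - no natural-language ambiguity, but users have to learn the syntax, which is why a verification tool/mode as suggested above might be needed anyway.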

I also wondered if 'thought vectors' or 'word embeddings' would be useful (e.g. labels are turned into multidimensional vectors given by a precomputed word embedding), but I'm not sure how well those handle parsing sequences of words. "Speed bump" is a type of road feature; "speed" and "bump" are 2 separate words with independent meanings. I'm not sure that technique can handle it.

It is certainly not a trivial problem.

Could 'slash' combination handle the picture and toy cases? E.g. "toy/car", "mural/fish", "statue/horse". There are other places where I think arbitrary combining could be useful: soil/grass and many mixed surface types; strange vehicles like "boat/aircraft", "boat/truck" (they do exist :)); there's also a weird piece of furniture, a combination bench/table or seat/table, and I think I've seen such a thing as a "desk/bed" or "bed/sofa".

bbernhard commented 5 years ago

Being able to parse properties from a label string (“small toy red car”) would be perfect, but might be difficult because of natural language issues (eg ‘glass’ being both an object and material property, ‘orange’ being a color property and type of fruit..) - I figured it might be ok to parse 1 prefix?

You are right, there are indeed some combinations where it gets tricky. But as long as we have a static labels list + a static properties list, I think it should be doable (with some restrictions). E.g. I think a sensible requirement (at least for the first iteration of the parser) would be that the label string consists of exactly one label (and n properties). With that restriction most of the edge cases should be resolvable.

But right, if we allow free labeling this will become way more difficult. But maybe (that's really a vague maybe ;)) free labeling won't be that important anymore if we have 1-2k base labels + a few hundred properties, because then you could create a lot of different label combinations.

Another advantage of a static label + properties list is that we could try to check for natural language parsing clashes upfront (it probably doesn't scale if we try every possible combination to look for parsing clashes, but maybe there are some algorithms that could help us here a bit).

Could 'slash' combination handle the picture and toy cases? E.g. "toy/car", "mural/fish", "statue/horse". There are other places where I think arbitrary combining could be useful: soil/grass and many mixed surface types; strange vehicles like "boat/aircraft", "boat/truck" (they do exist :)); there's also a weird piece of furniture, a combination bench/table or seat/table, and I think I've seen such a thing as a "desk/bed" or "bed/sofa".

I think we are currently using the slash operator to represent "has"/"part" relationships, e.g. eye/dog. I guess in the case of soil/grass it would be more of a label blending/combining, no?

But yeah, in case of the toy rabbit, it could indeed make sense to look at it like that. One could definitely say that it's a combination of toy + rabbit.

I wondered if graph nodes could function as documentation of viable combinations, but that would only be practical for one prefix. (Could we do one prefix through a label, then more through clickable properties or explicit syntax?)

That's an interesting thought... have to think a bit more about that.

dobkeratops commented 5 years ago

I think we are currently using the slash operator to represent "has"/"part" relationships, e.g. eye/dog. I guess in the case of soil/grass it would be more of a label blending/combining, no?

Right. What I had hoped is: (i) "/" could be redefined as a 'context sensitive blend'; (ii) some words always 'blend' as a part ("head", "tail", "leg", "arm", "wheel", "foot", "hand", "door", "window"), but these could also be listed as objects (there are certainly 'wheels' and 'doors' visible as separate objects in workshops, construction sites, shops) - not sure what to do if you say "arm/leg" though (lol); (iii) the default could be 'just mix the meaning'; (iv) you could then use other symbols (& | etc.) to force 'simultaneous meaning' or 'an area containing either...', and maybe yet another explicit syntax to force 'part of...'; (v) if any combinations cause trouble, could there be an explicit table of overrides?
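A toy interpreter for these proposed slash rules might look like this; the part-word list and the override table are invented for illustration:

```python
# Words that default to a "part of" reading when they appear left of the slash.
PART_WORDS = {"head", "tail", "leg", "arm", "wheel", "foot", "hand",
              "door", "window", "eye"}
# Explicit override table for combinations the default rules get wrong,
# e.g. "arm/leg" would otherwise read as "arm as a part of leg".
OVERRIDES = {("arm", "leg"): "mix"}

def blend(a, b):
    """Decide how 'a/b' should be interpreted under the proposed rules."""
    if (a, b) in OVERRIDES:
        return OVERRIDES[(a, b)]
    if a in PART_WORDS:
        return "part-of"   # e.g. "eye/dog" = an eye as part of a dog
    return "mix"           # e.g. "soil/grass" = a blended surface
```

Under these rules "eye/dog" resolves to a part relationship, "soil/grass" to a plain mix, and "arm/leg" is rescued by the override table - which is essentially the escape hatch proposed in point (v).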

If there was a way of explicitly making hierarchy (master object = 'car', component objects = 'wheel', 'headlight' etc.), that would be a surer way, but that's a lot more UI to figure out.

dobkeratops commented 5 years ago

Unusual combination objects (a case for arbitrary label blending?): Desk/bed - actually even desk/sofa/bed. I suppose you could just label the pieces individually (there's also a ladder), but unusually here they're components of a single combined piece of furniture. 50AA0DD1-851A-4E8E-82F1-4529D215DAB3 Helicopter/car EACF9284-A08B-4161-8564-F14E2BC7C94A Aircraft/boat? (Soviet ekranoplans, ground effect. I guess if you had words like hydrofoil and hovercraft as well you could get a closer blend description. The overlap of boat and aircraft might cover it because it's "a vehicle, but not a land vehicle". It's definitely not simply an aircraft, boat, hydrofoil, or hovercraft though. And OK, I admit we don't have many of these around :)) 00A9708E-4151-4B26-9A2C-1C32AE3B40EF Boat/truck 57DEE292-E77F-4D66-A910-CEB6FA8385F4

I think you could reasonably describe this as a "table/bench" - it actually has an official name: "picnic table". Again it's interesting that you've got components (more specifically tabletop and seat) of a single combined furniture object. Tabletop and seat components could be useful generally; there are a few places in vehicles where it's better to say it has a seat component (a chair is an independently moveable seat with a backrest; a seat is just the part you sit on, which could also be part of another structure). 785AB0DD-4E86-4EB4-950B-E2AE2DCE7B8B

dobkeratops commented 5 years ago

Pathological vehicle/picture/advertising images. It's absolutely 100% vital that the image set does not confuse these images with the real thing if this is going to be used for any kind of self-driving vehicle training; I'd say the same for other kinds of future robot applications, and the kind of image comprehension that personal assistant apps or art apps might need. There's a more famous example of a picture of a perspective road on the back of a van.

Definitely not a person - fatal confusion if you're relying on scale hints, places where an off-road pavement delivery bot can go, etc. 37190942-C8A8-46D7-BD6C-E6D8DD498D47 Hah, a painting of a picture frame containing a painting 14643139-9A4D-4A01-BF30-A6AB4F6C7D32 More "not person" EC3BB360-3C18-4024-A689-BE7CFFC9A10A 2FAB7DB8-8780-4E8A-96C8-F4E8871B25AB EA03DD51-FBC5-4C8E-ABEC-1D1075A21165 ![Uploading EBF3290B-E40B-4B62-B02C-0CA4ADB22E41.jpeg…]() ![Uploading E49694EB-6AF5-4644-A7EC-F2470BD58C5D.png…]() ![Uploading 13F5636E-A76D-4192-9062-7CCCB4C41503.gif…]() Picture of a truck on a truck - again could be really confusing regarding scale, "where is the road", etc. ![Uploading A48E00C9-1E47-4641-909D-9971D795FB9F.jpeg…]()

bbernhard commented 5 years ago

Totally agreed. We definitely need to find a way to solve that. I think in general the system is expressive enough to deal with that type of problem. The question is: what should the ideal solution look like?

At the moment I would see two different options. Either

a) use the properties system
b) introduce label blending

At the moment the properties system already exists... so if we base our decision on what's easiest to implement or "what's already there", we would have a clear winner. But I'm not sure if that's the best indicator for making good decisions.

1) Is either of those options superior to the other? 2) Let's assume for now that we introduce another concept, label blending. What's the difference between label blending and the properties concept? Can we formulate rules for when to use one and when the other? If we can't find a clear distinction between those two things I personally would go with the unix philosophy: "everything is a file... aaehm, property".

I always find it really frustrating when a piece of software offers multiple ways of accomplishing something without explaining to me what the advantages/disadvantages/implications are. If there are clear rules (i.e. "use this when..., use that when...") I feel way more confident that I am doing it the right way. Without any clear rules I always have these doubts that I am doing something wrong. And as a bit of a perfectionist, that's really frustrating.

What I am a bit afraid of is that we introduce a lot of different concepts, each of which makes perfect sense on its own. But if you look at them all together, it gets difficult to tell them apart. Is this now a label, a blending label, a property, etc.? Especially as a new user, I guess that could be a frustrating experience.

What I am hoping is that we can find a universal representation that is expressive enough that it works for all cases. E.g. if we manage to represent all that with properties, we could add another abstraction on top of that (e.g. the slash operator). The slash operator would then just be a shortcut on top of the properties system. So no matter whether you use the slash operator or a normal property, the end result would be the same. That way, we don't have to explain to the user the difference between the slash operator and the properties system: as everything is a property, you can do everything with properties. The slash operator is just a convenience tool that makes it a bit easier.

Does that make sense?

dobkeratops commented 5 years ago

I'm kind of hoping they can be made equivalent, and you just enable whatever aliases or UI make it easy to use.

"Everything is a property" - even the label, ultimately? Could you consider a label as a shortcut for a bunch of properties like "carnivorous", "alive/dead", "passenger carrier", "cargo carrier", "self moving", "flying", etc.?

I like the idea of "word embeddings", where they try to compute a meaningful large vector (e.g. 500d) for each word from text. Under that idea, there are effective properties in that vector space (they can do things like "king + woman - man = queen", i.e. there's a gender axis somewhere).

Between the graph and properties, I think you're effectively trying to make a hand-labelled space for the words. You could see the properties as a way of generating graph nodes, or the graph as a way of encoding properties ("flying", "vehicle", "carries passengers" nodes all pointing to "jet airliner").
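The "king + woman - man = queen" arithmetic can be illustrated with made-up toy vectors; real embeddings (word2vec, GloVe) use hundreds of dimensions, and these 3-d numbers are invented purely so the sketch works:

```python
import math

# Invented 3-d vectors; imagine axis 2 as the "gender axis" mentioned above.
VECS = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.8, 0.9],
    "man":   [0.1, 0.2, 0.1],
    "woman": [0.1, 0.2, 0.9],
}

def nearest(vec, vocab):
    """Return the vocabulary word whose vector is closest to vec."""
    return min(vocab, key=lambda w: math.dist(vec, vocab[w]))

# king - man + woman, component-wise
combo = [k - m + w for k, m, w in zip(VECS["king"], VECS["man"], VECS["woman"])]
analogy = nearest(combo, VECS)  # lands on "queen" with these toy numbers
```

The hand-labelled graph/properties space is doing the same job explicitly that the learned vector space does implicitly.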

dobkeratops commented 5 years ago

The slash operator would then just be a shortcut on top of the properties system. So, no matter if you use the slash operator or a normal property, the end result would be the same.

Right, that's the ideal. You could imagine just layering information. If you even had labels based on material (for objects you can't actually name), that would be useful... but imagine if you could retroactively add the missing part in a property. The 2-part property information ("material=glass", "posture=sitting" etc.) could be seen as disambiguation of the property words themselves ("glass (material)" vs "glass (dishware)").

You could have binary properties ("living=true|false", "man made=true|false")? Or for those would you prefer to say "origin=natural|man made", "...=living|inert"? Some of these properties don't have 2 parts: "carnivorous", "omnivore", "herbivore"...

bbernhard commented 5 years ago

that would indeed be awesome. I think the most important thing is to get the foundation right. If we find a format that is expressive enough we can build simplifications on top of that.

At the moment I am playing around a bit with the nltk (natural language toolkit) Python library. Personally I like the idea of having a semantic parser (something that can parse "small red car" or "brown haired dog"), but I am not sure how powerful the natural language processing libraries already are.

I am quite confident that we could write a parser based on static word lists (i.e. static properties + static labels list). With completely dynamic labels + properties I am not sure, however. I think that might produce quite a lot of parsing errors (e.g. labels accidentally parsed as properties and vice versa).

I kind of like the idea that one can draw a rectangle around an object and then a popup opens where one can enter the label + properties, e.g. "small brown haired dog".

But for something like that to work, I think we would need to stick to static labels + properties lists. Otherwise we probably end up with chaos (but maybe my opinion will change when I gain more experience with nltk).

dobkeratops commented 5 years ago

To throw another idea out there, imagine a "something of something" format, e.g. pile of..., stack of..., row of..., picture of..., model of..., photograph of..., painting of...

“Row of [tree]”. “Stack of [pallet]”. “Pile of [bricks]”.
“Picture of [fish]”

Some of those might sound very odd in natural language terms (“toy car” vs “toy of [car]”)

Could you consider some aliases for the common cases... "toy car", "toy vehicle", "toy animal", "picture of person", "picture of plant", "picture of animal", "picture of vehicle", "picture of building" (very common on advertising billboards), "cuddly toy" (a better approximation for the cuddly toy rabbits than just the word toy?)

bbernhard commented 5 years ago

like that!

Although I think the problem stays the same: In order to parse the semantics we probably need static lists. Otherwise it will probably be too complicated to reliably parse all the edge cases.

With static lists we could parse all kind of different information from the label.

e.g:

blueberries == blueberry + plural
small red old cars == car + small, red, old, plural
picture of fish
painting of vintage wind mill

But I think this only works if we keep the lists static...

dobkeratops commented 5 years ago

Another subtle example of a "fake/image": theatrical mask and possibly face mask, also costume (there'd be things like costume/gorilla, etc.). Of course there are very different types of face mask too... some face masks are functional, protective; some are jokes, amusements (masks of Star Wars characters, etc.). There'd also be a sliding scale between theatrical mask and the other types. I'd recommend the label "mask" as a catch-all too. Unfortunately the term mask has another meaning, e.g. stencil. "Respirator" would be a more accurate term for some cases (Wikipedia: mask (disambiguation)). Fancy dress would be another good catch-all term for certain types of costume.

Again vital that you can distinguish this is not a face, person.. because that type of fake face could be on a puppet...

BAE488E4-8975-4844-BDF0-FB5CF20C27ED

dobkeratops commented 5 years ago

... which reminds me of another case to add: puppet ... another image of a person, not covered by toy, model, figurine. 550EE89E-3149-4EC0-B4DE-14A2E4DE1C40 Subtype glove puppet (maybe another case where arbitrary label blending could help: "glove/puppet"). Also add the closely related animatronic, maybe more likely in the modern world... even robot, humanoid robot, wheeled robot, robot arm, so the devices we're training can recognise themselves in the mirror :) Also prosthetic, prosthetic arm, prosthetic leg

"Mirror" reminds me to request "reflection of..." and maybe even "shadow of...". Distinguishing shadows from silhouettes could be useful.

bbernhard commented 5 years ago

Thanks a lot for all the examples! I was thinking a lot about the problem scope lately and those examples really help to get a better understanding :)

After playing a bit more with nltk and reading a bit of documentation, I have more and more the feeling that the automatic label/properties parsing will only work reliably if we stick to static lists.

The main question for me is now: Is that a restriction we can live with? Or in other words: If we focus all our energy in creating a static labels/metalabels + properties list, can we come up with label combinations that cover (almost) all of our needs? (I think it's not necessary to have all edge cases covered, as we still have the trending labels concept. But I think we should have at least 80-90% covered, otherwise it won't be a pleasant user experience.). I think by combining labels with multiple properties (and other labels) we can create quite complex label combinations. But not sure if this is enough? Or is free labeling the only choice we've got?

dobkeratops commented 5 years ago

I think by combining labels with multiple properties (and other labels) we can create quite complex label combinations. But not sure if this is enough? Or is free labeling the only choice we've got?

Free labelling is the only way to be sure, but label combination and a predetermined list might work... if the predetermined list was comprehensive enough. I'm guessing it'll be better once we reach thousands of labels.

I wonder if a pipeline of label additions sorted by domain could help - eg classify the images by scene label, then ‘unlock’ a scene by adding all the object labels for it .. a search mode could prioritise images whose scenes have been dealt with

My hope is you could grow the label list much faster if you focus on domains, rather than doing it randomly through trending: it's like the issue of cache coherency, with your attention being a cache. Focus on "aquatic vehicles" and you could hammer out a list of 20 different types of aquatic vehicle (and sort them in the graph) pretty quickly... one week do ships, the next week do car types, the next do vegetables, etc.

dobkeratops commented 5 years ago

... so what I imagined without free labelling was a “pipeline” concept.. we’d gradually unlock domains, and prioritise serving images from the domains whose labels were more comprehensive .. choosing which domains to do next in a way that kept the variety going

Some kind of "request" process might help, e.g. draw around a key example object and either assign a question mark, or a free label (requesting to add it to the database). Imagine if you could place a request with several alternative descriptions, in the case where you're not sure (["pooch", "pet dog"], ["cargo carrier/ship", "freighter"])

dobkeratops commented 5 years ago

I guess that natural language parsing is itself an ongoing challenge, probably itself the subject of labelling websites.

I wonder if it would be possible to decouple labelling from label comprehension... but I get the impression you do actually want to train a model along with the site (fair enough).

I talked to a few other people elsewhere with an interest in language parsing but not sure if they wanted to connect here.

I had wondered about getting a dump of the LabelMe labels... perhaps going back over all of those and "labelling the labels" (i.e. building a graph for them) would be a useful community exercise.

bbernhard commented 5 years ago

... so what I imagined without free labelling was a “pipeline” concept.. we’d gradually unlock domains, and prioritise serving images from the domains whose labels were more comprehensive .. choosing which domains to do next in a way that kept the variety going

Some kind of "request" process might help, e.g. draw around a key example object and either assign a question mark, or a free label (requesting to add it to the database). Imagine if you could place a request with several alternative descriptions, in the case where you're not sure (["pooch", "pet dog"], ["cargo carrier/ship", "freighter"])

like that! How would you deal with not-already-known labels? e.g someone adds this label: dirty old red car

With our new parser we try to parse the label, but fail as we do not already know dirty. So we put that label into a queue (I imagine something similar to the trending labels github repo here), so that a human can have a look later and see why the automatic parsing wasn't possible.

The admin/moderator now sees that dirty is unknown and adds the label property to the database. After that, the label dirty old red car will be automatically split up in:

label: car
properties: dirty, old, red
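That asynchronous flow could be sketched roughly like this (queue shape and names hypothetical): unknown words park the label in a review queue, and once the moderator adds the missing property the same label parses cleanly:

```python
# Hypothetical sketch of the supervised parsing queue described above.
known_labels = {"car"}
known_properties = {"old", "red"}
review_queue = []  # labels a moderator needs to look at later

def try_parse(text):
    words = text.split()
    unknown = [w for w in words if w not in known_labels | known_properties]
    if unknown:
        review_queue.append((text, unknown))  # park it for a human
        return None
    return {"label": next(w for w in words if w in known_labels),
            "properties": [w for w in words if w in known_properties]}

try_parse("dirty old red car")   # fails: "dirty" is unknown -> queued
known_properties.add("dirty")    # moderator adds the missing property
result = try_parse("dirty old red car")
```

After the moderator's fix, `result` is the split described above: label `car` with properties `dirty, old, red`.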

I think in general such a supervised label parsing system would work. Of course, there will be edge cases, that will pop up, but hope that most of them can be fixed over time by gradually improving the parser.

But as the whole process is asynchronous (someone will add a label that cannot be parsed automatically and (hopefully) later in time an admin/moderator fixes the parsing issue by adding the missing label (property)), the question is: What do we do in the mean time with the unparseable label? Do we allow that users can use that label already for annotation or is the label blocked for annotation until it can be parsed?

dobkeratops commented 5 years ago

“Dirty/old/red/car” :-

Would you consider adding labels for properties, i.e. thinking of the idea that eventually "everything is a property" (which matches the natural language "SDR" or "word vector" concepts)? E.g. "dirty", "new", "old" could be labels available for blending... you could label many examples of "dirty" vs "clean" objects or "old" vs "new" objects, even if you didn't have the object names (and train a neural net to look for visual cues like mud, debris, rust, scratches, polished highlights...)

It might be weird but then you could just write things like “car/new”, “dirty/road”, etc ... you might be able to give hints from the text stream “did you mean...” (what adjectives are understood).

And beyond this: adding more properties that you can just add to ‘object’ to describe it...

Coming up with a base vocabulary you might be able to make more blendable descriptions of things, eg I suspect with about 100 blendable ‘base concepts’ you could correctly describe 1000s of objects, and we could consider the labels as short cuts to those. I think the ‘word embeddings’ end up with a few hundred dimensions (describing tens of thousands of words)

E.g.:

- "Car" is "self propelled", "container", "moves on smooth surfaces", "has wheels", "carries people", "man made", "rigid"
- "Wheelie bin" is "waste container", "has wheels", "rigid"
- "Bin bag" is "waste container", "flexible"
- "Trailer" is "man made", "has wheels", "rigid"
- "Bicycle" is "has wheels", "carries people", "rigid" (absence of "container" because it's not enclosed; absence of "self propelled")
- "Has 2 wheels" / "has 4 wheels" would let you describe motorbike, scooter

- "Jet airliner" is "flying", "man made", "carries people", "self propelled", "jet powered", "has wings"
- "Bird" is "flying", "animal", "self propelled", "has wings"
- "Kite" is "flying", "controlled by strings" (absence of propulsion or life)
- "Puppet" is "articulated", "controlled by strings", "humanoid shape"
- "Statue" is "humanoid shape", "rigid"
- "Action figure" is "humanoid shape", "articulated"

... if the base concepts were available as blendable labels, you’d have a chance to describe new things

Imagine: you enter "dirty old red car"... it'll show you: understood words: dirty, old, red, car. Do you mean "dirty/old/red/car"? This parsing confirmation is needed in the cases where word groupings are ambiguous ("speed bump" doesn't really mean "speed/bump"... it's "speed restriction"/"bump"? And of course "no entry sign" doesn't mean "no/entry/sign", rather "no entry"/"sign". "Speed limit sign" = "speed restriction"/"sign")

Also imagine: if you enter “flying/animal” it would show you examples that match: “do you mean ‘bird’,’winged insect’, etc..

I’m trying to imagine a procedure to build a list of base properties... eg “look at 2 objects, and without using the object names, describe one thing they have in common, or 1 thing that makes them different” ... keep going until you have enough base concepts that you can describe something unambiguously
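One way to play with this idea: represent each object as a set of base concepts and answer "flying/animal"-style queries by subset matching. A toy sketch (the object and concept lists are just illustrations taken from the examples above):

```python
# Toy sketch: objects described purely as sets of base concepts.
OBJECTS = {
    "car": {"self propelled", "container", "has wheels", "carries people",
            "man made", "rigid"},
    "bicycle": {"has wheels", "carries people", "rigid"},
    "bird": {"flying", "animal", "self propelled", "has wings"},
    "kite": {"flying", "controlled by strings"},
    "jet airliner": {"flying", "man made", "carries people",
                     "self propelled", "has wings"},
}

def matches(*required):
    """Object names whose concept sets contain all the required concepts."""
    req = set(required)
    return sorted(name for name, concepts in OBJECTS.items() if req <= concepts)

# "flying/animal" narrows things down to the living flyer, while "flying"
# alone returns every flying thing in the vocabulary.
```

This is essentially the "do you mean bird, winged insect, ..." suggestion box: query by base concepts, offer the matching labels.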

dobkeratops commented 5 years ago

Decoupling: under the above picture, I imagine working on the mapping (like a manual word embedding) and image labelling independently; a client of the dataset could supply their own mapping of vocabulary, their own preferred set of base concepts. A set of base concepts can also be automatically generated, i.e. by word embedding training... imagine training an image recognition system to output the word embedding vector rather than an actual label (and then you pick the closest word or blend from the available vocabulary). I'm not sure how well word embedding deals with things like "speed bump" or "orange" as a stand-alone word, but in the case of the really troublesome ambiguous words like orange, glass, iron (iron metal or clothes iron) I would suggest always disambiguating those words (clothes iron, iron metal, orange fruit, orange color, glass cup, glass window, glass panel...)

Relation to the graph idea: it's still consistent. It's just a case of picking some key graph nodes. The graph is still the most general idea. The graph nodes would summarise relations between the base nodes as well. The base nodes for training do not need to be mutually exclusive, e.g. "has wheels" pretty much implies "man made".

We could say a word embedding is a simplified graph

I’d just want to be able to get on with assigning the right words to the right objects without waiting for all this to be set in stone - the two issues can be worked on in parallel (there will be ongoing attempts to improve natural language parsing, and there’s probably scope for a “word monkey” project too.. if it already exists.. wordnet?)

dobkeratops commented 5 years ago

One thing that makes me slightly nervous about the current description is the separation of labels and properties: ie that sounds slightly too rigid. (Although I definitely like the idea that properties could streamline creating a bunch of graph nodes.. graph compression..) Maybe I’m worrying about nothing but let me explain..

E.g. if you've labelled things as "car", presumably there's going to be a property "car body type" = "saloon, hatchback, 2 seater front engine sports car, coupe, fastback, ..."

... whereas I preferred the more dynamic graph picture ... “man made->vehicle->land vehicle->road vehicle->powered enclosed road vehicle->car->saloon car”

... I’m slightly uneasy about making any one step in that chain of refinements more special than any other... I just see a continuum of nodes that gradually narrow things down adding more specific properties as you go down that chain..

There's a messy example, "pickup truck", which is often somewhere between a car and a truck, and possibly better in its own category. It's not really a body type for a car, or a body type for a truck... some are used much more like cars. It's kind of its own thing, in my mind. See also "SUV" (which is really intermediate between car and jeep), or "minivan". The boundaries get messy... as such, the more dynamic graph sounds easier to adapt.

You can imagine with "toy", "action figure", "puppet", "animatronic", "robot"... there's also going to be increasing fuzziness and ambiguity. (Literal toy robot action figures?)

The taxonomy of life gives another example to think about .... the good example “canine” .. “pooch” or “pet dog” vs “wolf” etc.. there’s a well documented gradual narrowing down of the description (“alive->animal->vertebrate->mammal->canine->dog->German shepherd”)

What I imagined is being able to gradually refine the vocabulary and retroactively narrow the labels at any time, e.g. if you did label something as canine (where you weren't quite sure which canine it was, or the label wasn't yet available), someone else more knowledgeable of the domain could retroactively re-label it as "fox", "coyote", "jackal" or whatever once the label was available.

so I guess I’d just prefer to see it as combinations of words rather than making a dividing line between labels and properties. There’s a sort of implication that a label is somehow more complete than a property, but a bunch of properties combined could do the job of a label (“enclosed”,”self propelled”,”moves on smooth surfaces” etc)

I like the idea of a completely general graph being a means of search , eg “off road” gives you mountain bike, Jeep.. “flying” gives you “bird”,”aircraft”, “passenger carrier” gives you “ferry”, “jet airliner”, “bus” , ..etc. This would be helpful when you don’t quite know what something is , or might help a touch UI for label navigation on smartphones and tablets as an alternative to virtual keyboard typing

You could say labels are nouns and properties are adjectives, but all you have to do is say "red object", "old object" to turn an adjective into a label ("sharp/handheld"... "sharp handheld object"... probably a knife..), and blending nouns sort of implies adjective movement (is a "car/truck" a car with "bigger", "light cargo carrying" adjectives tacked on?)

dobkeratops commented 5 years ago

Case in point about the overlap between car, pickup truck, truck... 7C2B15BC-B256-4D43-915B-07A3AA2513A2

“A car converted into a pickup truck”.. IMO, if you try to make rigid hierarchical boxes.. you’ll always come unstuck in the fuzzy ambiguous continuum of the real world .

bbernhard commented 5 years ago

you raised some valid points here. The pickup truck is indeed an example where my labels/properties approach would fail (or at least needs some adaptation)

What I personally like most about the labels/properties approach is, that you can easily re-use existing annotations and add information on top of them.

e.g. I want to build an image classifier that can recognize BMW cars, Golden Retriever dogs or Apple smartphones. In that case I could scroll over the existing car, dog and smartphone annotations and tag them with a brand/breed label.

The cool thing (at least in my opinion) is, that I do not have to do all the bounding rectangle drawing again - which is a huge time saver.

How would you handle that? If I understood you correctly, then you are in favor of flat (no hierarchy) labels? I guess that would mean that if someone wants to add the Golden Retriever label, he would need to draw another rectangle, right?

I mean we could of course do that too. From the technical point of view nothing speaks against that. As we already have the label graph which sits on top of the labels, we could later order the labels hierarchically.

But I still have the feeling that we're missing a huge opportunity here. It's just... I don't know... it feels like we are just a small step away from reducing the workload. To me, drawing rectangles is the biggest effort... so re-using as many of them as possible and enriching them with additional information would be a huge plus for me.

But maybe I am worrying about nothing here.

dobkeratops commented 5 years ago

If I understood you correctly, then you are in favor of flat (no hierarchy) labels? I guess that would mean that if someone wants to add the Golden Retriever label, he would need to draw another rectangle, right?

What I had in mind is that you could change the label on the polygon, walking down the graph... just like you can tweak the polygon shape. In the paint program analogy, it would be like changing the polygon color.

Perhaps you could get this effect by copying the polygon (and deleting the original?)

Relabelling could be done as a quiz for casual users (search the graph links to check the valid refinements)

The graph (i.e. not a simple hierarchy) can express "is a..." relations, but no one label or path is more special than any other. E.g. you could have started out saying something is a "cargo carrying vehicle" then refined it to "container ship", just as easily as "ship" then refining to "container ship". Other examples like improvised adapted objects would be more subtle, e.g. a jam jar adapted as a plant pot. Or you might not have any idea what something is called, but you can identify it's a "metal/tool" (and later someone refines that)

You could go around labelling things with valid adjectives, even if your object labels didn't exist, or even if you didn't actually know what they were

The dilemma of "toy car" being a type of car or a type of toy is a perfect example... if you made the wrong choice initially, you can just make a new graph link

The advantage would be an adaptable structure, allowing long term growth.

Perhaps properties could be seen as graph compression, e.g. if there's a property "parked vs driving" for vehicles, it would be equivalent to creating a node "parked vehicle", from which you could find "parked car", "parked van", etc.

What if label blending could do the job of connecting multiple properties, e.g.

"Car -> luxury car"
"Car -> sports car"
"Car -> clean car"

“Luxury car/sports car/clean car” = “clean luxury sports car”

This might admittedly seem convoluted compared to the properties, but the graph nodes could document the sensible combinations.

Even there, however... I would personally be in favour of making all the words available for blending, and letting user verification confirm or deny which combinations make sense... validation could reject "sleeping/car", "parked/cat", etc. Compared to a natural language string, the slashes are saving you having to figure out word grouping

(Tangentially, this raises an interesting point about how word embeddings might work however ... “sleeping” and “parked” are themselves related ideas.. an equivalent concept applied to a vehicle or animal.. unfortunately there’s no catchy word for the commonality)

dobkeratops commented 5 years ago

What I could imagine is the property system will handle people very well (so many states and variations) ... but you can still always make new graph nodes for things like “pickup truck” ,“amphibious vehicle” (connected to both boat and truck) “ekranoplan” (connect to boat and aircraft) etc. I still think a bunch of aliases for common material/object combinations could be useful too (metal gate, wooden box, plastic pipe etc)

bbernhard commented 5 years ago

What I had in mind is that you could change the label on the polygon, walking down the graph... just like you can tweak the polygon shape. In the paint program analogy, it would be like changing the polygon color.

that's interesting. I've always implicitly assumed that the label is static - so once a label is assigned, there is no way to change it. But if I understood you correctly, then you are proposing exactly that, right? So, in case someone draws a polygon around the pickup car below and assigns the label car, the next one would just change the label to pickup car, right?

https://user-images.githubusercontent.com/1120754/59740507-35e18a80-9260-11e9-968e-e12cf49c76fe.jpeg

I guess I am making a complete fool of myself now, but I've never looked at it like that...But I really like the idea!

The only "problem" I see with this approach is, that it might be quite hard/expensive to query the dataset.

e.g I could imagine that someone adds the label red metallic pickup truck to the above image. As dataset user however, I am only interested in pickup truck - I don't care about the color. How do I query the dataset (database) for that label?

As the labels are just plain strings, the most obvious thing would probably be to do pattern matching, i.e. search for *pickup*truck*. (The bigger the dataset gets, the more expensive pattern matching will probably be - databases are usually really bad at optimizing this, but let's ignore this for a moment; we can work around that in case it becomes a problem.) But does that work in all cases? I think we've had the discussion in the past: glass (material) vs. glass (drinking glass).

Not sure if this is a use case, but: having "flat labels" makes it way harder to perform queries like: "I don't care about the actual object, just give me all annotations where color = red" or "I don't care about the actual object, just give me all annotations where material = wood". It's not impossible, but certainly harder. What you probably could do, is again some pattern matching (which will probably also return some false positives). So, if you want to be sure that you only get the correct result set, you probably have to create another label graph (with material:wood as root node) and with all the valid label combinations (wooden table, wood desk, solid wood desk, home office furniture made of wood, ...) as child nodes. But that will be a lot of work...
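The pattern-matching pitfall can be sketched with an in-memory SQLite table (the schema and data here are hypothetical, not ImageMonkey's actual database): the LIKE query finds every flat label containing the words in order, including combinations the querying user may not want.

```python
# Hypothetical sketch of querying flat string labels with SQL pattern matching.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE annotation (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany("INSERT INTO annotation (label) VALUES (?)", [
    ("red metallic pickup truck",),
    ("pickup truck",),
    ("truck",),                 # not matched: no "pickup"
    ("toy pickup truck",),      # matched, but maybe a false positive
])

# LIKE matches the words "pickup" then "truck" anywhere in the string...
rows = conn.execute(
    "SELECT label FROM annotation WHERE label LIKE '%pickup%truck%'"
).fetchall()
labels = {r[0] for r in rows}
# ...so "toy pickup truck" comes back even if the user wanted real trucks only.
```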

Is it just me, or are we just shifting the problem? Free, flat labeling is a huge improvement for all the dataset contributors - they can now crank out a lot of annotations in no time. However, all we've got now is a bunch of string blobs - which need to be ordered in a graph - and depending on the label variety (including some misspelled labels) that can be a lot of work (e.g. wooden desk, desk wood, wood desk, woooden desk, wooden desck, ...).

What I am asking myself is: Who will sort/order/group this then? Is there anyone out there, who really enjoys that? To me this sounds like absolute hell :D This reminds me a bit of my teenage years: I've always found it way easier to keep my room in a cleaned state (once my mother cleaned it), than to clean up the mess.

What worries me a bit is that this change requires people who will continuously go over the dataset to do some maintenance tasks (add labels to the graph, rename/delete misspelled labels, etc). Of course, some of those tasks can probably be handled by the community ("validation tasks"), but I think not all of them. At the moment, the label graph is an optional layer (at least that's how the graph is treated at the moment) on top of the dataset. It adds some meaning and structure, but is not absolutely necessary for the dataset's health. But with free flat labeling, we make the label graph an essential part of the whole design, and someone needs to take care of that.

A while ago, I exported the labelme dataset and tried to restructure it a bit. After a few days I gave up... I mean, I certainly managed to improve the dataset a bit, but the number of misspelled labels + different labeling styles was really off-putting. Tooling + small helper scripts definitely helped a bit, but in the end it was mainly manual work. At some point I got so frustrated that I gave up. I assume it might be less of a pain if you monitor the dataset and continuously course correct things. But it's still a lot of work...

dobkeratops commented 5 years ago

e.g I could imagine that someone adds the label red metallic pickup truck to the above image. As dataset user however, I am only interested in pickup truck - I don't care about the color. How do I query the dataset (database) for that label?

That's where the graph would come in: all the derivatives of pickup truck would be accessible in a query for pickup truck, via graph links (fill the graph nodes with ids of the reduced set you're interested in?). I would imagine many intermediate nodes - for search, partial labelling, training and queries, e.g. "enclosed road vehicle" (master label for car, truck, van, ..), nodes for materials ("metal object", "metal tool", etc.)
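A toy sketch of that kind of query: store child links per node, then collect the node and all of its descendants with a simple walk (the graph contents here are made up for illustration):

```python
# Toy sketch: resolve a "pickup truck" query by collecting the node and all
# of its descendants via graph links.
CHILDREN = {
    "vehicle": ["car", "pickup truck"],
    "pickup truck": ["red pickup truck", "pickup truck conversion"],
    "car": [],
    "red pickup truck": [],
    "pickup truck conversion": [],
}

def derivatives(label):
    """All labels reachable from `label` by walking down the graph."""
    found, stack = set(), [label]
    while stack:
        node = stack.pop()
        if node not in found:
            found.add(node)
            stack.extend(CHILDREN.get(node, []))
    return found
```

A dataset query for "pickup truck" would then match annotations carrying any label in `derivatives("pickup truck")`, with no string pattern matching needed.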

I wonder if we can connect with other people actively working on natural language parsing - this is a subset of the natural language problem. Is this graph a general enough resource that others may be interested in building it? (Or do they already have it... things like wordnet... I'm not sure how it works)

.I mean, I've certainly accomplished to improve the dataset a bit, but the number of misspelled labels + different labeling styles was really off putting. Tooling + small helper scripts definitely helped a bit, but in the end it was mainly manual work. At some point I got so frustrated, that I gave in. I assume that it might be less of a pain if you monitor the dataset and continuously course correct things. But it's still a lot of work..

Indeed it IS a big job, but if we could find other interested parties... the work could be shared. Imagine a parallel "Textmonkey"/"wordmonkey".

dobkeratops commented 5 years ago

Free flat labelling... if we can make aliases for properties - and/or make properties generate graph nodes - the two can coexist. The representations can be translated. The graph is more general, but the properties are better for adding things like color - and they will be perfect for people.

I think what we’re really doing in both cases is making a hand-crafted “sparse distributed representation” or “word embedding”.

My instinct is that you should add graph nodes for the obvious problems like “toy car” and “pickup truck”, and also for “one obvious property” eg “luxury car” “metal gate” etc.. a bunch of labels connected to material graph nodes will allow people to give material information even when they don’t know what the object is called.

If these were aliases for properties, the two systems would be equivalent... but the graph is the more general (it's sometimes ambiguous... some people say a vehicle is a container with wheels and an engine, heh - I saw that on wordnet. But a bicycle or motorbike is a vehicle too. You could fix that by making a node "enclosed vehicle", which does overlap with container). The graph means you don't have to make any one relationship more important than the others.

dobkeratops commented 5 years ago

Some example overlapping properties visualised in a graph.. concepts like cordless (cordless drill, cordless phone), communication device, material (nodes for metal object, wooden object) etc.. in my ideal world even if the label didn’t exist, you could still start by saying what material it is and giving a broad label (like container or tool) and if a better label is later added, the polygon could be relabelled.. 6A96AEC8-EA94-4DE7-8E6D-58010CB00326

Imagine several common words [tool, container, panel, tube, pipe, box, strut, gate, door, wall, bag, cup] combined with the few common materials that make sense [stone, metal, plastic, glass, cloth, wire, wood, ceramic, cardboard, paper]... there will be about 3 common variants for each.
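Those two lists multiply out quickly. A sketch of generating the candidate compounds (using the exact lists from the comment; a human pass would then keep only the combinations that make sense):

```python
# Sketch: generate every material x object compound from the lists above.
from itertools import product

objects = ["tool", "container", "panel", "tube", "pipe", "box", "strut",
           "gate", "door", "wall", "bag", "cup"]
materials = ["stone", "metal", "plastic", "glass", "cloth", "wire", "wood",
             "ceramic", "cardboard", "paper"]

compounds = [f"{m} {o}" for m, o in product(materials, objects)]
# 10 materials x 12 objects = 120 candidate labels ("metal gate",
# "glass panel", ...), to be pruned down to the sensible ones.
```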

bbernhard commented 5 years ago

Thanks for all the input - that's really helpful! :)

That’s where the graph would come in: all the derivatives of pickup truck would be accessible in a query for pickup truck, via graph links (fill the graph nodes with ids of the reduced set you’re interested in?) ..

But isn't the graph something that (in the worst case) needs to be built and maintained by a human being? I am fairly confident, that we could create (parts of) the graph out of label aliases, IF all the labels are defined upfront and just assembled together.

using your example from above: e.g in case we have the following labels + materials [tool,container,panel,tube,pipe,box,strut,gate,door,wall,bag,cup], [stone,metal,plastic,glass, cloth,wire,wood,ceramic,cardboard, paper] defined, I think we should be able to reliably parse the semantics of any label combination that can be generated by combining any of those labels. Once we have parsed it, we should be able to generate graph nodes out of it.

But if we don't know the labels upfront, I guess it will be much harder to do so. Depending on the label that was used, it might even be required to visually inspect the image, before the label graph can be built. e.g Imagine someone labels an image with spring. Is it now meant to be spring(season) or spring(mechanic)? (I am not an english native speaker...so not sure if that's a good example. ;)).

That's one of those reasons I like static label lists. If structured right, most of the ambiguities can be caught upfront (by providing additional context to the label) - making less room for errors. Not sure how much of a problem those ambiguities really are, but if you have a lot of them, building up a label graph could become tedious. Then it's not enough anymore to just look at the label - you also need to know in which context the label was used (i.e. in the worst case you need to have a look at the image, or at least have a way to mark a label as "needs improvement").

No matter in which direction we are going with the whole "free flat labeling" discussion, what I want to avoid is that we are just shifting the problem. I can definitely see that free flat labeling would be a huge improvement for data input (i.e labeling/annotation). But on the other side, I am also a bit worried that this will just shift the problem - it's now not a data input problem anymore, but a data structuring problem.

So, if we really go the "free flat label" way, I think it's important that we come up with tasks and processes how we can make the label graph generation as pleasant as possible - the label graph will become an integral building block and I think we should also treat it that way then. What I would really like to avoid is, that we switch over to free labeling, gather a lot of different (fine granular) label combinations, but lose the possibility to query the dataset, because nobody wants to put effort into structuring the collected data (either because we don't provide the right tooling to do so, or the task is just too boring/overwhelming/etc.).

I am wondering if there are any low hanging fruits we can pick. Maybe something that allows us to stick to static label lists for now and lets us focus on the other concepts that we need to introduce first. Depending on what we want to do (label parsing, label combining, label aliases, etc.) I think there are quite a few software changes needed. So if we could keep the moving parts at a minimum, I guess it won't hurt...

Indeed it IS a big job, but if we could find other interested parties.. the work could be shared. Imagine a parallel “Textmonkey” or “Wordmonkey”

that would indeed be really cool!

Unfortunately, I do not have much experience with semantic text processing - neither do I know the community nor the right channels to get in touch with those people. But it's definitely worth investing a bit of research time to get a more in-depth insight into the art of text processing.

dobkeratops commented 5 years ago

e.g. Imagine someone labels an image with spring. Is it now meant to be spring(season) or spring(mechanic)? (I am not an English native speaker...so not sure if that's a good example. ;)).

Great example of an ambiguous word.. my preference would be that those are always disambiguated, i.e. you can never write just “spring”. Compound labels (“metal spring”) might help with disambiguation. “Pile” is another one (it can be a stack of things, or a vertical support driven into the ground).

But isn't the graph something that (in the worst case) needs to be built and maintained by a human being? I am fairly confident, that we could create (parts of) the graph out of label aliases, IF all the labels are defined upfront and just assembled together.

Right, some cases are obvious (just a material prefix word), but in others the pairing of words (like “metal spring”) can do the disambiguation.. “glass panel” makes it clear the word glass is the material rather than glass (the cup).

What I’m sort of seeing is that maintenance by a human can give common combinations as hints (e.g. no concrete cups..)... but it might be nice to autogenerate or auto-parse... hmmm. I don’t have an answer here - but adding some common combinations would let you start training a material recogniser, and give some more common labelable objects. Boxes, panels etc. are all over the place.
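As a rough illustration of the compound-label idea, a parser could check a material prefix against small approved word lists. All vocabulary here is hypothetical - just enough to show the disambiguation:

```python
# Sketch: split a compound label like "metal spring" into a material
# prefix and a base object, using small approved lists. A bare
# ambiguous word like "glass" is not accepted on its own.
MATERIALS = {"metal", "glass", "wooden", "plastic", "concrete"}
OBJECTS   = {"spring", "panel", "cup", "box", "table"}

def parse_compound(label):
    parts = label.split()
    if len(parts) == 2 and parts[0] in MATERIALS and parts[1] in OBJECTS:
        return {"material": parts[0], "object": parts[1]}
    return None  # not a recognised material + object pair

print(parse_compound("metal spring"))  # {'material': 'metal', 'object': 'spring'}
print(parse_compound("glass"))         # None: ambiguous on its own
```

A human-maintained blacklist of implausible pairs (e.g. "concrete cup") could then be layered on top of the same check.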

What I would really like to avoid is, that we switch over to free labeling, gather a lot of different (fine granular) label combinations, but lose the possibility to query the dataset, because nobody wants to put effort into structuring the collected data

Given how general and universal the graph format is, perhaps other natural language projects could generate graphs.

Would it be possible to open it up in stages, and pipeline the effort, decoupling the issues:

(i) a flat label list.. literally just an approved vocabulary (value = no typing errors, and eliminating offensive words/slang)... but of course a flat label list could just be a list of graph nodes “object” -> “whatever..”

(ii) a graph connecting the flat label list.. a default graph given by the site, but the ability to substitute it if you can source assistance from another natural language processing project working in parallel. The ability to reason that “hot air balloon” and “bird” should both be found as answers to “flying things” is a universal problem being worked on in parallel. I think the choice of dot files is sensible.. there’s a good chance people would want to visualise their own semantic networks or whatever in that format..
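The “flying things” reasoning above can be sketched as a plain parent -> children graph layered over the flat label list; a query then walks the edges to collect all matching labels. The edges here are illustrative, not a proposed taxonomy:

```python
# Sketch: a graph over a flat label list, so a query for
# "flying thing" also finds "bird" and "hot air balloon".
GRAPH = {
    "object":       ["flying thing", "vehicle"],
    "flying thing": ["bird", "hot air balloon", "aircraft"],
    "vehicle":      ["aircraft", "car"],  # cross-link: aircraft has two parents
}

def descendants(node):
    """Collect every label reachable from `node` (iterative DFS)."""
    seen, stack = set(), [node]
    while stack:
        for child in GRAPH.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

print(sorted(descendants("flying thing")))  # ['aircraft', 'bird', 'hot air balloon']
```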

You could make labels available for annotation use while people catch up organising the graph.

I still haven’t found any particular tools for this, but they must exist.. something that would help verify a lack of cycles, show shortcuts, etc..
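Lacking a dedicated tool, the cycle check at least is small enough to sketch directly - a depth-first search over a parent -> children label graph (labels here are hypothetical):

```python
# Sketch: detect cycles in a parent -> children label graph with DFS.
# WHITE = unvisited, GRAY = on the current path, BLACK = done.
def has_cycle(graph):
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {}

    def visit(node):
        color[node] = GRAY
        for child in graph.get(node, []):
            state = color.get(child, WHITE)
            if state == GRAY:            # back edge -> cycle
                return True
            if state == WHITE and visit(child):
                return True
        color[node] = BLACK
        return False

    return any(color.get(n, WHITE) == WHITE and visit(n) for n in graph)

print(has_cycle({"object": ["tree"], "tree": ["bush"]}))  # False
print(has_cycle({"a": ["b"], "b": ["a"]}))                # True
```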

I was hoping that even given a flat label list (a few 1000..) it still wouldn’t be horrendous to build a graph from scratch, starting with about 10 base labels from which you could build a tree fairly quickly, then add more cross links (like the flying object link to bird and aircraft example).

Unfortunately, I do not have much experience with semantic text processing - neither do I know the community and the right channels to get in touch with those people.

I don’t exactly know the community either, but I do use an IRC channel in which there are a few people who do those kinds of projects.. I’ve tried to point them in the direction of this project.. one has a github account.. I’ll try again.. and I could certainly spare a few evenings to build a graph for a 1000-2000 label list. I did an earlier experiment generating one from Go literal data.. a simple format where you list words with “is a..” and “examples”, and it would fill out the nodes both ways.
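The “is a” / “examples” format described could be reconstructed roughly like this (in Python rather than the original Go literal data, so the details are guessed): each entry lists parents and example children, and the loader fills in the reverse links both ways.

```python
# Rough reconstruction of the idea: "is_a" gives parents, "examples"
# gives children, and build() derives the reverse links automatically.
RAW = {
    "bird":    {"is_a": ["flying thing"], "examples": ["sparrow", "eagle"]},
    "sparrow": {},
}

def build(raw):
    parents, children = {}, {}
    for name, entry in raw.items():
        for p in entry.get("is_a", []):
            parents.setdefault(name, set()).add(p)
            children.setdefault(p, set()).add(name)   # reverse link
        for ex in entry.get("examples", []):
            children.setdefault(name, set()).add(ex)
            parents.setdefault(ex, set()).add(name)   # reverse link
    return parents, children

parents, children = build(RAW)
print(sorted(children["bird"]))    # ['eagle', 'sparrow']
print(sorted(parents["sparrow"]))  # ['bird']
```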

I think you should safely be able to gamble.. make a big flat list for annotators to get on with, and the organisation should appear in parallel. I’ve also mentioned the other word embedding idea (using a precomputed word embedding to place labels in a vector space.. then you could do searches based on similarity with various thresholds?)
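The word embedding idea might look like this sketch - the toy vectors here are made up, standing in for a real precomputed embedding such as word2vec or GloVe:

```python
# Sketch: place labels in a vector space and search by cosine
# similarity with a threshold. Vectors are invented for illustration.
import math

EMBED = {
    "bird":     [0.9, 0.8, 0.1],
    "aircraft": [0.8, 0.9, 0.2],
    "cup":      [0.1, 0.0, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def similar(label, threshold=0.9):
    """All other labels whose cosine similarity clears the threshold."""
    return [other for other in EMBED
            if other != label and cosine(EMBED[label], EMBED[other]) >= threshold]

print(similar("bird"))  # ['aircraft'] with these toy vectors
```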

dobkeratops commented 5 years ago

One useful graph node might be “humanoid shape”... there’s no catchy ‘group label’, but this would link to person, puppet, doll, mannequin, action figure, statue of person, suit of armour, humanoid robot, etc.. this would be a very useful node for visual training - and as it’s not really “person”, you could safely apply it to images too... it would correctly describe the shape being recognised.

dobkeratops commented 5 years ago

Another difficult case: I’d probably choose toy car for these, but “toy vehicle” would be a great catch-all. These are toys of land speed record cars... which are really more like jet aircraft minus the wings, on wheels, lol.. IMG_2400