Open dobkeratops opened 4 years ago
Mockup
In this scheme, just using ".", "/" and "->" as separators would still let users activate "head" and "person" from 0.person.head and head/person alike. The existing annotations don't get converted into instances unless we can estimate it by coverage (I suspect it will need confirmation, because you can have multiple separate fragments, or confusion with overlapping people). There are a lot of images of one person, or man+woman, where the tree could be inferred easily. A separate piece of information per label might be interesting: how many? (1, 2, several (1-10), many (>10))
Many thanks for sharing!
At the moment, I am seeing two rather big problems:
In theory it should be possible to rework the database schema, but that would be a huge effort (a few months of work) that mostly results in a rewrite of most of the application logic. Although most of the application's logic is pretty well covered by the nearly 300 integration tests, I am still a bit worried that stuff will break.
Maybe it could be possible to leave the existing database schema as it is and "piggyback" it with additional tables and data structures to add a tree-view-like structure on top of that. But the problem with that is, if we want to utilize the tree structure in any way (e.g. for querying the dataset, exporting data, etc.) we would need to touch a lot of core functionality again. So I think this "shortcut" most probably won't save us much time and effort :/
I totally agree with you that a tree like structure for polygons would be awesome and really cool to have. But as this would require so much effort and huge changes in the backend, I am a bit worried that I'll get lost along the refactoring way (I've seen too many open source projects die in the middle of a huge refactoring). I hope that we can maybe find an alternative approach which gets us a similar end result, but with less effort - that would be awesome :)
right I figured it might be too big an upheaval to change the database itself
I hope that we can maybe find an alternative approach which gets us a similar end result, but with less effort - that would be awesome :)
hence the idea of a naming convention - something like the dot separators above, which would let you continue using the flat label list, but just parse it onto a tree for display, entry and selection (e.g. in the above tree view example, if you created a new polygon leg under the second person node, it would actually create a new flat label 2.person.leg in the database - and that would show up individually in the existing flat label list). It would mean extra logic in the client to convert it back and forth.
We could create annotations like this now - but I wanted to go over the possibilities for instance separation and a hierarchy separator to complement -> and /
the existing dataset has a bunch of redundant naming convention ideas (some of which might clash); I didn't want to start annotating like this without first checking what's most likely to get official support
There's probably a lot of cases where coverage will figure out the connection as well, but we'd need a way to rule out the counterexamples
I think most of the time you'd only do this level of complex labelling for images focussed on 1 or 2 main instances (I picked the crowd example as a pathological case to illustrate a mass of people alongside a detailed person). That reminds me of the idea of resubmitting some high-res crops (a 1024x crop from a high-res photo for detailed annotation).
Here's one example I've set up using the dot idea: 2 separate person instances. Adding "." to the list of separators would mean these could still show up for searches for 'man', 'hand' etc.
these annotations would be equivalent to this tree:
man
  head
    face
      mouth
    ear
  left_arm
    forearm
      hand
  right_arm
    forearm
      hand
man
  head
    face
      mouth
    ear
  left_arm
    forearm
      hand
  right_arm
    forearm
      hand
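The client-side conversion logic this naming convention needs could be quite small. Here's a sketch in Python (the function name and sample labels are illustrative, not the tool's real code) that folds flat dot-separated labels back into a nested tree like the one above:

```python
def build_tree(labels):
    """Fold flat labels like '0.man.head.face.mouth' into nested dicts."""
    root = {}
    for label in labels:
        node = root
        for part in label.split("."):
            # descend one level per dot-separated component, creating as needed
            node = node.setdefault(part, {})
    return root

flat_labels = [
    "0.man.head.face.mouth",
    "0.man.left_arm.forearm.hand",
    "0.man.right_arm.forearm.hand",
    "1.man.head.face.mouth",
]
tree = build_tree(flat_labels)
# tree["0"]["man"] now has "head", "left_arm" and "right_arm" children,
# and tree["1"] holds the second, less detailed instance
```

Going the other way (tree node back to flat label) is just joining the path with ".", which is what makes the round trip cheap.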
aaah okay, so if I understood you correctly, you mean that we store the label as it is (e.g. 0.man.right_arm) in the database and parse the expression (either in the frontend or the backend) to build the tree, right?
That would definitely reduce a lot of complexity! The only thing that concerns me a bit (mostly because I do not have much experience with it) is the performance impact on the database. Because, if we have a label like 0.man.right_arm stored in the database and we want to search for man, we probably need to do a regular expression match in order to find 0.man.right_arm.
While PostgreSQL is pretty good when it comes to indexing "static strings", I am not sure how well it performs when we are using a lot of pattern matching. I've read in the past that some people use Elasticsearch on top of PostgreSQL (because text search wasn't efficient enough) to speed up queries, but that's a complexity monster of its own.
We have talked about pattern matching (e.g. .\bcar\b.) for the search a few days ago. While this is a cool feature that would allow us to shorten some search queries, I would see it as "nice to have". So if this at some point affects the performance negatively, we could just disable it without much impact.
But I think with the "flat tree" labels it's probably a bit more difficult. If we have thousands of those flat tree labels in the database and realize that it has a huge performance impact (which can't easily be fixed by adding more indexes), disabling the whole thing would be a real pity then.
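To make the concern concrete, here's a small sketch (SQLite standing in for PostgreSQL; the data is invented): an exact match misses flat-tree labels entirely, while a dot-anchored LIKE finds them, but a LIKE with a leading wildcard can't use an ordinary b-tree index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE label (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO label (name) VALUES (?)",
                 [("0.man.right_arm",), ("1.man.head",),
                  ("woman",), ("0.human.leg",)])

# exact match: the flat-tree labels are invisible
exact = conn.execute("SELECT name FROM label WHERE name = 'man'").fetchall()
# exact -> []

# anchoring on the '.' separator avoids false hits on 'woman' / 'human',
# but the leading '%' defeats the usual index strategy
flat = conn.execute(
    "SELECT name FROM label WHERE name = 'man' OR name LIKE 'man.%' "
    "OR name LIKE '%.man' OR name LIKE '%.man.%' ORDER BY name").fetchall()
# flat -> [('0.man.right_arm',), ('1.man.head',)]
```

PostgreSQL's LIKE/regex operators behave the same way here, which is exactly why thousands of flat-tree labels could hurt.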
I guess the flat tree labels potentially can get pretty deep, right? If there are only a limited number of possibilities, I guess we could use synonyms (e.g. man == man.left_arm | man.right_arm | man.left_leg | ...). That way we could keep it simple and get away without regex-like searches in the database. But that of course only works if the number of possibilities is limited. And of course that's also something that we need to curate manually.
Maybe.. the permutations might explode, e.g. man with "/" combined attributes.
you mentioned that there's "one parent level" - is this the way you implement "head/dog" as "dog.has='head'" ?
.. if that's the case, perhaps you could just split the tree root ("man"="0.man", "man"="1.man" etc.) and the rest of the path as new part names: "left_arm.hand", "head.face.eye", "left_leg.knee", etc.
if regex search isn't possible, you might also find the other "/" combined labels could be split up; a lot of those should translate into properties. Perhaps your plan was to have those supported in the database schema. (You'll see a lot of "man/walking", "person/sitting" etc. => a property "state=walking", "state=sitting" etc. I'm hoping these could be gradually recognised in the system over time.)
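That gradual recognition could be sketched like this (Python; the state list and function name are assumptions, and the real set would need curating). Only suffixes on a curated list get converted, everything else passes through untouched:

```python
KNOWN_STATES = {"walking", "sitting", "standing"}  # assumed, would be curated

def split_state(raw):
    """'man/walking' -> ('man', {'state': 'walking'}); unknown forms pass through."""
    base, sep, suffix = raw.partition("/")
    if sep and suffix in KNOWN_STATES:
        return base, {"state": suffix}
    return raw, {}

# split_state("person/sitting") -> ('person', {'state': 'sitting'})
# split_state("head/dog") is left alone: 'dog' is not a known state
```

The curated list is what keeps this safe: part/whole labels like "head/dog" use the same separator, so blind splitting would misfire.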
what about the "instance prefix" side of this - are there other ways you could confirm whole instances vs fragments (and connect the fragments together)? You mentioned the possibility of polygon links. If the intention is "1 polygon per instance", fragments could be linked by a zero-area sliver (reminds me of 'degenerate triangles' for tri-strip meshes), and the renderer might be able to filter those out
you mentioned that there's "one parent level" - is this the way you implement "head/dog" as "dog.has='head'" ?
jep, right. head/dog and dog.has='head' are basically equal - one is just a synonym of the other.
Internally it's stored like this in the database:
label table:
+----+------+-----------+
| id | name | parent_id |
+----+------+-----------+
| 1  | dog  | null      |
| 2  | head | 1         |
+----+------+-----------+
schema. (you'll see a lot of "man/walking", "person/sitting" etc => a property "state=walking", "state=sitting" etc. I'm hoping these could be gradually recognised in the system over time)
I think that should already be possible. The only thing that's missing here would be the logic that splits up the expression into label and property.
What worries me a bit more are the more complex "flat tree" expressions. e.g: 0.man.head.left_eye
I could, as you suggested, split the label up into a "parent part" and a "child part". So, maybe something like this:
label table:
+----+---------------+-----------+
| id | name          | parent_id |
+----+---------------+-----------+
| 1  | 0.man         | null      |
| 2  | head.left_eye | 1         |
+----+---------------+-----------+
But then again we would need a regex pattern if we want to search for the label man.
Ideally, it would be great to have something like this:
label table:
+----+----------+-----------+
| id | name     | parent_id |
+----+----------+-----------+
| 1  | 0        | null      |
| 2  | man      | 1         |
| 3  | head     | 2         |
| 4  | left_eye | 3         |
+----+----------+-----------+
But allowing something like this would require a change in almost every database query (and some of those are pretty complex).
Maybe it's possible to use a PostgreSQL view which implements a recursive join. I'll need to look that up.
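For reference, the recursive join itself is short with a WITH RECURSIVE query (sketched here against SQLite; PostgreSQL's syntax is the same, and a view could wrap it). The table mirrors the ideal layout above, so searching for man becomes a plain exact match on name again, with the full path recoverable on demand:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE label (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER)")
conn.executemany("INSERT INTO label VALUES (?, ?, ?)", [
    (1, "0", None), (2, "man", 1), (3, "head", 2), (4, "left_eye", 3),
])

# walk up the parent chain to rebuild the dotted path of every node
rows = conn.execute("""
    WITH RECURSIVE path(id, full_path) AS (
        SELECT id, name FROM label WHERE parent_id IS NULL
        UNION ALL
        SELECT l.id, p.full_path || '.' || l.name
        FROM label l JOIN path p ON l.parent_id = p.id
    )
    SELECT full_path FROM path ORDER BY id
""").fetchall()
# rows -> [('0',), ('0.man',), ('0.man.head',), ('0.man.head.left_eye',)]
```

Whether the recursion is cheap enough for the hot query paths is exactly the open question, but at least the label names themselves stay index-friendly.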
That's really a tricky one :/
if the "0." part is making any of this look harder - that's just there to explicitly specify an instance. Maybe there's other ways to handle that (like confirming the polys that are definitely 1 instance.. there will be a lot that are). I won't adopt this labelling scheme just yet - I just wanted to explore the possibilities
Something else that might make sense is labelling left and right parts, e.g. left/arm/person, left/leg/person, right/arm/person, right/leg/person, or even a broad overlay (left/person, right/person). Whilst that still doesn't specify instances, it might force a net to distinguish parts of 2 people in close proximity, and would certainly make it easier to match 3d objects over the images
actually, if your plan all along was to have properties per polygon, could the instance index just be a property? Tag all the connected polygons with "instance=0", "instance=1" etc. Would that be any easier than trying to retrofit through label naming conventions? The intra-part hierarchy might not be so critical (and again perhaps it could be done through properties as further detail: property 'side=left', 'side=right', 'end=front', 'end=rear')
Instead of writing a TreeView UI, you could add a way to highlight "all polygons with a specific property" to show the connected parts (which would be great for materials as well).
2 man and 1 woman in the image
3 distinct instance indices 0=man, 1=man 2=woman
Label List
----------
man
head/man
woman
head/woman
LABEL           POLYGON  PROPERTIES
-----           -------  ----------
man
                poly0    instance=0
                poly1    instance=1
man.has=head
                poly2    instance=0
                poly3    instance=1
woman
                poly4    instance=2
woman.has=head
                poly5    instance=2
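Reading the mockup back out is then just a group-by on the instance property. A sketch (Python; the data mirrors the table above, everything else is illustrative):

```python
from collections import defaultdict

rows = [  # (label, polygon, properties) as in the mockup above
    ("man", "poly0", {"instance": 0}),
    ("man", "poly1", {"instance": 1}),
    ("man.has=head", "poly2", {"instance": 0}),
    ("man.has=head", "poly3", {"instance": 1}),
    ("woman", "poly4", {"instance": 2}),
    ("woman.has=head", "poly5", {"instance": 2}),
]

# collect every polygon belonging to the same instance, across labels
instances = defaultdict(list)
for label, poly, props in rows:
    instances[props["instance"]].append((label, poly))
# instances[0] -> [('man', 'poly0'), ('man.has=head', 'poly2')] : one whole person
```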
division of a car. front left wheel, front right wheel, etc..
2 cars, the first annotated with detailed parts:
LABEL      POLYGON  PROPERTIES
-----      -------  ----------
car
           poly0    instance=0
           poly1    instance=1
wheel/car
           poly2    instance=0 side=left fb=front
           poly3    instance=0 side=right fb=front
           poly4    instance=0 side=right fb=back
           poly5    instance=0 side=left fb=back
hub
           poly6    instance=0 side=left end=front
                    # the hub of the front left wheel of the first car
           poly7
           etc
("fb" is just a property name for front or back)
Division of a person:-
LABEL       POLYGON  PROPERTIES
-----       -------  ----------
person
            poly0    instance=0
arm/person
            poly1    instance=0 side=left
            poly2    instance=0 side=right
0.person.left_arm would convert to person.has=arm, instance=0, side=left
but those could be retroactively assigned to anything in the property UI for existing annotations
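The conversion above could be sketched like this (Python; SIDE_PARTS is an assumed lookup table that would need curating, and all names are illustrative):

```python
SIDE_PARTS = {  # assumed mapping from sided part names to (part, side)
    "left_arm": ("arm", "left"), "right_arm": ("arm", "right"),
    "left_leg": ("leg", "left"), "right_leg": ("leg", "right"),
}

def convert(flat):
    """'0.person.left_arm' -> ('person.has=arm', {'instance': '0', 'side': 'left'})"""
    instance, parent, part = flat.split(".")
    props = {"instance": instance}
    if part in SIDE_PARTS:
        part, props["side"] = SIDE_PARTS[part]
    return f"{parent}.has={part}", props
```

Parts without a side ("0.person.head") simply come out as person.has=head with only the instance property.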
So I just tried out the property system; I hadn't been using it, but I see it all works. So in principle it seems you have this ability to retroactively tag individual polygons with information working just fine.
As it stands it would be quite time-consuming for a user to tag all the pieces this way (click the label, click the polygon, select "instance0", press Add, repeat x10 for a detailed person..).
..so parsing properties from a naming convention in labels would still be useful
a tree-view to show all the polygons, labels and properties in the same list could be useful (e.g. maybe under 'Plate' you'd have nodes for each unassigned poly, then a node for each property 'Ceramic'.. and you drag polygons into the Ceramic node to assign them. Or just show all the polygons in this tree, and allow bulk property assignment through multiple selection)
conversely you could have a mode where you draw text at polygon centroids (labels+properties) - an extension of the visibility icon? Allow selecting and assigning property information (use the centroid as a selection-handle, but highlight the outline when you have it selected, to avoid the excessive overlaps of polygon bounding boxes; the 3d tool "Maya" actually had a polygon centre-point selection mode a bit like this). This might be easier to code than making extensive new UI. It might be less intuitive for casual users but fast for experts
Using the properties system for that...that's an awesome idea. Really like that!
I guess we could add a hidden property which serves as index to the properties list.
Just thinking how the appropriate UI part would look. Would you prefer using a drag/drop treeview or would you rather define the tree entry as a text label (e.g. 0.man.head.left_eye)?
In case we want a real tree view: Where do we place the tree view? Can we integrate the tree view nicely in the unified mode or should we rework the unified mode completely?
Would you prefer using a drag/drop treeview or would you rather define the tree entry as text label (e.g 0.man.head.left_eye)? In case we want a real tree view: Where do we place the tree view?
The only thing I'd say for certain is parsing a naming convention will be a useful fast way to set up new polygons. Between the options, it's not clear which would be best (and ease of implementation factors in as well).
the current instance could be highlighted with the eye button? (or its other polygons drawn permanently with faint lines?)
Can we integrate the tree view nicely in the unified mode or should we rework the unified mode completely?
the best would be to extend the unified mode to do it, perhaps it can be generalised from showing polygon by label to showing by property
perhaps a treeview could be placed in the left or right side in a tabview to toggle it with the existing label list or property list; i.e. you extend it to select by label or by property or by individual polygons from the whole list
Maybe it's possible as a modification of the properties panel itself, i.e. if it had a "browse by property" mode where it listed all the image properties, and clicking one highlighted all its polys regardless of label (that would show the instances clearly)..
.. just like at the moment when you have a label selected, you draw new polys with that label; if you had the properties selectable to the side, you could draw with a "current label + current property"? (But you'd need to make clear the difference between this right box selecting properties and assigning properties as it does at the moment.) The title ("annotate all:") could help indicate this? ("annotate all: man (instance=0)", "annotate all: plate (material=ceramic)")
still really not sure what the best way to do all this is.. it might take some experiment to figure it out (maybe I can try some ideas in my desktop tool). It's starting out with a hierarchy. There's no 'properties' as such, but there's now a nodes connection graph, which could demand "show connected" in the spatial views. And just writing that makes me suggest that you might be able to implement this entirely as a "connect polygons" tool (multiple select, then click something to bind them all as an instance, and show it visually through color coding or actual inter-polygon links)
re: "guessing assignment" - in the case where part polygons only intersect the bounding box of one potential parent, the instance information could be assumed. This could be done before training, outside the tool. This would catch a lot of examples, i.e. the images with just one person, or a man+woman, or person+dog. Some batch tool like that could also find the images where manual assignment is needed
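A sketch of that batch heuristic (Python; boxes are (x0, y0, x1, y1), and all names here are illustrative): a part gets an instance only when its bounding box intersects exactly one parent box, everything else is left for manual review:

```python
def overlaps(a, b):
    """Axis-aligned bounding-box intersection test."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]

def assign_instances(parents, parts):
    """parents: {instance_id: bbox}; parts: {part_id: bbox}.
    Returns {part_id: instance_id} for unambiguous parts only."""
    assigned = {}
    for part_id, pbox in parts.items():
        hits = [i for i, box in parents.items() if overlaps(pbox, box)]
        if len(hits) == 1:  # ambiguous/orphan parts stay unassigned
            assigned[part_id] = hits[0]
    return assigned

# two well-separated people, one head each
parents = {0: (0, 0, 100, 200), 1: (300, 0, 400, 200)}
parts = {"head_a": (20, 0, 60, 40), "head_b": (320, 0, 360, 40)}
# assign_instances(parents, parts) -> {'head_a': 0, 'head_b': 1}
```

Run over the whole dataset, the parts that come back unassigned are exactly the images needing manual instance assignment.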
The only thing I'd say for certain is parsing a naming convention will be a useful fast way to setup new polygons.
agreed. But I think we would probably drop that pretty fast in favor of a real tree, no? I mean textual input has for sure its advantages (especially for keyboard warriors), but as annotating an image is a task where the mouse is involved heavily, I am not sure if a keyboard-driven input method will boost productivity much?
perhaps a treeview could be placed in the left or right side in a tabview to toggle it with the existing label list or property list;
that's a cool idea. I guess we could also set a cookie to save the user's preference. :)
I haven't tried it yet, but this JavaScript library looks promising.
maybe i can try some ideas in my desktop tool
That would be awesome!
Just out of interest: What would your ideal annotation tool look like? Let's assume for a moment now that there are no technical restrictions nor technical debt and we can do anything we like. Do you have a favorite annotation tool (doesn't matter if desktop or web) or do you feel that there's this one feature that all existing annotation tools are missing?
I occasionally search github for image annotation tools to get some fresh ideas and always wonder if there's still any room for (revolutionary) improvements in this sector or if there's not much left to improve...
Just out of interest: What would your ideal annotation tool look like? Let's assume for a moment now that there are no technical restrictions nor technical debt
it wouldn't be much different - the main thing is the unified approach, which is 90% of the value.
tweaks I might add -
tree organization (for detail) - but it's possible the overlapping properties assignment can do just as well; it might even have some advantages, because some aspects of the world don't fit into trees.
common label palette (least recently used? user pref?) & "[", "]" hotkeys to toggle through them (this would be really fast for 2-handed use on desktops). (Something that might fit into your setup is filling the label list with speculative labels - display with "?"; they don't get saved, but are confirmed (remove the "?") if you pick one to annotate with. By using "[", "]" to toggle the main label list including these, you'd have the same speed.)
a 'continuous draw' mode for pen devices, like lasso selection, more like painting - but it might need a slightly different way of tweaking the outline after
a scribbles mode (again more pen/paint oriented.. draw within the areas and an algorithm would fill between them) - or you could scribble and draw hard boundaries to hint splitting instances as well as categories
mousewheel-point zooming (really nice 2d navigation method for desktop machines, quite a few 3d programs use it)
option to reverse the workflow, e.g. cut the area up with polygons first, then assign labels to them
The holy grail would be merging aspects of annotation with photogrammetry and animation rotoscoping, and the combined "image-search"+drawing would be the backbone of that - so this could easily grow out of an annotation tool.
Do you have a favorite annotation tool (doesn't matter if desktop or web)
not really - I haven't used many, just labelme briefly. Mostly I'm influenced by drawing programs & 3d packages. I think you could annotate well with a layers-based paint program and pen device, but it's hard to get those into the collaborative environment of the web, and their UIs are a lot more complicated.
Many thanks for all the suggestions - that really helps a lot with the long term planning of the project :+1:
In the next few days, I'll create a feature branch and start working on the tree integration. Not sure yet how long it will take until I have a first working version, but I hope that by actually working on it I can better estimate how much work it really is.
I'll try to make it possible to switch between the tree view and the flat label list. That way, we can easily switch to the old representation in case the new implementation has some bugs (which it for sure will have in the beginning). Once the tree view is stable enough, we could remove the support for the flat label list and make the tree view the new default for the unified mode.
sounds like a good plan. Personally I'm reassured that the database can potentially represent instances with a property. In principle, with 2 alternative 'flat' mechanisms (by label or by property), you already have something as powerful as a tree, but you'd still need to enhance the property tools (e.g. an actual way to view all the polygons with a specific material or instance property). Figuring out the best way will just take experimentation.
This discussion started with the hierarchical tree idea, but it emerged that properties give a solution from the data standpoint. You could add an "instance" property and it would solve the original issue within the existing UI & database.
So the question is really what UI will make editing and viewing this data obvious to most users, and easier and fast enough to do in bulk.
Alongside the tree view idea, maybe you can also consider tweaks to property editing (view all by material.. almost a reversal?), and the idea of a "view all poly centres for selection" mode. But yes, having all the polys in a tree, with the properties per polygon all there, would do it I think (and maybe you could offer the user a toggle, "tree root=labels vs properties")
Finally, regarding opening up potential for the use case I had in mind (detailed labelling for pose estimation) - your database already has a lot of usable examples. When no 2 parent polygon bounding boxes overlap, it would be safe to assume the instances. Adding more part labels formally (arm, leg, neck, elbow, knee, shoulder, hips, torso), and parsing those from naming schemes ("elbow/man", "foot/person", etc.), you'll find a lot of data should become visible (I notice you've got face, hand, foot already). There might be a really simple UI acceleration possible, like the ability to set a part separate to the main label to reuse the entry of the main (currently we have the option of pasting a string thanks to the separators, but I don't think casual users will figure that out); also being able to translate naming conventions into properties (things like derelict/ /standing etc) will open up existing data & tasks
That's a nice idea, I'll add that to my improvements list :+1:
Yesterday I played a bit with different JavaScript tree visualization libraries, and I think I'll settle on fancytree for now. It's still actively maintained and overall has a pretty nice feature set.
Here's a small example of how it could look:
The idea is to use the properties system to number the labels according to the position in the tree. So e.g: for the above example it could look like this:
grass: 0
dog:   1
mouth: 1.0
eye:   1.1
ear:   1.2
nose:  1.3
The . represents that it's a child node.
Those numbers are stored along all the other properties (like the material properties). So we are "abusing" the properties system to store hierarchical information. When the unified mode view gets populated we filter out all those numerical properties and use that information to build up the tree. All the other properties are then displayed as usual in the properties box on the right.
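Rebuilding the display tree from those position properties could look like this (a Python sketch; names are illustrative, and sorting the position strings lexicographically is only fine for this single-digit example - a real version would sort numerically per component):

```python
def build_display_tree(positions):
    """positions: {label: position}, e.g. {'dog': '1', 'mouth': '1.0'}."""
    nodes = {pos: {"label": label, "children": []}
             for label, pos in positions.items()}
    roots = []
    for pos in sorted(nodes):  # lexicographic: OK for single-digit positions
        # parent position is everything before the last '.'
        parent = pos.rsplit(".", 1)[0] if "." in pos else None
        (nodes[parent]["children"] if parent else roots).append(nodes[pos])
    return roots

positions = {"grass": "0", "dog": "1", "mouth": "1.0",
             "eye": "1.1", "ear": "1.2", "nose": "1.3"}
tree = build_display_tree(positions)
# tree[0] is grass; tree[1] is dog with children mouth, eye, ear, nose
```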
This brings us to the first limitation of that approach: as every property is directly attached to a specific (set of) polygon(s), every label that's added needs to have a polygon. Or in other words: if a label is added in the unified mode, it's mandatory to also add a polygon before pressing the "Done" button. Otherwise we can't store the position of the label in the tree. So the polygon serves as a "container" to store the label's position in the tree.
Another thing that we need to consider is the actual label name that's then persisted in the database. As the tree information is only used to build the polygon tree, we would end up with the following labels for the above example in the labels view:
One of the "problems" with that is that it's not possible anymore to e.g. query the dataset for "all images that show a dog's head". The only thing we could do with the above labels is either to query the dataset for head, which gets us all sorts of heads, or to query the dataset for dog & head, which could also return false positives (e.g. imagine an image where a human head and a dog's torso are shown).
In order to circumvent that, we could concatenate the parent and the child label. In our example above, that would yield the same labels that we already have now, i.e. nose/dog, mouth/dog, eye/dog, ear/dog.
But I think that only works nicely if we have one parent label and one child label. If there are more hierarchical levels, this would translate to something like a/b/c/d ...and at that point we would run into the same problems as before ("regex for searching").
My main problem at the moment is that I want to keep the polygon tree separated from the actual label representation and not mix both. But I still would like to see the actual label in the labels view. (If just head is shown instead of head/dog, a user might again add the label head/dog in the labels view, not knowing that head actually translates to head/dog in the annotations view.) I think mixing both could also clash a bit with the labels graph.
Sounds like potentially fiddly repercussions.
But I think that only works nicely if we have one parent label and one child label. If there are more hierarchical levels this would translate to something like a/b/c/d ...and at that point we would run into the same problems as before ("regex for searching").
every label that's added needs to have a polygon.
that would be unfortunate, because it's still useful to be able to use labels as search tags and confirm things are in the scene even when you might not actually want to annotate them - the common label tree is often too fiddly
Maybe you don't need the whole tree-path ability; e.g. just instance=1 suffices - because given part names and some additional properties (e.g. left side, right side, front end, back end, interior, exterior), data-users could still figure out a tree if they need it (e.g. label arm/man, instance=1, side=left narrows down a specific arm; users know eye is part of face, is part of head, etc). Maybe the tree could work by sorting specific permutations of properties (so in that case, you'd see tree paths man.1.arm.left, man.1.arm.right, etc). You'd be able to choose which properties to sort first, and change it retroactively.
left and right properties seem like an obvious choice; maybe you could even separate the upper and lower arm, upper and lower leg in the same way: "man.1.arm.left.upper_limb" = "label: man.has=arm, properties: side=left, limb_part=upper_limb, instance=1"
would that give a balance of capabilities - flexibility, ease of retrofit , searches ?
ideas
// database:
man.has=arm
    instance=instance1, side=left
    instance=instance2
man.has=head
    instance=instance1
etc
// Tree for example rule: "create tree nodes for every property permutation per label, and list parts last"
man
  instance1          // closer and more detailed example
    left
      arm
      leg
      hand           // e.g. label: man.has=hand, props: instance=instance1 side=left
      shoulder
      elbow
    right
      arm
      leg
      hand
      shoulder
      elbow
    head
      face
        eye
        nose
        mouth
  instance2          // further and less detailed example
    head
    hand
  other_instances    // all other polygons not assigned to instances yet
    head
      poly1          // user can drag this into an instance
      poly2
    hand
      poly3
      poly4
// Tree for example rule: sort main label => instance => part_label => other_properties
// finally a tree node per polygon (select anything through the tree)
man
  instance1          // closer and more detailed
    arm
      left
      right
    hand
      left           // e.g. label: man.has=hand, props: instance=instance1 side=left
      right
    shoulder
      left
      right
    elbow
      left
      right
    head
      face
        eye
        nose
        mouth
  instance2          // further and less detailed
    head
    hand
  // all other polygons not assigned to instances yet
etc
plate
  ceramic
  paper
bowl
  ceramic
  glass
// ordering could be swapped based on experiment or even user preference; only the label, part, instance, property actually appear in the database. Someone training a material recognition system might want to list materials first
perhaps it would be confusing to show a tree that isn't really a tree, but you could use icons to distinguish node types (label, instance, property, part)
you could leave "full tree" as a future idea, if you get as far as this working well?
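The "tree that isn't really a tree" could be generated from the flat data by folding polygons along a chosen key order. A sketch (Python; the field names are invented), where swapping the order gives the materials-first view mentioned in the mockup:

```python
def sort_into_tree(polys, key_order):
    """Fold flat polygon records into nested dicts along key_order; leaves hold ids."""
    tree = {}
    for p in polys:
        node = tree
        for key in key_order:
            if key in p:  # records missing a key just skip that level
                node = node.setdefault(p[key], {})
        node.setdefault("polys", []).append(p["id"])
    return tree

polys = [
    {"id": "poly1", "label": "man", "instance": "instance1",
     "part": "arm", "side": "left"},
    {"id": "poly2", "label": "man", "instance": "instance1",
     "part": "arm", "side": "right"},
    {"id": "poly3", "label": "plate", "material": "ceramic"},
]
by_label = sort_into_tree(polys, ["label", "instance", "part", "side"])
# by_label["man"]["instance1"]["arm"]["left"]["polys"] == ["poly1"]
by_material = sort_into_tree(polys, ["material", "label"])
# by_material["ceramic"]["plate"]["polys"] == ["poly3"]
```

The same flat records produce either view, which is the point: the ordering is a display preference, not something stored in the database.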
Many thanks for sharing your ideas. It's really refreshing to hear someone else's thoughts on a specific topic - that sort of brainstorming is always extremely helpful to me :)
On a related note: Do you think that we will end up with a deep tree or will the tree mostly be flat (i.e only 2-3 hierarchy levels)?
What's e.g still a bit vague to me is when to use the label graph and when to model something in the polygon tree.
e.g: Let's assume we want to annotate a dog.
Now, we could come up with the following (really detailed) polygon tree:
animal
  quadruped
    mammal
      dog
        head
          ear
          nose
          eye
          mouth
Or we could do it like that:
dog
  head
    ear
    nose
    eye
    mouth
and define the remaining relationships (animal -> quadruped -> mammal -> dog) via the label graph.
What I am a bit afraid of, is that we accidentally render the label graph useless. As we have two trees here (the polygon tree & the label graph), we have to be extremely careful to not mix up the responsibilities of both representations.
For me the label graph always served as some (user defined) top level view. So e.g. a biologist would probably create a label graph that could look like this:
animal
  quadruped
    mammal
      dog
      cat
  biped
    mammal
      kiwi
On the contrary, I (interested in animals, but not a biologist) would probably create this label graph here:
animal
  dog
  cat
  kiwi
If we have the right granularity in the polygon tree, both of us (the biologist and myself) can write our own label graph and use that to query the dataset.
But if we already add too much information to the polygon tree, we lose that possibility. e.g: If we have a polygon tree that looks like this:
animal
  quadruped
    mammal
      dog
        head
          ear
          nose
          eye
          mouth
we already have the structure (animal -> quadruped -> mammal -> dog) "hardcoded". So it's not that easy anymore to change the "view" via the label graph.
Does that make sense?
The second example is how I see it (simpler tree, "and define the remaining relationships (animal->quadruped->mammal->dog) via the label graph"). The polygon trees will tend to be shallower, depth 1-3. The graph.. could be 5+?
The label graph is for organising the concepts and figuring out equivalencies between refined labels (i.e. people could search for animal, vertebrate, mammal, or cat.. the first 3 searches would ideally still find cat, because it's reachable through the graph). It will allow people to use very specific labels (types of car) and still have them reachable from broader searches ("vehicle")
The idea of a polygon tree is for spatial organisation within the image, and possibly describing hierarchical features or models directly (very detailed supervision). This will help with advanced use cases.. pose estimation, and building 3D models from images. The polygon tree just needs to show nested grouping of object parts. You might also have a "crowd.specific_person", again by spatial cluster, or "train.{passenger_car, locomotive}", "man_and_motorbike.{man_riding, motorbike}" (that's actually a case where unlabelled polygons as roots might be useful: a connected group where you will name each part).
in a polygon tree it might make more sense to list every polygon uniquely by default - you could avoid having to teach the user about instances. Perhaps call it “polygon list” rather than “label list”
you might have 3 potential ways to sort, and each view could do a different job (view Labels, view Polygons, view Properties). E.g. when editing materials, it would make a lot of sense to be able to see all the polygons marked as wood, regardless of label or instance. It just so happens the database is ordered "label first". This is actually quite common in graphics systems ("a group for each texture, then store all the polygons using it"), but systems might have to sort different ways in other situations.
So one does not obsolete the other - they do very different jobs.
Also the graph can express multiple routes to the same item. You could search for carnivore, and you'll get the carnivorous mammals and reptiles. There'd be a path animal->reptile->crocodile, animal->mammal->cat and carnivore->crocodile, carnivore->cat. Training might discover that all carnivores have sharp teeth, claws etc.
I have put graph nodes into the images which might cause confusion, but the idea is these are purely graph suggestions: eventually I'm hoping they will be reduced, but in the meantime using "->" as a separator means you can search for "car" or "sportscar" and you'll find the images with "car->sportscar". The intention is that once "sportscar" is a graph node, the "car->" prefix can be stripped out. It is not for the polygon tree; it's just handy to submit these suggestions within the database. I've stuck to using / to blend object names, parts, and potential properties.
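To make the intent concrete, here's a minimal Python sketch of how "->"-hinted labels could stay searchable by every component, and how the "car->" prefix could be stripped once the suffix becomes a real graph node. The function names are hypothetical, nothing here is an existing ImageMonkey API:

```python
def search_matches(label, query):
    """Check whether a search term hits a '->'-hinted label.

    A label like "car->sportscar" should be found by searching for
    either "car" or "sportscar"; a plain label matches only itself.
    """
    return query in label.split("->")


def strip_known_prefixes(label, graph_nodes):
    """Once the last component (e.g. "sportscar") exists as a graph
    node, the leading hint prefix becomes redundant and can go."""
    parts = label.split("->")
    return parts[-1] if parts[-1] in graph_nodes else label
```

So "car->sportscar" answers searches for both "car" and "sportscar", and collapses to plain "sportscar" once that node exists in the graph.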
mockup of tabbed label + tree view
As a short term suggestion, perhaps you could rename properties as "instance-properties", and just make the instance ids visible in the list. Then the tool is capable of editing and viewing instance grouping (the first goal here). It's probably well worth looking into an enhanced property editor (something like: view all poly centres, view all properties in the list, and the selected property highlights all its polygons from all labels, with a tool to toggle it). And possibly make properties available as part of the current drawing mode anyway: "annotate all: fence, material=metal".
Some ideas.. if you did go the full tree route, imagine if you could make un-named group polygons, but place specific labels under them - a way of describing whole groups (which might be hard to internally separate). Perhaps this could tie back into image descriptions somehow. Imagine if you could attach those descriptions to parts of the scene: "A bunch of people riding bicycles on the road", "a man standing mounted on a bicycle looking back smiling".
Many thanks for all the suggestions - very much appreciated!
It's a really tricky one...
I've thought about it all day and while I still believe that it's possible to implement the polygon tree, I think the end result won't be pretty (at least in terms of maintainability). So far, I haven't found a solution that would allow us to implement the polygon tree while still being flexible enough to add all the other features & suggestions you mentioned afterwards. It becomes more and more obvious to me that the current database schema isn't designed in a way to support that. So, if we add the polygon tree it would probably be through some sort of "hack"/"abuse" of an existing feature...which has big potential to backfire at a later point. So far, every solution I've played through either ended up breaking some existing features (e.g. discoverability of labels) or was so hacky that it made it impossible to add other features in the future.
I've looked a bit at labelme's implementation and I think they were facing the same problems back then. I believe in the beginning they really tried to keep the data structured and accessible. But the more features they added, the more difficult it became for them to keep the data structured (no misspelled labels, "no garbage labels", label hierarchy, etc.) and accessible (via search). And I think I understand now also why. Ticking all those boxes is a lot of work and requires a really well thought through design...
I am not yet giving up though...still hoping to find a solution that integrates a bit better into the existing database schema and is less destructive.
right, it does look like a big upheaval. Perhaps working on new tools to manage properties is the way to go. It seems an instance property is enough to group the connected limbs of a person, so you could just extend the tools to view and assign properties. That will have other uses anyway: it would be great if you could view all the polys (from all labels) by material (I imagine a new version of the properties list that works in parallel to the label list: list all the properties, click there to show everything that uses it, and apply the selected property to all new polygons that you draw?)
Some more experiments with human part annotations - tried using the ellipse tool. This could actually be faster (perhaps with a hotkey for rotation such that you could rotate in the drawing mode).
imagine if you could quickly toggle between person-part labels from a palette or the existing label list, (using the [,] hotkeys)
These might actually be better than polygons for pose-estimation - they imply a centre, axis and range better. It might be possible to reduce drawing a limb to 2 drags (draw a line down the axis, then set the width) - or perhaps you could connect circles drawn around the joints (that would imply the label: "upper-arm" = "an ellipse connecting shoulder, elbow").
rotation hotkeys could speed this up - perhaps "," "." (e.g. 9-degree increments.. 5 taps = 45deg) - rotate the last drawn object, or maybe rotate the actual image, so you always draw screen-aligned.
Perhaps instead of a polygon tree, there are other ways you can investigate to assist annotating parts of people. Imagine if you could draw circles at the joints, then connect them (a dedicated limb-annotation).
examples:-
suggestion for drawing joint-connections automatically, based on a joint naming scheme. The idea would be to enable the user to set a connectivity scheme specifying polygon names; to use this, every label must be used only once (and disambiguated with blends/properties for multiple instances). Eventually connectivity schemes could be built into the system.
the initial suggestion is an optional entry box for a string defining label connections. Default it to "n/a" to emphasise that you don't have to set anything here.
Connectivity needn't be stored in the database; it could purely be a visual aid. The visual aid would help users achieve consistent labelling. Perhaps a "draw-joints" mode could automatically advance the current label. You might need to draw the limbs un-labelled first, then assign the names, so you can see what's going on (left vs right etc).
It might be possible to agree on connectivity that is baked into the system. It might get complex with arthropods, and even dogs & cats need some thought, e.g. what to call the foreleg/hindleg and the particular joints there, and where in nature we swap between a 'foot' and a 'hand'.
a simpler implementation would require you to enter "left/shoulder->left/elbow, right/shoulder->right/elbow,.." explicitly. An advanced scheme would detect any partial match with its side and instance variations and automatically carry them over for label suggestions, as if they were blended with a wildcard "/shoulder/ -> /elbow/".
The full connectivity scheme might need a concept of LOD, e.g. if an "advanced joint" exists, its connections can hide (shortcut) the basic connections - for refinements like fingers or a curved spine. E.g. the basic scheme would show elbow->hand, but if you annotate the wrist and fingers, the links "elbow->wrist" and "wrist->fingers" supersede "elbow->hand". Before figuring this out, just rely on the user to manually choose an appropriate connectivity list.
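A sketch of how such a connectivity string could be parsed, including the wildcard-style carry-over for side/instance variations. The function names and the exact matching rule are assumptions for illustration, not an existing implementation:

```python
def parse_connectivity(scheme):
    """Parse "left/shoulder->left/elbow, right/shoulder->right/elbow"
    into (from_label, to_label) pairs.  The default "n/a" entry
    yields no connections."""
    if scheme.strip().lower() == "n/a":
        return []
    pairs = []
    for entry in scheme.split(","):
        a, b = (s.strip() for s in entry.split("->"))
        pairs.append((a, b))
    return pairs


def expand_wildcard(rule, labels):
    """Expand a wildcard rule like ("shoulder", "elbow"): connect any
    two labels that differ only in the joint name, so the side and
    instance components are carried over automatically
    (e.g. "left/shoulder" -> "left/elbow")."""
    joint_a, joint_b = rule
    labelset = set(labels)
    pairs = []
    for la in labels:
        parts = la.split("/")
        if joint_a in parts:
            lb = "/".join(joint_b if p == joint_a else p for p in parts)
            if lb in labelset:
                pairs.append((la, lb))
    return pairs
```

With this, the explicit list and the wildcard form produce the same connections for a simple left/right labelling.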
It might also help to have an official "occluded" and "unoccluded" flag; the default state would be "indeterminate". In this example, the right shoulder and right hip are both invisible, but the user can imply their position because of the legs and arms.
it's possible a limb-polygon could be guessed, filling along the edge (e.g. place another ellipse with a minor radius averaged from its connections), but it might be hard to place the endpoints (e.g. perspective and taper). Also not every connection implies a limb as such (neck->head is tricky; you might really want "base of head, base of neck" for that to work). This connected-joints feature could co-exist with drawing an entire outline.
This might seem complicated, but with the right tweaks and assists it could actually be easier than drawing polygon outlines, because your eyes more naturally identify the "blobs" of mass, and the total number of clicks (mouse-up/mouse-down per drag operation) could be smaller (you need to click at least 6 points to specify a rounded 'blob' as polygons, whereas it could be 2 drags to make and orient an ellipse, or draw its primary axis and scale the minor). Finally, the outline is one continuous action encompassing the whole, whereas drawing it one joint or limb at a time is more 'progressive' - smaller bitesized actions.
man, I really love your mockups - they look awesome!
Originally, I had something like this in mind (however I am not sure if this is a good workflow):
(elbow/man, wrist/man, knee/man ...etc). What's happening internally is that we assign an alphanumerical property to each of those polygons indicating that those limbs are connected together. e.g: If there are two men in the picture and we want to connect the limbs shoulder/man, elbow/man, wrist/man together, we would end up with those properties:
instance #1:
shoulder/man: joint-0.0
elbow/man: joint-0.1
wrist/man: joint-0.2
instance #2:
shoulder/man: joint-1.0
elbow/man: joint-1.1
wrist/man: joint-1.2
So by parsing the properties, we know that the limbs are connected like this: shoulder/man -> elbow/man -> wrist/man.
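Parsing those joint-N.M properties back into per-instance chains could look roughly like this. The property format is exactly the one described above; the function name and the list-of-pairs input shape are made up for the sketch:

```python
from collections import defaultdict


def connections_from_properties(props):
    """props: iterable of (polygon_label, "joint-<instance>.<order>")
    pairs (a list, since the same label can occur for two instances).
    Returns {instance: [labels ordered along the chain]}, so adjacent
    entries in each list are the connected limbs."""
    chains = defaultdict(list)
    for label, value in props:
        instance, order = value.removeprefix("joint-").split(".")
        chains[int(instance)].append((int(order), label))
    return {inst: [lbl for _, lbl in sorted(items)]
            for inst, items in chains.items()}
```

Feeding in the six properties from the two-men example yields the chain shoulder/man -> elbow/man -> wrist/man for each instance.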
The only tricky thing is probably to select the correct limbs in an image that has a lot of polygons (I guess that could be a bit cumbersome?). The advantage I would see is that we could use that on the existing dataset. So if there are already annotated limbs, we could easily connect them.
But of course, we could also give your joint connection naming scheme a shot. The only challenge I see with that is to create a UI workflow that integrates nicely into the unified mode. Because if I understood you correctly then we wouldn't use the labels list on the left anymore to switch between the labels, but instead use the joint naming scheme to iterate over the limbs, right? I guess that could require a bit of UI tweaking to make it work (the most challenging part is probably to make it clear which UI elements are active and how the user can switch between modes - the "joint mode" and the "normal annotation mode"). I think without a clear UI that could become quite complex.
if you want to connect those limbs you toggle the "show all annotations" button and connect the limbs with the "joint tool"
This sounds quite interesting
There's many ways to approach it.. a case of finding the best balance between usability and ease of bug-free implementation, without massive upheaval to the existing system. It does seem the properties system gives you a lot of options.
Because if I understood you correctly then we wouldn't use the labels list on the left anymore to switch between the labels
what could happen is when you start with one extremity, the connection information could drive generating the next label. However there's a downside: you might not always want every label (you skip some because they're occluded or offscreen), so you'd still need the labels UI to be active; it would just behave like it has a limited autopilot.
Something far easier to implement is just a straightforward hotkey to toggle, so you could have pasted a large preformatted label list (this is working fine thanks to the separator parsing).
The simplest implementation is that there's no separate tool at all; it just uses an optional connectivity list to draw the lines, and "[" "]" label toggle keys would be universally useful shortcuts that accelerate this and all other annotation.
There's a few ideas for an 'ellipse tool ++'. The first is to bolt on a rotation assist (rotate hotkeys, or alternate between the first drag drawing the ellipse whilst the second drag orients it). You might want to make this start drawing at the centre, i.e. it's easier to judge placing the ellipse on the object's centre, then rotate around that (contrast to the rectangle bounding box tool). The other is to flip the process so you draw its "major axis" first, then set its width (minor axis). You could start with an assumed aspect ratio e.g. 0.5, then the "," "." hotkeys scale it by sqrt(2) and 1/sqrt(2) respectively (2 taps doubles or halves the width), or have a 2-state mouse drag tool.
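The "draw the major axis first, then scale the minor" idea boils down to a little geometry; a sketch under the assumptions stated above (default aspect ratio 0.5, each tap scaling the minor radius by a factor of sqrt(2)). Function names are illustrative only:

```python
import math


def ellipse_from_axis_drag(p0, p1, aspect=0.5):
    """Build an ellipse from a single drag along its major axis.
    p0/p1 are the drag endpoints; returns
    (cx, cy, major_r, minor_r, angle_rad)."""
    (x0, y0), (x1, y1) = p0, p1
    cx, cy = (x0 + x1) / 2, (y0 + y1) / 2          # centre = midpoint
    major_r = math.hypot(x1 - x0, y1 - y0) / 2      # half the drag length
    angle = math.atan2(y1 - y0, x1 - x0)            # orientation
    return (cx, cy, major_r, major_r * aspect, angle)


def scale_minor(ellipse, taps):
    """Apply the ","/"." hotkey: each tap multiplies the minor radius
    by sqrt(2) (positive taps widen, negative taps narrow), so two
    taps double or halve the width."""
    cx, cy, major_r, minor_r, angle = ellipse
    return (cx, cy, major_r, minor_r * math.sqrt(2) ** taps, angle)
```

A horizontal drag from (0,0) to (10,0) gives a centre of (5,0), major radius 5, minor radius 2.5, angle 0; two widening taps bring the minor radius up to 5.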
There's many ways to approach it.. a case of finding the best balance between usability and ease of bug-free implementation, without massive upheaval to the existing system. It does seem the properties system gives you a lot of options.
yeah, I think so too. :)
what could happen is when you start with one extremity, the connection information could drive generating the next label. However there's a downside: you might not always want every label (you skip some because they're occluded or offscreen), so you'd still need the labels UI to be active; it would just behave like it has a limited autopilot.
that's what I am a bit afraid of - that we end up with a UI that's actually really powerful, but behaves in a way that's not obvious to most users. Personally, I always felt the most productive in any application if I recognized similar patterns. If I had to use a UI where some options were not obvious to me (or felt contradicting), I always lost a bit of flow.
I am wondering if we can use the existing workflow (with the labels list on the left for selecting the label) together with a "joint tool" and add a bunch of hotkeys (switching between labels, rotating a polygon etc) on top of that to speed up certain things? Maybe we can also add a right mouse context menu with additional options? That way we don't have to introduce a lot of new concepts and maybe(?) get something powerful as well?
Some simpler tweak suggestions to enhance drawing ellipse-bounded annotations.
"joint tool" and add a bunch of hotkeys (switching between labels, rotating a polygon etc) on top of that to speed up certain things?
yes, I think so. This diagram doesn't cover a 'connect-joints tool'; there's 2 ways you could work: explicitly drawing the limbs - then you've mostly eradicated the need to draw a bounding shape (you can get a pretty good approximation this way) - or drawing the joints, then a draw-connection tool (which could add the green lines in the above mockup), or just getting those from common naming conventions (shoulder->elbow etc).
(there is another way you could approach annotating limbed creatures: instantiating a prototype with all the connections already set up, and then moving each joint into place. Some 3d packages have a base humanoid skeleton to work from like that. It might be less obvious how to use that though, and it would be harder to deal with for the examples where you can't see the entire figure due to screen edges and occlusion)
(there is another way you could approach annotating limbed creatures: instantiating a prototype with all the connections already set up, and then moving each joint into place. Some 3d packages have a base humanoid skeleton to work from like that. It might be less obvious how to use that though, and it would be harder to deal with for the examples where you can't see the entire figure due to screen edges and occlusion)
right, that would definitely be the icing on the cake. But as this is something completely different UI wise, I am having a hard time seeing how that one integrates nicely into the unified mode. I think it would be way nicer to have a separate view for that (or at least add a switch to the unified mode to switch between modes).
If there are no objections from your side however, I would prefer to start with the most simple version. The plan would be to create a prototype which uses the properties system in the background for storing the joints. In a first iteration I would like to use as many existing functionalities as possible. That way it should (hopefully) be possible to crank out a first version pretty fast. If it turns out that the properties system is the way to go here, we can add more functionality (hotkeys, ellipse tool, etc) to make it more convenient to use.
I would prefer to start with the most simple version
absolutely, that's the best plan. As it stands now, the [,] hotkeys would be the low-hanging fruit to accelerate this and other tasks (we can already paste a label list, and this would help with images that have an existing label list). I'm hoping the naming conventions can be parsed into properties (left/... etc).
Ok, I've tried to create a "mockup" (yeah I know, it's a bit exaggerated to call that a mockup :sweat_smile:) of what I have in mind.
(elbow/person, wrist/person, shoulder/person, etc). If you then select the shoulder/person, elbow/person and wrist/person polygons of the man, it would connect those three polygons together (we can show that visually by drawing links between those polygons; similar to your example above). The advantage of first drawing the limbs and then connecting them together is that we can use the properties system in the background for representing the joints in the database. That way we can leave the backend as it is (except for a few small changes) and just need to focus on the frontend.
What do you think? Do you think that's too complicated or do you think that could work?
right so this will allow connecting between specific instances, in the case where you have multiple visible (scenes with more than one person); that'll be great.
Something else orthogonal I've been doing in labelling is left/shoulder/person, left/elbow/person, right/shoulder/person, right/elbow/person.. In the case where there is just one person, that should also suffice. I'm hoping that with the "/" separator, that can be parsed into "side=left, person.has=shoulder" etc to fit in with the schema. It might be possible to retroactively assign a left/right property and let that flow across connections in the joints view.
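The "/"-separated naming convention could be parsed into structured fields along these lines. This is a sketch; the field names, and the assumptions that the object name comes last and that a leading left/right component is the side, are mine rather than an agreed schema:

```python
KNOWN_SIDES = {"left", "right"}


def parse_part_label(label):
    """Split a "/"-blended label like "left/shoulder/person" into
    structured fields, e.g. side=left, part=shoulder, object=person
    (so it can map onto "side=left, person.has=shoulder")."""
    parts = label.split("/")
    side = None
    if parts and parts[0] in KNOWN_SIDES:
        side = parts.pop(0)                 # strip the side prefix
    obj = parts.pop() if len(parts) > 1 else None  # object is last
    return {"side": side, "part": "/".join(parts), "object": obj}
```

A plain "shoulder/person" parses the same way, just with no side set, so a left/right property could be assigned retroactively.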
yeah, right.
I also thought about your idea with the "joint naming scheme" again, and while I think it will work great for images with single instances (e.g. a single person), I think it will become quite difficult in a big scene with multiple instances. If we have an image with dozens of elbow/person, wrist/person and shoulder/person polygons and the joint naming scheme shoulder/person -> elbow/person -> wrist/person, I think it will be difficult to automatically find the correct polygon instances to connect.
Maybe we could try to "guess" the right polygon instances by looking at the x/y coordinates (with the assumption that each elbow/person polygon will be connected to the nearest wrist/person polygon), but I think that will be pretty error prone (especially for scenes with a lot of instances, e.g. crowds of people)
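For completeness, the nearest-neighbour guess could be sketched like this - a greedy heuristic over polygon centres that, as noted, will misfire in crowded scenes. The function name and input shape are hypothetical:

```python
import math


def nearest_joint_guess(sources, targets):
    """Greedy guess: connect each source polygon centre (e.g. an
    elbow) to the nearest still-unused target centre (e.g. a wrist).
    sources/targets are lists of (x, y); returns a list of
    (source_index, target_index) pairs."""
    used = set()
    pairs = []
    for i, (sx, sy) in enumerate(sources):
        best, best_d = None, float("inf")
        for j, (tx, ty) in enumerate(targets):
            if j in used:
                continue
            d = math.hypot(sx - tx, sy - ty)
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            used.add(best)              # each wrist used at most once
            pairs.append((i, best))
    return pairs
```

With two well-separated people this pairs the joints correctly, but once figures overlap the nearest centre is often the wrong person's, which is exactly the error-prone case described above.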
Something else orthogonal I've been doing in labelling is left/shoulder/person, left/elbow/person, right/shoulder/person, right/elbow/person.. In the case where there is just one person, that should also suffice. I'm hoping that with the "/" separator, that can be parsed into "side=left, person.has=shoulder" etc to fit in with the schema. It might be possible to retroactively assign a left/right property and let that flow across connections in the joints view.
I think that should be doable. Once we have a proper system in place we could write a bunch of migration scripts that migrates the existing data to our new representation :)
Maybe we could try to "guess" the right polygon instances by looking at the x/y coordinates (with the assumption that each elbow/person polygon will be connected to the nearest wrist/person polygon), but I think that will be pretty error prone
yes it will be impossible in crowds.
There's quite a lot of images of one man + one woman; these will work ok by virtue of each piece being labelled /man /woman. And there's a few images with 2-3 people, where a bounding box + man vs woman label will separate them. But you're right - in the general case you can assume nothing.
I guess we could probably also consider adding "shortcuts" (one of those shortcuts could e.g. be the "joint naming scheme") on top of the generalized joints mode, in case it is too cumbersome to use for the simple cases. But I think it might be better to start with the generalized solution first and then maybe add some simplifications on top of that if needed. :)
ah, one little point about naming. I think it's fine to call this the "joints" view / "joints" tool, but let's think about the name of the connections..
there's several words involved here:-
in 3d packages the term 'joint' is sometimes used for the whole coordinate frame, and you do indeed weight vertices to it - but physically a joint is more like the centre of rotation, and the place where 2 parts 'join'.
"bones" is also used, but sounds a bit too specific here, e.g. we might want to label actual bones as well
i've done some examples with the connections called "limbs", but again you might think this is the whole limb (upper+lower part)
perhaps call these "joint-connections" (unambiguous) and see if we can figure out a handier name later?
* joints
* joint-connections or limb-parts
* other:
for quadruped animals
foreleg -> upper_foreleg, lower_foreleg; hindleg -> upper_hindleg, lower_hindleg .. not sure how to label the lower parts of a dog's leg
The diagram & mockup itself is fine. I agree with the idea of drawing those parts first, then connecting them. (it's usually easier to locate these points and throw a circle onto them)
Pose-estimation sometimes shows connections between the shoulders and hip, connecting one whole skeleton together, but we need to be careful about the possibilities between these things, i.e. the neck, spine, torso - showing some additional articulation, twist etc. (I've done some examples with an "upper_torso, lower_torso"; animators often split the spine into 3 bones to give enough control)
this is one good reference image i managed to find in your dataset:
The advantage of first drawing the limbs and then connecting them together is that we can use the properties system in the background for representing the joints in the database. That way we can leave the backend as it is (except for a few small changes) and just need to focus on the frontend.
additional suggestion... would it be feasible to add a "left/right" property selector (a current "side=" state) at the same time as selecting the joint connections?
Here's a short gif of how the whole thing could look like:
Instead of using the buttons (which is probably a bit tedious) we could add a bunch of hotkeys. e.g: "c" to begin a new "joint connection path" and "enter" to end it. With two additional hotkeys ("l"/"r") we could specify the side (left/right).
I hope that's going into the right direction?
Instead of using the buttons (which is probably a bit tedious) we could add a bunch of hotkeys. e.g: "c" to begin a new "joint connection path" and "enter" to end it. With two additional hotkeys ("l"/"r") we could specify the side (left/right).
hotkeys are always good. One more option to consider here is dragging (like rubber-band line draw); that can be very intuitive but might not be so nice with a touchscreen. You might be able to reduce it to one hotkey, e.g. pressing C again closes the current path and begins a new one, and exiting the mode stops altogether. You could also just require clicking twice on the elbow, e.g. 1 click always starts and the next click always closes a joint (and you have to click again to start a new one); with chains of just 2 it might be ok. Yet another option is that clicking twice on the endpoint 'cancels': end on the same joint = do nothing and stop continuing, until you click another one to start.
There's a lot of options here.. hard to say which will be quickest to use (and quickest to learn) without experiment; I'm not sure which my own favourite is here.
Many thanks for all the suggestions - very much appreciated! The rubber-band suggestion sounds really interesting; I think I'll give that one a go with the next iteration. But I agree with you, at the current state it's hard to say which of those options is the best. I think that will become more clear once there is a usable prototype out there. I'll keep updating this thread here with some gifs and screenshots as the whole thing grows. Hopefully that gives a bit of an early impression of how the first version will look in the end :)
wanted to create a separate thread for these ideas. One of the biggest enhancements to the dataset would be the ability to represent trees of polygons, as LabelMe does.
2 challenges: [1] how to retrofit it, [2] how to avoid overwhelming casual users
I've resisted suggesting it because [1] I know UI can be much harder to get right than it looks and [2] all the other tweaks up until now were way more useful. But after messing around with GTK's treeviews I saw that it gives you drag-drop tree manipulation out of the box - I'm hoping the JS toolkits are at least as good as that.
potential benefits of a polygon tree:
Easier to manage more detail in an image, e.g.
person.head.face.eye
car.wheel.hub
house.door.handle
through expanding a node you could show the current container, opening up the raw unqualified part labels; the tree location qualifies them
allow simultaneous use of blending and foo->bar graph hints with hierarchical organisation (e.g. animal->zebra, limb->leg->foreleg)
clearing up instance boundaries - multiple fragments of an occluded object can be grouped as a single object
a tree view with a strict order could define depth ordering, to clear up occlusion. You could assume the painter's algorithm, and some labels can be marked as transparent (tree, window, glass, ..). Currently you don't know if people labelled the whole occluded object (as LabelMe actually recommends) or just the unoccluded parts. I don't know which is better; it's very ambiguous IMO.
you could show the toplevel outlines, or the outline of the tree path for the current object you are detailing
You're directly defining a hierarchical feature map, which is what most vision systems are trying to figure out internally. There might be some ways to accelerate training?
Also consider the photogrammetry ideas, looking for overlap between labelling and hints for object reconstruction; most 3d object representations are also hierarchical
Avoid having to re-draw the outline if you already created the parts (legs, body, arms, head, tail - the whole object is defined by the union of these areas). Or you could start by drawing around the entire parent, and just draw the internal boundaries
Sync with LabelMe data
ImageMonkey's current idea is useful enough, but with arbitrary tree depth you could go further
ideas for retrofitting
how about if we came up with a syntax to represent a graph path; "/" would have been reasonable because of its use in directory trees, and it's already used here as "head/cat" etc. However I've used it a lot for label combining, and the "head/cat" convention is backwards.
ideas:
car/door/handle
car::door::handle
car.door.handle
car#door#handle
car{door{handle}}
another idea would be to use something else entirely. Fat arrow?
car=>door=>handle
(might get confused with ->). Triple colon? car:::door. Chevrons (kind of like arrows)? car>>door. Ultimately the user might never need to see these, i.e. they'd just be parsed into a treeview.
separating instances
0.person.head
1.person.head
defines the heads of 2 separate people; 1.person.head groups with 1.person. Each number would be a separate root object. This is definitely an "advanced" feature.
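A sketch of how those numeric instance prefixes could be parsed and grouped into root objects. The helper names are hypothetical; it assumes "." as the separator, as in the examples above:

```python
def parse_instance_path(label):
    """Parse "0.person.head" into (instance, ["person", "head"]).
    A label without a leading number gets instance None
    (i.e. indeterminate, as most casual contributions would be)."""
    parts = label.split(".")
    if parts[0].isdigit():
        return int(parts[0]), parts[1:]
    return None, parts


def group_by_instance(labels):
    """Group a flat label list into root objects keyed by instance id."""
    roots = {}
    for label in labels:
        inst, path = parse_instance_path(label)
        roots.setdefault(inst, []).append(path)
    return roots
```

So "0.person.head" and "1.person.head" end up under two different roots, while "1.person.head" and "1.person" group together.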
Could considering existing data and most users' contributions as "indeterminate" (r.e. occlusion and instance information) let both co-exist?
Could drawing outlines with color coding or thickness make it obvious how it's currently organised? Could the annotation "command" explain it ("Annotate parts of car: Wheel")?
could "organize the polygons" be a separate task?
could you toggle between a flat polygon list and a unique instance treeview? (you might find that both are useful, e.g. regarding hiding and finding things)
Could the common objects automatically set up their components? Then you actually save casual users from having to figure out the part syntax. A casual user will just annotate a dog; the treeview would automatically show it could be expanded. If they click, they'll see head, neck, foreleg, hindleg, tail, and if they expand head, they'll see eye, nose, mouth, ear. Expand mouth to see tongue, teeth.
You could get a long way with tree templates for quadrupedal animals, people, and 4-wheeled road-vehicles. "Only let registered users set up tree nodes"? Viewed this way, it might actually simplify use.
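Such tree templates could be represented as simple nested dictionaries; a sketch (the dog template follows the part names listed above, the lookup helper and everything else is an assumption):

```python
# Hypothetical part templates; the dog entry uses the part names
# from the discussion above (head/neck/foreleg/hindleg/tail etc).
PART_TEMPLATES = {
    "dog": {
        "head": {
            "eye": {},
            "nose": {},
            "mouth": {"tongue": {}, "teeth": {}},
            "ear": {},
        },
        "neck": {},
        "foreleg": {},
        "hindleg": {},
        "tail": {},
    },
}


def expandable_children(path):
    """Given a tree path like ["dog", "head"], return the part names
    the treeview could offer when the user expands that node.
    Unknown paths simply yield nothing (no expansion arrow)."""
    node = PART_TEMPLATES
    for name in path:
        node = node.get(name, {})
    return sorted(node)
```

So a casual user who only annotates "dog" still gets offered head/neck/foreleg/hindleg/tail on expand, without ever learning the part syntax.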