ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

tree ideas #276

Open dobkeratops opened 4 years ago

dobkeratops commented 4 years ago

wanted to create a separate thread for these ideas. One of the biggest enhancements to the dataset would be the ability to represent trees of polygons, as LabelMe does.

2 challenges: [1] how to retrofit it, [2] how to avoid overwhelming casual users

I've resisted suggesting it because [1] I know UI can be much harder to get right than it looks and [2] all the other tweaks up until now were way more useful. But after messing around with GTK's treeviews I saw that it gave you drag-drop tree manipulation out of the box - I'm hoping the JS toolkits are at least as good as that

potential benefits of a polygon tree:

ImageMonkey's current idea is useful enough , but with arbitrary tree depth you could go further

ideas for retrofitting

how about if we came up with a syntax to represent a graph path; "/" would have been reasonable because of its use in directory trees, and it's already used here as "head/cat" etc.. however I've used it a lot for label combining, and the "head/cat" convention is backwards.

ideas:

another idea would be to use something else entirely:
fat arrow? car=>door=>handle (might get confused with ->)
triple colon? car:::door
chevrons (kind of like arrows)? car>>door

Ultimately the user might never need to see these, i.e. they'd just be parsed into a treeview

separating instances

this is definitely an "advanced" feature.

dobkeratops commented 4 years ago

Mockup: [image]

In this scheme, just using “.”, “/” and “->” as separators would still let users activate “head” and “person” from 0.person.head and head/person alike. The existing annotations don’t get converted into instances unless we can estimate it by coverage (I suspect it will need confirmation because you can have multiple separate fragments or confusion with overlapping people). There’s a lot of images of 1 person or man+woman where the tree could be inferred easily. A separate piece of information, how many = 1, 2, several (1-10), many (>10) per label, might be interesting

bbernhard commented 4 years ago

Many thanks for sharing!

At the moment, I am seeing two rather big problems:

In theory it should be possible to rework the database schema, but that will be a huge effort (a few months of work) that would most likely result in a rewrite of most of the application logic. Although most of the application's logic is pretty well covered by the nearly 300 integration tests, I am still a bit worried that stuff will break.

Maybe it could be possible to leave the existing database schema as it is and "piggyback" it with additional tables and data structures to add a tree view like structure on top of that. But the problem with that is, if we want to utilize the tree structure in any way (e.g. for querying the dataset, exporting data, etc.) we would need to touch a lot of core functionality again. So, I think this "shortcut" most probably won't save us much time and effort :/

I totally agree with you that a tree like structure for polygons would be awesome and really cool to have. But as this would require so much effort and huge changes in the backend, I am a bit worried that I'll get lost along the refactoring way (I've seen too many open source projects die in the middle of a huge refactoring). I hope that we can maybe find an alternative approach which gets us a similar end result, but with less effort - that would be awesome :)

dobkeratops commented 4 years ago

right I figured it might be too big an upheaval to change the database itself

I hope that we can maybe find an alternative approach which gets us a similar end result, but with less effort - that would be awesome :)

hence the idea of a naming convention - something like the dot separators above, which would let you continue using the flat label list but just parse it into a tree for display, entry and selection (e.g. in the above tree view example, if you created a new polygon leg under the second person node, it would actually create a new flat label 2.person.leg in the database, and that would show up individually in the existing flat label list). It would mean extra logic in the client to convert it back and forth.

We could create annotations like this now - but I wanted to go over the possibilities for instance separation and a hierarchy separator to complement -> and /. The existing dataset has a bunch of redundant naming convention ideas (some of which might clash), and I didn’t want to start annotating like this without first checking what is most likely to get official support

There’s probably a lot of cases where coverage will figure out the connection as well, but we’d need a way to rule out the counterexamples

I think most of the time you’d only do this level of complex labelling for images focussed on 1 or 2 main instances (I picked the crowd example as a pathological case to illustrate a mass of people alongside a detailed person). That reminds me of the idea of resubmitting some high res crops (a 1024x crop from a high res photo for detailed annotation)

dobkeratops commented 4 years ago

hrctest

Here's one example I've set up using the dot idea, 2 separate person instances. Adding "." to the list of separators would mean these could still show up for searches for 'man', 'hand' etc.

these annotations would be equivalent to this tree:

man
    head
        face
            mouth
        ear
    left_arm
        forearm
        hand
    right_arm
        forearm
        hand        

man
    head
        face
            mouth
        ear
    left_arm
        forearm
        hand
    right_arm
        forearm
        hand
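
As a rough illustration of the "parse flat labels into a tree" idea, here is a minimal Python sketch. The label strings and the function names `build_tree` and `render` are made up for this example and are not part of ImageMonkey; it only assumes that "." separates path components and that the first component is the instance index.

```python
def build_tree(flat_labels):
    """Fold dotted labels such as '0.man.head.face' into nested dicts."""
    tree = {}
    for label in flat_labels:
        node = tree
        for part in label.split("."):
            # Reuse the child node if it exists, otherwise create it.
            node = node.setdefault(part, {})
    return tree


def render(tree, depth=0):
    """Return the tree as indented lines, one node per line."""
    lines = []
    for name, children in tree.items():
        lines.append("    " * depth + name)
        lines.extend(render(children, depth + 1))
    return lines


# Hypothetical flat labels as they might be stored in the database.
labels = [
    "0.man.head.face.mouth",
    "0.man.head.ear",
    "0.man.left_arm.hand",
    "1.man.head",
]
print("\n".join(render(build_tree(labels))))
```

The database would keep only the flat strings; the nesting exists purely in the client, so converting back is just joining the path components with "." again.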

bbernhard commented 4 years ago

aaah okay, so if I understood you correctly you mean that we store the label as it is (e.g: 0.man.right_arm) in the database and parse the expression (either in the frontend or the backend) to build the tree, right?

That would definitely reduce a lot of complexity! The only thing that concerns me a bit (mostly because I do not have much experience with it) is the performance impact on the database. Because, if we have a label like 0.man.right_arm stored in the database and we want to search for man, we probably need to do a regular expression match in order to find 0.man.right_arm.
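
To make the matching problem concrete, here is a small Python sketch of searching by whole path component instead of by substring. It runs in application code and says nothing about how PostgreSQL would perform; the separator set (".", "/" and "->") is taken from the discussion above, while the function names and sample labels are assumptions for illustration.

```python
import re

# Separators mentioned in this thread: ".", "/" and "->".
SEPARATORS = re.compile(r"->|[./]")


def tokens(label):
    """Split a flat label into its path components."""
    return [t for t in SEPARATORS.split(label) if t]


def matches(label, term):
    """True if `term` is one whole component of `label` (not a substring)."""
    return term in tokens(label)


labels = ["0.man.right_arm", "head/dog", "woman", "1.man.head"]
print([l for l in labels if matches(l, "man")])
# prints ['0.man.right_arm', '1.man.head']
```

Note that "woman" is not returned for "man", which a naive substring match would get wrong. One possible way to avoid pattern matching in the database would be to store these tokens separately when the label is created, but that is only a thought, not something tested here.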

While PostgreSQL is pretty good when it comes to indexing "static strings", I am not sure how well it performs when we are using a lot of pattern matching. I've read in the past that some people use Elasticsearch on top of PostgreSQL (because text search wasn't efficient enough) to speed up queries, but that's a complexity monster of its own.

We have talked about pattern matching (e.g .\bcar\b.) for the search a few days ago. While this is a cool feature, that would allow us to shorten some search queries, I would see it as "nice to have". So if this at some point affects the performance negatively, we could just disable it without much impact.

But I think with the "flat tree" labels it's probably a bit more difficult. If we have thousands of those flat tree labels in the database and realize that it has a huge performance impact (which can't easily be fixed by adding more indexes), disabling the whole thing would be a real pity then.

I guess the flat tree labels potentially can get pretty deep, right? If there are only a limited number of possibilities, I guess we could use synonyms (e.g: man == man.left_arm | man.right_arm | man.left_leg | ...). That way we could keep it simple and get away without regex-like searches in the database. But that of course only works if the number of possibilities is limited. And of course that's also something that we need to curate manually.

dobkeratops commented 4 years ago

Maybe.. the permutations might explode, e.g. man with "/" combined attributes.

you mentioned that there's "one parent level" - is this the way you implement "head/dog" as "dog.has='head'" ?
.. if that's the case, perhaps you could just split the tree root ("man"="0.man", "man"="1.man" etc) and the rest of the path as new part names ("left_arm.hand", "head.face.eye", "left_leg.knee", etc).

if regex search isn't possible, you might also find the other "/" combined labels could be split up; A lot of those should translate into Properties; perhaps your plan was to have those supported in the database schema. (you'll see a lot of "man/walking", "person/sitting" etc => a property "state=walking", "state=sitting" etc. I'm hoping these could be gradually recognised in the system over time)
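
A minimal Python sketch of that split might look like this. The `STATES` vocabulary and the function name are hypothetical; a real list of state words would need to be curated, and anything not in the list is left untouched.

```python
# Hypothetical vocabulary of state words; the real list would need curating.
STATES = {"walking", "sitting", "standing"}


def split_combined(label):
    """Split 'man/walking' into ('man', {'state': 'walking'}).

    Components that are not known state words stay part of the label,
    so 'head/dog' comes back unchanged with no properties.
    """
    parts = label.split("/")
    states = [p for p in parts if p in STATES]
    rest = [p for p in parts if p not in STATES]
    props = {"state": states[0]} if states else {}
    return "/".join(rest), props


print(split_combined("man/walking"))
print(split_combined("head/dog"))
```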

what about the "instance prefix" side of this - are there other ways you could confirm whole instances vs fragments (and connect the fragments together)? you mentioned the possibility of polygon links. If the intention is "1 polygon per instance", fragments could be linked by a zero-area sliver (reminds me of 'degenerate triangles' for tri-strip meshes), and the renderer might be able to filter those out

bbernhard commented 4 years ago

you mentioned that there's "one parent level" - is this the way you implement "head/dog" as "dog.has='head'" ?

jep, right. head/dog and dog.has='head' are basically equal - one is just a synonym of the other.

Internally it's stored like this in the database:

label table:
+----+------+-----------+
| id | name | parent_id |
+----+------+-----------+
| 1  | dog  | null      |
| 2  | head | 1         |
+----+------+-----------+

schema. (you'll see a lot of "man/walking", "person/sitting" etc => a property "state=walking", "state=sitting" etc. I'm hoping these could be gradually recognised in the system over time)

I think that should already be possible. The only thing that's missing here would be the logic that splits up the expression into label and property.

What worries me a bit more are the more complex "flat tree" expressions. e.g: 0.man.head.left_eye

I could, as you suggested, split the label up into a "parent part" and a "child part". So, maybe something like this:

label table:
+----+---------------+-----------+
| id | name          | parent_id |
+----+---------------+-----------+
| 1  | 0.man         | null      |
| 2  | head.left_eye | 1         |
+----+---------------+-----------+

But then again we would need a regex pattern if we want to search for the label man.

Ideally, it would be great to have something like this:

label table:
+----+----------+-----------+
| id | name     | parent_id |
+----+----------+-----------+
| 1  | 0        | null      |
| 2  | man      | 1         |
| 3  | head     | 2         |
| 4  | left_eye | 3         |
+----+----------+-----------+

But allowing something like this would require a change in almost every database query (and some of those are pretty complex).

Maybe it's possible to use a PostgreSQL view which implements a recursive join. I'll need to look that up.
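
For reference, a recursive query over a parent_id column could look roughly like the sketch below. It uses Python's built-in sqlite3 module as a stand-in so that it runs without a server; the `WITH RECURSIVE` form is also available in PostgreSQL, but the table layout here is only the simplified one from the example above, not the real ImageMonkey schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE label (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
    INSERT INTO label VALUES (1, '0', NULL), (2, 'man', 1),
                             (3, 'head', 2), (4, 'left_eye', 3);
""")

# Start at the root rows and walk downwards, accumulating the dotted path.
PATHS = """
    WITH RECURSIVE tree(id, name, path) AS (
        SELECT id, name, name FROM label WHERE parent_id IS NULL
        UNION ALL
        SELECT l.id, l.name, tree.path || '.' || l.name
        FROM label AS l JOIN tree ON l.parent_id = tree.id
    )
    SELECT name, path FROM tree ORDER BY id
"""

rows = conn.execute(PATHS).fetchall()
for name, path in rows:
    print(name, path)
```

With a query like this wrapped in a view, a search for `man` could compare against the plain `name` column while the full path is still available for display. Whether that is fast enough on the real data would have to be measured.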

That's really a tricky one :/

dobkeratops commented 4 years ago

if the "0." part is making any of this look harder - that's just there to explicitly specify an instance. maybe there are other ways to handle that (like confirming the polys that are definitely 1 instance.. there will be a lot that are). I won't adopt this labelling scheme just yet - I just wanted to explore the possibilities

Something else that might make sense is labelling left and right parts, e.g. left/arm/person , left/leg/person right/arm/person, right/leg/person, or even a broad overlay (left/person, right/person). Whilst that still doesn't specify instances, it might force a net to distinguish parts of 2 people in close proximity , and would certainly make it easier to match 3d objects over the images

dobkeratops commented 4 years ago

actually, if your plan all along was to have properties per polygon, could the instance index just be a property? tag all the connected polygons with an "instance=0", "instance=1" etc. Would that be any easier than trying to retrofit through label naming conventions? The intra-part hierarchy might not be so critical (and again perhaps it could be done through properties as further detail: 'side=left', 'side=right', 'end=front', 'end=rear')

Instead of writing a TreeView UI, you could add a way to highlight "all polygons with a specific property" to show the connected parts (which would be great for materials as well).

2 men and 1 woman in the image
3 distinct instance indices: 0=man, 1=man, 2=woman

Label List
----------
man
head/man
woman
head/woman

LABEL   POLYGON PROPERTIES
-----   ------- ----------
man
    poly0   instance=0
    poly1   instance=1

man.has=head
    poly2   instance=0
    poly3   instance=1

woman
    poly4   instance=2

woman.has=head
    poly5   instance=2

division of a car. front left wheel, front right wheel, etc..

2 cars, the first annotated with detailed parts:

LABEL   POLYGON PROPERTIES
-----   ------- ----------

car
    poly0   instance=0
    poly1   instance=1
wheel/car
    poly2   instance=0 side=left fb=front
    poly3   instance=0 side=right fb=front
    poly4   instance=0 side=right fb=back
    poly5   instance=0 side=left fb=back

hub
    poly6   instance=0 side=left fb=front
            #the hub of the front left wheel of the first car
    poly7
etc

("fb" is just a property name for front or back)

Division of a person:-

LABEL   POLYGON PROPERTIES
-----   ------- ----------
person
    poly0   instance=0
arm/person
    poly1   instance=0 side=left
    poly2   instance=0 side=right

0.person.left_arm would convert to  person.has=arm, instance=0, side=left
but those could be retroactively assigned to anything in the property UI for existing annotations
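
A possible shape for that conversion, as a hedged Python sketch: it assumes every flat label starts with an instance index, that the second component is the main label, and that a `left_` or `right_` prefix on the last component encodes the side. Intermediate path components are dropped, and the function name is invented for this example.

```python
def parse_flat_label(flat):
    """Turn '0.person.left_arm' into ('person.has=arm', {...properties...})."""
    # Assumes at least "<instance>.<label>"; raises ValueError otherwise.
    instance, root, *parts = flat.split(".")
    props = {"instance": instance}
    label = root
    if parts:
        part = parts[-1]  # only the last path element is kept as the part
        for side in ("left", "right"):
            if part.startswith(side + "_"):
                props["side"] = side
                part = part[len(side) + 1:]
        label = f"{root}.has={part}"
    return label, props


print(parse_flat_label("0.person.left_arm"))
print(parse_flat_label("1.man"))
```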

dobkeratops commented 4 years ago

So I just tried out the property system , I hadn't been using it but I see it all works. So in principle it seems you have this ability to retroactively tag individual polygons with information working just fine.

As it stands it would be quite time-consuming for a user to tag all the pieces this way (click the label, click the polygon, select "instance0" , press Add, repeat X10 for a detailed person..).

..so parsing properties from a naming convention in labels would still be useful

a tree-view to show all the polygons, labels and properties in the same list could be useful (e.g. maybe under 'Plate' you'd have nodes for each unassigned poly, then a node for each property 'Ceramic'.. and you drag polygons into the Ceramic node to assign them). Or just show all the polygons in this tree, and allow bulk property assignment through multiple-selection

conversely you could have a mode where you draw text at polygon centroids (labels+properties) - an extension of the visibility icon? allow selecting and assigning property information (use the centroid as a selection-handle but it highlights the outline when you have it selected, to avoid the excessive overlaps of polygon bounding boxes. the 3d tool "Maya" actually had a polygon centre-point selection mode a bit like this). This might be easier to code than making extensive new UI . It might be less intuitive for casual users but fast for experts

bbernhard commented 4 years ago

Using the properties system for that...that's an awesome idea. Really like that!

I guess we could add a hidden property which serves as index to the properties list.

Just thinking about what the appropriate UI part would look like. Would you prefer using a drag/drop treeview or would you rather define the tree entry as text label (e.g 0.man.head.left_eye)?

In case we want a real tree view: Where do we place the tree view? Can we integrate the tree view nicely in the unified mode or should we rework the unified mode completely?

dobkeratops commented 4 years ago

Would you prefer using a drag/drop treeview or would you rather define the tree entry as text label (e.g 0.man.head.left_eye)? In case we want a real tree view: Where do we place the tree view?

The only thing I'd say for certain is parsing a naming convention will be a useful fast way to set up new polygons. between the options, it's not clear which would be best (and ease of implementation factors in as well).

the current instance could be highlighted with the eye button? (or its other polygons drawn permanently with faint lines?)

Can we integrate the tree view nicely in the unified mode or should we rework the unified mode completely

the best would be to extend the unified mode to do it, perhaps it can be generalised from showing polygon by label to showing by property

perhaps a treeview could be placed in the left or right side in a tabview to toggle it with the existing label list or property list; i.e. you extend it to select by label or by property or by individual polygons from the whole list

Maybe it's possible as a modification of the properties panel itself, i.e. if it had a "browse by property" mode where it listed all the image properties, and clicking one highlighted all its polys regardless of label (that would show the instances clearly)..
.. just like at the moment when you have a label selected, you draw new polys with that label, if you had the properties selectable to the side - you could draw with a "current label + current property"? (but you'd need to distinguish between this right box selecting properties and assigning properties as it does at the moment). the title ("annotate all:") could help indicate this? ("annotate all: man (instance=0)", "annotate all: plate (material=ceramic)")

still really not sure what the best way to do all this is.. it might take some experimentation to figure it out (maybe I can try some ideas in my desktop tool. It's starting out with a hierarchy. There are no 'properties' as such, but there's now a node connection graph, which could demand "show connected" in the spatial views). And just writing that makes me suggest that you might be able to implement this entirely as a "connect polygons" tool (multiple select, then click something to bind them all as an instance, and show it visually through color coding or actual inter-polygon links)

r.e. "guessing assignment", in the case where part polygons only intersect the bounding box of one potential parent, the instance information could be assumed. This could be done before training, outside the tool. this would catch a lot of examples i.e. the images with just one person, or a man+woman, or person+dog. some batch tool like that could also find the images where manual assignment is needed
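
The bounding-box rule could be prototyped outside the tool along these lines. This is a sketch only: polygons are assumed to be lists of (x, y) points, and the function names and sample coordinates are invented for illustration.

```python
def bbox(points):
    """Axis-aligned bounding box (min_x, min_y, max_x, max_y) of a polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def boxes_intersect(a, b):
    """True if two (min_x, min_y, max_x, max_y) boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def guess_parent(part, parents):
    """Index of the only parent whose box the part touches, else None."""
    part_box = bbox(part)
    hits = [i for i, p in enumerate(parents) if boxes_intersect(part_box, bbox(p))]
    return hits[0] if len(hits) == 1 else None


# Two overlapping "man" polygons and two part polygons (made-up coordinates).
man_a = [(0, 0), (10, 0), (10, 30), (0, 30)]
man_b = [(8, 0), (20, 0), (20, 30), (8, 30)]
head = [(2, 1), (6, 1), (6, 5), (2, 5)]
hand = [(7, 10), (9, 10), (9, 12), (7, 12)]

print(guess_parent(head, [man_a, man_b]))  # unambiguous: only man_a
print(guess_parent(hand, [man_a, man_b]))  # ambiguous: needs manual assignment
```

Parts that come back as None are the ones a batch tool would flag for manual assignment.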

bbernhard commented 4 years ago

The only thing I'd say for certain is parsing a naming convention will be a useful fast way to setup new polygons.

agreed. But I think we would probably drop that pretty fast in favor of a real tree, no? I mean textual input has for sure its advantages (especially for keyboard warriors), but as annotating an image is a task where the mouse is involved heavily, I am not sure if a keyboard driven input method will boost productivity much?

perhaps a treeview could be placed in the left or right side in a tabview to toggle it with the existing label list or property list;

that's a cool idea. I guess we could also set a cookie to save the user's preference. :)

I haven't tried it yet, but this Javascript library looks promising.

maybe i can try some ideas in my desktop tool

That would be awesome!

Just out of interest: What would your ideal annotation tool look like? Let's assume for a moment now, that there are no technical restrictions nor technical debt and we can do anything we like. Do you have a favorite annotation tool (doesn't matter if desktop or web) or do you feel that there's this one feature that all existing annotation tools are missing?

I occasionally search github for image annotation tools to get some fresh ideas and always wonder if there's still any room for (revolutionary) improvements in this sector or if there's not much left to improve...

dobkeratops commented 4 years ago

Just out of interest: What would your ideal annotation tool look like? Let's assume for a moment now, that there are no technical restrictions nor technical debt

it wouldn't be much different - the main thing is the unified approach which is 90% of the value.

tweaks I might add -

The holy grail would be merging aspects of annotation with photogrammetry and animation rotoscoping, and the combined "image-search"+drawing would be the backbone of that - so this could easily grow out of an annotation tool.

Do you have a favorite annotation tool (doesn't matter if desktop or web)

not really - I haven't used many, just labelme briefly. Mostly I'm influenced by drawing programs & 3d packages. I think you could annotate well with a layers-based paint program and pen device, but it's hard to get those into the collaborative environment of the web, and their UIs are a lot more complicated.

bbernhard commented 4 years ago

Many thanks for all the suggestions - that really helps a lot with the long term planning of the project :+1:

In the next few days, I'll create a feature branch and start working on the tree integration. Not sure yet how long it takes until I have a first working version, but I hope that by actually working on it I can better estimate how much work it really is.

I'll try to make it possible to switch between the tree view and the flat label list. That way, we can easily switch to the old representation in case the new implementation has some bugs (which it for sure will have in the beginning). Once the tree view is stable enough we could remove the support for the flat label list and make the tree view the new default for the unified mode.

dobkeratops commented 4 years ago

sounds like a good plan. personally I'm reassured that the database can potentially represent instances with a property. In principle with 2 alternative 'flat' mechanisms (by label or by property), you already have something as powerful as a tree, but you'd still need to enhance the property tools (e.g. an actual way to view all the polygons with a specific material or instance property). Figuring out the best way will just take experimentation.

This discussion started with the hierarchical tree idea, but it emerged that Properties give a solution from the data standpoint. you could add an "instance" property and it would solve the original issue within the existing UI & database.

So the question is really what UI will make editing and viewing this data obvious to most users, and easier and fast enough to do in bulk.

Alongside the tree view idea, maybe you can also consider tweaks to property editing (view all by material.. almost a reversal?) , and the idea of a "view all poly centres for selection" mode. but yes having all the polys in a tree, with the properties per polygon all there, would do it I think(and maybe you could offer the user a toggle, "tree root=labels vs properties")

Finally regarding opening up potential, for the use case I had in mind (detailed labelling for pose estimation) - your database already has a lot of useable examples. When no 2 parent polygon bounding boxes overlap, it would be safe to assume the instances. Adding more part labels formally (arm, leg, neck, elbow, knee, shoulder, hips, torso), and parsing those from naming schemes ("elbow/man", "foot/person", etc), you'll find a lot of data should become visible (I notice you've got face, hand, foot already). There might be a really simple UI acceleration possible like the ability to set part separate to main label to reuse the entry of main (currently we have the option of pasting a string thanks to the separators, but I don't think casual users will figure that out); also being able to translate naming conventions into properties (things like derelict/ /standing etc) will open up existing data & tasks

bbernhard commented 4 years ago

Finally regarding opening up potential, for the use case I had in mind (detailed labelling for pose estimation) - your database already has a lot of useable examples. When no 2 parent polygon bounding boxes overlap, it would be safe to assume the instances. Adding more part labels formally (arm, leg, neck, elbow, knee, shoulder, hips, torso), and parsing those from naming schemes ("elbow/man", "foot/person", etc), you'll find a lot of data should become visible (I notice you've got face, hand, foot already). There might be a really simple UI acceleration possible like the ability to set part separate to main label to reuse the entry of main (currently we have the option of pasting a string thanks to the separators, but I don't think casual users will figure that out); also being able to translate naming conventions into properties (things like derelict/ /standing etc) will open up existing data & tasks

That's a nice idea, I'll add that to my improvements list :+1:


Yesterday I played a bit with different javascript tree visualization libraries, and I think I'll settle for fancytree for now. It's still actively maintained and overall has a pretty nice feature set.

Here's a small example of how it could look:

Selection_067

The idea is to use the properties system to number the labels according to the position in the tree. So e.g: for the above example it could look like this:

grass: 0
dog: 1
mouth: 1.0
eye: 1.1
ear: 1.2
nose: 1.3

The . represents that it's a child node.

Those numbers are stored along all the other properties (like the material properties). So we are "abusing" the properties system to store hierarchical information. When the unified mode view gets populated we filter out all those numerical properties and use that information to build up the tree. All the other properties are then displayed as usual in the properties box on the right.
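
To illustrate how those position properties could be turned back into a tree when the view gets populated, here is a rough Python sketch. It assumes each label comes with a position string such as "1.2" and that every parent position is present; the function name and the node layout are made up for this example.

```python
def build_position_tree(entries):
    """Build a tree from (label, position) pairs such as ('ear', '1.2')."""
    nodes = {pos: {"label": label, "children": []} for label, pos in entries}
    roots = []
    # Sort numerically so that '1.10' comes after '1.9'.
    for pos in sorted(nodes, key=lambda p: [int(n) for n in p.split(".")]):
        parent_pos, _, _ = pos.rpartition(".")
        # A position without a "." has no parent and becomes a root node.
        target = nodes[parent_pos]["children"] if parent_pos else roots
        target.append(nodes[pos])
    return roots


entries = [("grass", "0"), ("dog", "1"), ("mouth", "1.0"),
           ("eye", "1.1"), ("ear", "1.2"), ("nose", "1.3")]
roots = build_position_tree(entries)
print([r["label"] for r in roots])
print([c["label"] for c in roots[1]["children"]])
```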

This brings us to the first limitation of that approach: As every property is directly attached to a specific (set of) polygon(s), every label that's added needs to have a polygon. Or in other words: If a label is added in the unified mode, it's mandatory to also add a polygon before pressing the "Done" button. Otherwise we can't store the position of the label in the tree. So the polygon serves as "container" to store the label's position in the tree.

Another thing that we need to consider is the actual label name that's then persisted in the database. As the tree information is only used to build the polygon tree, we would end up with the following labels for the above example in the labels view:

Selection_068

One of the "problems" with that is, that it's not possible anymore to e.g query the dataset for "all images that show a dog's head". The only thing we could do with the above labels is either to query the dataset for head which gets us all sorts of heads or to query the dataset for dog & head, which could also return false positives (e.g: imagine an image where a human head and a dog's torso are shown).

In order to circumvent that, we could concatenate the parent and the child label. In our example above that would yield the same labels that we already have now. i.e: nose/dog, mouth/dog, eye/dog, ear/dog.

But I think that only works nicely if we have one parent label and one child label. If there are more hierarchical levels this would translate to something like a/b/c/d...and at that point we would run into the same problems as before ("regex for searching").

My main problem at the moment is, that I want to keep the polygon tree separated from the actual label representation and not mix both. But I still would like to see the actual label in the labels view. (If just head is shown instead of head/dog, a user might again add the label head/dog in the labels view, not knowing that head actually translates to head/dog in the annotations view). I think mixing both could also clash a bit with the labels graph.

dobkeratops commented 4 years ago

Sounds like potential fiddly repercussions

But I think that only works nicely if we have one parent label and one child label. If there are more hierarchical levels this would translate to something like a/b/c/d...and at that point we would run into the same problems as before ("regex for searching").

every label that's added need to have a polygon.

that would be unfortunate because it's still useful to be able to use labels as search tags, and confirm things are in the scene, but you might not actually want to annotate them - the common label tree is often too fiddly

Maybe you don't need the whole tree path ability, e.g. just instance1 suffices - because given part names and some additional properties (e.g. left side, right side, front end, back end, interior, exterior), data-users could still figure out a tree if they need it (e.g. label arm/man, instance=1, side=left narrows down a specific arm; users know eye is part of face, is part of head, etc). Maybe the tree could work by sorting specific permutations of properties (so in that case, you'd see a tree path man.1.arm.left, man.1.arm.right, etc). You'd be able to choose which properties to sort first, and change it retroactively.

left and right properties seem like an obvious choice; maybe you could even separate the upper and lower arm, upper and lower leg in the same way: "man.1.arm.left.upper_limb" = "label:man.has=arm, properties: side=left,limb_part=upper_limb, instance=1"

would that give a balance of capabilities - flexibility, ease of retrofit , searches ?

ideas

// database:
man.has=arm
 instance=instance1, side=left,
 instance=instance2,
man.has=head
 instance=instance1
etc

// Tree for example rule: "create tree nodes for every property permutation per label, and list parts last"
man
    instance1       // closer and more detailed example
        left
            arm
            leg
            hand //eg label:man.has=hand,props: instance=instance1 side=left
            shoulder
            elbow
        right
            arm
            leg
            hand
            shoulder
            elbow
        head
        face
        eye
        nose
        mouth

    instance2       // further and less detailed example
        head
        hand
    other_instances // all other polygons not assigned to instances yet
        head
            poly1 // user can drag this into an instance
            poly2
        hand
            poly3
            poly4

// Tree for example rule: sort main label=>instance=>part_label=>other_properties
// finally a tree node per polygon (select anything through the tree)

man
    instance1       // closer and more detailed
        arm
            left
            right
        hand
            left //eg label:man.has=hand,props: instance=instance1 side=left
            right
        shoulder
            left
            right
        elbow
            left
            right
        head
        face
        eye
        nose
        mouth

    instance2       // further and less detailed
        head
        hand
                // all other polygons not assigned to instances yet
    etc

plate
    ceramic
    paper

bowl
    ceramic
    glass 

// ordering could be swapped based on experiment or even user preference, only the label,part,instance,property actually appear in the database. Someone training a material recognition system might want to list materials first
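
The "tree is just a sort order over flat data" idea can be sketched in a few lines of Python. Everything here is illustrative: the polygon records, the key names (`label`, `part`, `instance`, `side`) and the `other_instances` placeholder are assumptions modelled on the examples above, not the actual database layout.

```python
# Hypothetical placeholder node for polygons that lack a given property.
DEFAULTS = {"instance": "other_instances"}


def group_polygons(polygons, order):
    """Nest flat polygon records into a tree, one level per key in `order`."""
    tree = {}
    for poly in polygons:
        node = tree
        for key in order:
            value = poly.get(key, DEFAULTS.get(key))
            if value is None:
                continue  # property not set and no placeholder: skip this level
            node = node.setdefault(value, {})
        node.setdefault("polygons", []).append(poly["id"])
    return tree


polygons = [
    {"id": "poly0", "label": "man", "part": "arm", "instance": "instance1", "side": "left"},
    {"id": "poly1", "label": "man", "part": "arm", "instance": "instance1", "side": "right"},
    {"id": "poly2", "label": "man", "part": "head"},
]

# The same flat records, viewed with two different orderings.
by_part = group_polygons(polygons, ["label", "instance", "part", "side"])
by_side = group_polygons(polygons, ["label", "instance", "side", "part"])
print(by_part)
print(by_side)
```

Swapping the ordering only changes the `order` list, so a per-user preference would not touch the stored data.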

perhaps it would be confusing to show a tree that isn't really a tree, but you could use icons to distinguish node types (label, instance, property, part)

you could leave "full tree" as a future idea, if you get as far as this working well?

bbernhard commented 4 years ago

Many thanks for sharing your ideas. It's really refreshing to hear someone else's thoughts on a specific topic - that sort of brainstorming is always extremely helpful to me :)

On a related note: Do you think that we will end up with a deep tree or will the tree mostly be flat (i.e only 2-3 hierarchy levels)?

What's e.g still a bit vague to me is when to use the label graph and when to model something in the polygon tree.

e.g: Let's assume we want to annotate a dog.

Now, we could come up with the following (really detailed) polygon tree:

animal
  quadruped
    mammal
      dog
        head
          ear
          nose
          eye
          mouth

Or we could do it like that:

      dog
        head
          ear
          nose
          eye
          mouth

and define the remaining relationships (animal->quadruped->mammal->dog) via the label graph.

What I am a bit afraid of, is that we accidentally render the label graph useless. As we have two trees here (the polygon tree & the label graph), we have to be extremely careful to not mix up the responsibilities of both representations.

For me the label graph always served as some (user defined) top level view. So e.g. a biologist would probably create a label graph that could look like this:

animal
  quadruped
    mammal
      dog
      cat
  biped
    mammal
      kiwi

On the contrary, I (interested in animals, but not a biologist) would probably create this label graph here:

animal
  dog
  cat
  kiwi

If we have the right granularity in the polygon tree, both of us (the biologist and myself) can write our own label graph and use that to query the dataset.

But if we already add too much information to the polygon tree, we lose that possibility. e.g: If we have a polygon tree that looks like this:

animal
  quadruped
    mammal
      dog
        head
          ear
          nose
          eye
          mouth

we already have the structure (animal -> quadruped -> mammal -> dog) "hardcoded". So it's not that easy anymore to change the "view" via the label graph.
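A rough sketch of that argument, assuming the polygon tree only stores the fine-grained root label (`dog`, `cat`, ...) and each user supplies their own label graph as a nested dict. The image ids and the `leaves`/`query` helpers are hypothetical; the two graphs are copied as written in the examples above.

```python
# Root labels of the polygon trees found in some hypothetical images.
IMAGES = {"img1": {"dog"}, "img2": {"cat"}, "img3": {"kiwi"}, "img4": {"car"}}

# The two label graphs from the examples above, copied as written.
BIOLOGIST = {"animal": {"quadruped": {"mammal": {"dog": {}, "cat": {}}},
                        "biped": {"mammal": {"kiwi": {}}}}}
HOBBYIST = {"animal": {"dog": {}, "cat": {}, "kiwi": {}}}

def leaves(graph, root):
    """Return every leaf label below the first node named `root`."""
    def find(node, name):
        for key, child in node.items():
            if key == name:
                return child
            hit = find(child, name)
            if hit is not None:
                return hit
        return None

    def collect(name, node):
        if not node:  # no children: this is a leaf label
            return {name}
        found = set()
        for key, child in node.items():
            found |= collect(key, child)
        return found

    sub = find(graph, root)
    return set() if sub is None else collect(root, sub)

def query(graph, root):
    """Ids of images whose polygon-tree root label falls under `root`."""
    wanted = leaves(graph, root)
    return sorted(img for img, labels in IMAGES.items() if labels & wanted)
```

Both graphs answer `query(graph, "animal")` with the same three images, while only the biologist's graph can answer `"quadruped"`. That only works because the polygon tree itself stops at `dog`.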

Does that make sense?

dobkeratops commented 4 years ago

The second example is how I see it (simpler tree, “and define the remaining relationships (animal->quadruped->mammal->dog) via the label graph.”). The polygon trees will tend to be shallower, depth=1-3. The graph could be 5+?

you might have 3 potential ways to sort, and each view could do a different job (view Labels, view Polygons, view Properties). E.g. when editing materials, it would make a lot of sense to be able to see all the polygons marked as wood, regardless of label or instance. It just so happens the database is ordered “label first”. This is actually quite common in graphics systems: “a group for each texture, then store all the polygons using it”, but systems might have to sort in different ways in other situations

So one does not obsolete the other - they do very different jobs.

Also the graph can express multiple routes to the same item. You could search for carnivore, and you’ll get the carnivorous mammals and reptiles. There'd be a path animal->reptile->crocodile, animal->mammal->cat and carnivore->crocodile, carnivore->cat. Training might discover that all carnivores have sharp teeth, claws etc
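A small sketch of the "multiple routes" idea, storing the graph as a plain edge list so a node can have more than one parent. The edge list is taken from the paths above; the `descendants` helper is made up for illustration.

```python
# Edges (parent, child) from the paths above; 'cat' and 'crocodile' each have two parents.
EDGES = [
    ("animal", "reptile"), ("reptile", "crocodile"),
    ("animal", "mammal"), ("mammal", "cat"),
    ("carnivore", "crocodile"), ("carnivore", "cat"),
]

def descendants(edges, root):
    """All nodes reachable from `root`, following parent -> child edges."""
    children = {}
    for parent, child in edges:
        children.setdefault(parent, []).append(child)
    seen, stack = set(), [root]
    while stack:
        for child in children.get(stack.pop(), []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen
```

Searching from `carnivore` reaches `cat` and `crocodile` without going through `mammal` or `reptile`, which a strict tree could not express.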

I have put graph nodes into the images, which might cause confusion, but the idea is these are purely graph suggestions: eventually I’m hoping they will be reduced, but in the meantime using “->” as a separator means you can search for “car” or “sportscar” and you’ll find the images with “car->sportscar”. The intention is that once “sportscar” is a graph node, the “car->” prefix can be stripped out. It is not for the polygon tree. It’s just handy to submit these suggestions within the database. I’ve stuck to using / to blend object name, parts, and potential properties
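Roughly how the "->" convention could be handled, as a sketch only. The function names are hypothetical, and exact-term matching is an assumption; the real search may well match differently.

```python
def search_terms(label):
    """Split a stored label such as 'car->sportscar' into its searchable terms."""
    return [part.strip() for part in label.split("->") if part.strip()]

def matches(label, query):
    """True if `query` equals any term of the label (exact match, an assumption)."""
    return query in search_terms(label)

def strip_known_prefix(label, graph_nodes):
    """Drop the 'parent->' prefix once the last term exists as a graph node."""
    terms = search_terms(label)
    if terms and terms[-1] in graph_nodes:
        return terms[-1]
    return label  # not in the graph yet: keep the suggestion as submitted
```

So `car->sportscar` is found by either term today, and collapses to `sportscar` once that node exists in the graph.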

mockup of tabbed label + tree view

dobkeratops commented 4 years ago

As a short term suggestion, perhaps you could rename properties as “instance-properties”, and just show the instance ids in the list. Then the tool is capable of editing and viewing instance grouping (the first goal here). It’s probably well worth looking into an enhanced property editor (something like: view all poly centres, view all properties in the list, and the selected property highlights all its polygons from all labels, with a tool to toggle it). And possibly make properties available as part of the current drawing mode anyway - “annotate all: fence, material=metal”

dobkeratops commented 4 years ago

Some ideas.. if you did go the full tree route, imagine if you could make un-named group polygons, but place specific labels under them - a way of describing whole groups (which might be hard to internally separate). Perhaps this could tie back into image descriptions somehow. Imagine if you could attach those descriptions to parts of the scene: “A bunch of people riding bicycles on the road”, “a man standing mounted on a bicycle looking back smiling”

bbernhard commented 4 years ago

Many thanks for all the suggestions - very much appreciated!

It's a really tricky one...

I've thought about it all day and while I still believe that it's possible to implement the polygon tree, I think the end result won't be pretty (at least in terms of maintainability). So far, I haven't found a solution that would allow us to implement the polygon tree while still being flexible enough to add all the other features & suggestions you mentioned afterwards. It becomes more and more obvious to me that the current database schema isn't designed in a way to support that. So, if we add the polygon tree it would probably be through some sort of "hack"/"abuse" of an existing feature...which has the big potential to backfire big time at a later point. So far, every solution I've played through either ended up breaking some existing features (e.g discoverability of labels) or was so hacky that it made it impossible to add other features in the future.

I've looked a bit at labelme's implementation and I think they were facing the same problems back then. I believe in the beginning they really tried to keep the data structured and accessible. But the more features they added, the more difficult it became for them to keep the data structured (no misspelled labels, "no garbage labels", label hierarchy, etc.) and accessible (via search). And I think I understand now also why. Ticking all those boxes is a lot of work and requires a really well thought through design...

I am not yet giving up though...still hoping to find a solution that integrates a bit better into the existing database schema and is less destructive.

dobkeratops commented 4 years ago

right, it does look like a big upheaval. Perhaps working on new tools to manage properties is the way to go. It seems an instance property is enough to group the connected limbs of a person, so you could just extend the tools to view and assign properties. That will have other uses anyway: it would be great if you could view all the polys (from all labels) by material (I imagine a new version of the properties list that works in parallel to the label list: list all the properties, click there to show everything that uses one, and apply the selected property to all new polygons that you draw?)

dobkeratops commented 4 years ago

Some more experiments with human part annotations - tried using the ellipse tool, this could actually be faster (perhaps with a hotkey for rotation such that you could rotate in the drawing mode),

imagine if you could quickly toggle between person-part labels from a palette or the existing label list, (using the [,] hotkeys)

These might actually be better than polygons for pose-estimation - they sort of imply a centre,axis and range better. It might be possible to reduce drawing a limb to 2 drags (draw a line down the axis, then set the width)- or perhaps you could connect circles drawn around the joints(that would imply the label. "upper-arm" = "an ellipse connecting shoulder,elbow")

rotation hotkeys could speed this up - perhaps "," "." (e.g. 9-degree increments.. 5 taps=45deg) - rotate the last drawn object, or maybe rotate the actual image, and you always draw screen-aligned

Perhaps instead of a polygon tree, there are other ways you can investigate to assist annotating parts of people. imagine if you could draw circles at the joints, then connect them (a dedicated limb-annotation)

examples:- person_parts4 man_part3 annotate_ellipses

dobkeratops commented 4 years ago

suggestion for drawing joint-connections automatically, based on a joint naming scheme. the idea would be to let the user set a connectivity scheme specifying polygon names; to use this, every label must be used only once (and disambiguated with blends/properties for multiple instances). eventually connectivity schemes could be built into the system
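A sketch of how such a connectivity scheme could drive the line drawing. The scheme contents, the `centres` mapping and the `connections` helper are all hypothetical; the only rule taken from the suggestion is that a label must occur exactly once to be connected.

```python
# Hypothetical user-supplied scheme: pairs of labels to be joined by a line.
SCHEME = [
    ("left/shoulder/person", "left/elbow/person"),
    ("left/elbow/person", "left/wrist/person"),
]

def connections(centres, scheme):
    """centres maps label -> list of (x, y) polygon centres; returns line segments."""
    lines = []
    for label_a, label_b in scheme:
        points_a = centres.get(label_a, [])
        points_b = centres.get(label_b, [])
        # Only unambiguous when each label is used exactly once in the image.
        if len(points_a) == 1 and len(points_b) == 1:
            lines.append((points_a[0], points_b[0]))
    return lines

example = {
    "left/shoulder/person": [(10, 10)],
    "left/elbow/person": [(20, 30)],
    "left/wrist/person": [(25, 50), (80, 50)],  # used twice, so it is skipped
}
```

With `example`, only the shoulder-elbow line is produced; the duplicated wrist label is left unconnected rather than guessed.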

joints_connectivity_suggestion

bbernhard commented 4 years ago

man, I really love your mockups - they look awesome!

Originally, I had something like this in mind (however I am not sure if this is a good workflow):

What's happening internally is that we assign an alphanumerical property to each of those polygons, indicating that those limbs are connected together. e.g: If there are two men in the picture and we want to connect the limbs shoulder/man, elbow/man, wrist/man together, we would end up with these properties:

instance #1:

shoulder/man: joint-0.0 elbow/man: joint-0.1 wrist/man: joint-0.2

Instance #2:

shoulder/man: joint-1.0 elbow/man: joint-1.1 wrist/man: joint-1.2

So by parsing the properties, we know that the limbs are connected like this: shoulder/man -> elbow/man -> wrist/man.
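A minimal sketch of that parsing step, assuming the property format is exactly `joint-<chain>.<position>` as in the example above. The regular expression, the `ANNOTATIONS` list and the `chains` helper are illustrative, not existing code.

```python
import re

# Assumed format: "joint-<chain index>.<position within the chain>".
JOINT_RE = re.compile(r"^joint-(\d+)\.(\d+)$")

# (label, property) pairs as they might come back from the database, in any order.
ANNOTATIONS = [
    ("shoulder/man", "joint-0.0"), ("elbow/man", "joint-0.1"), ("wrist/man", "joint-0.2"),
    ("wrist/man", "joint-1.2"), ("shoulder/man", "joint-1.0"), ("elbow/man", "joint-1.1"),
]

def chains(annotations):
    """Group labels by chain index and order them by position within the chain."""
    groups = {}
    for label, prop in annotations:
        match = JOINT_RE.match(prop)
        if match is None:  # some other property, not a joint marker
            continue
        chain_id, position = int(match.group(1)), int(match.group(2))
        groups.setdefault(chain_id, []).append((position, label))
    return {cid: [label for _, label in sorted(items)] for cid, items in groups.items()}
```

Each chain comes out as shoulder/man -> elbow/man -> wrist/man regardless of the order the properties were stored in.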

The only tricky thing is probably to select the correct limbs in an image that has a lot of polygons. (I guess that could probably be a bit cumbersome?). The advantage I would see is, that we could use that on the existing dataset. So if there are already annotated limbs we could easily connect them.

But of course, we could also give your joint connection naming scheme a shot. The only challenge I see with that is to create a UI workflow that integrates nicely into the unified mode. Because if I understood you correctly then we wouldn't use the labels list on the left anymore to switch between the labels, but instead use the joint naming scheme to iterate over the limbs, right? I guess that could require a bit of UI tweaking to make that work (the most challenging part is probably to make it clear which UI elements are active and how the user can switch between modes: the "joint mode" and the "normal annotation mode"). I think without a clear UI that could become quite complex.

dobkeratops commented 4 years ago

if you want to connect those limbs you toggle the "show all annotations" button and connect the limbs with the "joint tool"

This sounds quite interesting

There's many ways to approach it.. a case of finding the best balance between useability and ease of bug-free implementation , without massive upheaval to the existing system. It does seem the properties system gives you a lot of options

Because if I understood you correctly then we wouldn't use the labels list on the left anymore to switch between the labels

what could happen is when you start with one extremity, the connection information could drive generating the next label. However there's a downside that you might not always want every label (you skip some because they're occluded or offscreen). so you'd still need the labels UI to be active. it would just behave like it has a limited autopilot

Something far easier to implement is just a straightforward hotkey to toggle, so you could have pasted a large preformatted label list (this is working fine thanks to the separator parsing)

The simplest implementation is that there's no separate tool at all; it just uses an optional connectivity list to draw the lines, and "[" "]" label toggle keys would be universally useful shortcuts that accelerate this and all other annotation

There's a few ideas for an 'ellipse tool ++'. The first is to bolt on a rotation assist (rotate hotkeys, or alternate between the first drag drawing the ellipse whilst the second drag orients it). You might want to make this start drawing at the centre, i.e. it's easier to judge placing the ellipse on the object's centre, then rotate around that (in contrast to the rectangle bounding box tool). The other is to flip the process so you draw its "major axis" first, then set its width (minor axis). You could start with an assumed aspect ratio e.g. 0.5, then the "," "." hotkeys scale it by sqrt(2) and 1/sqrt(2) respectively (2 taps doubles or halves the width), or have a 2 state mouse drag tool.
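A quick sketch of the "major axis first" variant, with the sqrt(2) width hotkeys. The dict layout and function names are invented for illustration; the 0.5 default aspect ratio and the key mapping ("," widens, "." narrows) follow the description above.

```python
import math

def ellipse_from_axis(p0, p1, aspect=0.5):
    """Build an ellipse from a dragged major axis p0 -> p1 and an assumed aspect ratio."""
    (x0, y0), (x1, y1) = p0, p1
    length = math.hypot(x1 - x0, y1 - y0)
    return {
        "centre": ((x0 + x1) / 2, (y0 + y1) / 2),
        "major": length,
        "minor": length * aspect,
        "angle_deg": math.degrees(math.atan2(y1 - y0, x1 - x0)),
    }

def scale_width(ellipse, key):
    """',' multiplies the minor axis by sqrt(2); '.' divides it by sqrt(2)."""
    factor = {",": math.sqrt(2), ".": 1 / math.sqrt(2)}[key]
    return {**ellipse, "minor": ellipse["minor"] * factor}

limb = ellipse_from_axis((0, 0), (10, 0))            # minor axis starts at 5.0
wider = scale_width(scale_width(limb, ","), ",")     # two taps roughly double the width
```

Because each tap is a factor of sqrt(2), two taps in the same direction double or halve the width, and one tap each way cancels out (up to floating point rounding).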

bbernhard commented 4 years ago

There's many ways to approach it.. a case of finding the best balance between useability and ease of bug-free implementation , without massive upheaval to the existing system. It does seem the properties system gives you a lot of options

yeah, I think so too. :)

what could happen is when you start with one extremity, the connection information could drive generating the next label. However there's a downside that you might not always want every label (you skip some because they're occluded or offscreen). so you'd still need the labels UI to be active. it would just behave like it has a limited autopilot

that's what I am a bit afraid of: that we end up with a UI that's actually really powerful, but behaves in a way that's not obvious to most users. Personally, I always felt the most productive in any application if I recognized similar patterns. If I had to use a UI where some options were not obvious to me (or felt contradictory) I always lost a bit of flow.

I am wondering if we can use the existing workflow (with the labels list on the left for selecting the label) together with a "joint tool" and add a bunch of hotkeys (switching between labels, rotating a polygon etc) on top of that to speed up certain things? Maybe we can also add a right mouse context menu with additional options? That way we don't have to introduce a lot of new concepts and maybe(?) get something powerful as well?

dobkeratops commented 4 years ago

Some simpler tweak suggestions to enhance drawing ellipse-bounded annotations. ellipse_tool

"joint tool" and add a bunch of hotkeys (switching between labels, rotating a polygon etc) on top of that to speed up certain things?

yes i think so. this diagram doesn't cover a 'connect-joints tool'; there's 2 ways you could work. explicitly drawing the limbs - then you've mostly eradicated the need to draw a bounding shape - you can get a pretty good approximation this way. or drawing the joints, then a draw-connection tool (which could add the green lines in the above mockup).. or just getting those from common naming conventions (shoulder->elbow etc).

dobkeratops commented 4 years ago

( there is another way you could approach annotating limbed creatures: instantiating a prototype, with all the connections already setup, and then moving each joint into place. some 3d packages have a base humanoid skeleton to work from like that. It might be less obvious how to use that though , and it would be harder to deal with for the examples where you can't see the entire figure due to screen edges and occlusion)

bbernhard commented 4 years ago

( there is another way you could approach annotating limbed creatures: instantiating a prototype, with all the connections already setup, and then moving each joint into place. some 3d packages have a base humanoid skeleton to work from like that. It might be less obvious how to use that though , and it would be harder to deal with for the examples where you can't see the entire figure due to screen edges and occlusion)

right, that would definitely be the icing on the cake. But as this is something completely different UI wise, I am having a hard time seeing how that one integrates nicely into the unified mode. I think it would be way nicer to have a separate view for that (or at least add a switch to the unified mode to switch between modes).

If there are no objections from your side however, I would prefer to start with the most simple version. The plan would be to create a prototype which uses the properties system in the background for storing the joints. In a first iteration I would like to use as many existing functionalities as possible. That way it should (hopefully) be possible to crank out a first version pretty fast. If it turns out that the properties system is the way to go here, we can add more functionality (hotkeys, ellipse tool, etc) to make it more convenient to use.

dobkeratops commented 4 years ago

I would prefer to start with the most simple version

absolutely, that's the best plan. as it stands now, the [,] hotkeys would be the low-hanging fruit to accelerate this and other tasks (we can already paste a label list, and this would help with images with an existing label list). I'm hoping the naming conventions can be parsed into properties (left/... etc)

bbernhard commented 3 years ago

Ok, I've tried to create a "mockup" (yeah I know, it's a bit exaggerated to call that a mockup :sweat_smile:) of what I have in mind.

joint1

The advantage of first drawing the limbs and then connecting them together is, that we can use the properties system in the background for representing the joints in the database. That way we can leave the backend as it is (except for a few small changes) and just need to focus on the frontend.

What do you think? Do you think that's too complicated or do you think that could work?

dobkeratops commented 3 years ago

right so this will allow connecting between specific instances, in the case where you have multiple visible (scenes with more than one person); that'll be great. Something else orthogonal i've been doing in labelling is left/shoulder/person,left/elbow/person, right/shoulder/person, right/elbow/person.. In the case where there is just one person, that should also suffice. I'm hoping that with the "/" separator, that can be parsed into "side=left, person.has=shoulder" etc to fit in with the schema. It might be possible to retroactively assign a left/right property and let that flow across connections in the joints view
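One way the "/" labels could be parsed into properties, as a rough sketch. The output keys mirror the "side=left, person.has=shoulder" wording above; the `parse_label` name and the fallback behaviour for other shapes of label are assumptions.

```python
SIDES = {"left", "right"}

def parse_label(label):
    """Turn 'left/shoulder/person' into {'side': 'left', 'person.has': 'shoulder'}."""
    parts = label.split("/")
    props = {}
    if len(parts) > 1 and parts[0] in SIDES:
        props["side"] = parts[0]
        parts = parts[1:]
    if len(parts) == 2:
        part, owner = parts
        props[f"{owner}.has"] = part
    else:
        # Anything else (a plain label, or a deeper blend) is kept as-is.
        props["label"] = "/".join(parts)
    return props
```

A label without a side prefix, e.g. `elbow/man`, simply yields `{"man.has": "elbow"}`, so the side can still be assigned later.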

bbernhard commented 3 years ago

yeah, right.

I also thought about your idea with the "joint naming scheme" again and while I think it will work great for images with single instances (e.g: single person), I think it will become quite difficult in a big scene with multiple instances. If we have an image with dozens of elbow/person, wrist/person and shoulder/person polygons and the joint naming scheme shoulder/person -> elbow/person -> wrist/person I think it will be difficult to automatically find the correct polygon instances to connect.

Maybe we could try to "guess" the right polygon instances by looking at the x/y coordinates (with the assumption that each elbow/person polygon will be connected to the nearest wrist/person polygon), but I think that will be pretty error prone (especially for scenes with a lot of instances; e.g. crowds of people)
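To illustrate why the nearest-polygon guess is fragile, here is a small sketch of a greedy pairing on polygon centres. The function and the coordinates are made up; it is one possible guessing strategy, not something the backend does.

```python
import math

def nearest_pairs(elbows, wrists):
    """Greedily pair each elbow centre with its nearest unused wrist centre (a guess)."""
    remaining = list(wrists)
    pairs = []
    for elbow in elbows:
        if not remaining:
            break
        best = min(remaining, key=lambda wrist: math.dist(elbow, wrist))
        remaining.remove(best)
        pairs.append((elbow, best))
    return pairs

# Two people standing well apart: the guess matches the obvious pairing.
apart = nearest_pairs([(0, 0), (100, 0)], [(105, 10), (5, 10)])

# Two people with crossed arms: the true pairs are (0,0)-(9,0) and (10,0)-(1,0),
# but the nearest wrist to each elbow belongs to the other person.
crossed = nearest_pairs([(0, 0), (10, 0)], [(9, 0), (1, 0)])
```

In the crossed-arms case both guessed pairs are wrong, which is the kind of error that would need manual confirmation in crowded scenes.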

Something else orthogonal i've been doing in labelling is left/shoulder/person,left/elbow/person, right/shoulder/person, right/elbow/person.. In the case where there is just one person, that should also suffice. I'm hoping that with the "/" separator, that can be parsed into "side=left, person.has=shoulder" etc to fit in with the schema. It might be possible to retroactively assign a left/right property and let that flow across connections in the joints view.

I think that should be doable. Once we have a proper system in place we could write a bunch of migration scripts that migrates the existing data to our new representation :)

dobkeratops commented 3 years ago

Maybe we could try to "guess" the right polygon instances by looking at the x/y coordinates (with the assumption that each elbow/person polygon will be connected to the nearest wrist/person polygon), but I think that will be pretty error prone

yes it will be impossible in crowds.

There's quite a lot of images of one man + one woman, these will work ok by virtue of each piece being labelled /man /woman; and there's a few images with 2-3 people, where a bounding box + man vs woman label will separate them. But you're right - in the general case you can assume nothing.

bbernhard commented 3 years ago

I guess what we could probably also consider is to add "shortcuts" (one of those shortcuts could e.g. be the "joint naming scheme") on top of the generalized joints mode in case it is too cumbersome to use for the simple cases. But I think it might be better to start with the generalized solution first and then maybe add some simplifications on top of that if needed. :)

dobkeratops commented 3 years ago

ah one little point about naming. I think it's fine to call this "joints" view, "joints" tool, but let's think about the name of the connections..

there's several words involved here:-

in 3d packages the term 'joint' is sometimes used for the whole coordinate frame, and you do indeed weight vertices to it - but physically a joint is more like the centre of rotation, and the place where 2 parts 'join'.

*joint-connections or limb-parts

*other:

for quadruped animals

foreleg, -> upper_foreleg, lower_foreleg hindleg -> upper_hindleg, lower_hindleg, .. not sure how to label the lower parts of a dogs leg

The diagram & mockup itself is fine. I agree with the idea of drawing those parts first, then connecting them. (it's usually easier to locate these points and throw a circle onto them)

Pose-estimation sometimes shows connections between the shoulders and hip, connecting one whole skeleton together, but we need to be careful about the possibilities between these things i.e the neck,spine,torso - showing some additional articulation , twist etc. (i've done some examples with an "upper_torso, lower_torso". animators often split the spine into 3 bones to give enough control)

this is one good reference image i managed to find in your dataset: human_part_annotation

dobkeratops commented 3 years ago

The advantage of first drawing the limbs and then connecting them together is, that we can use the properties system in the background for representing the joints in the database. That way we can leave the backend as it is (except for a few small changes) and just need to focus on the frontend.

additional suggestion ... would it be feasible to add a "left/right" property selector at the same time as selecting the joint connections (a current state) side=/left/right; then when you click several joints to connect them together, at the same time you could assign this left/right property to the objects. (what would be perfect is some kind of icon for it, to give a visual indication - but this would take some icon-art.. now that i mention it an icon based part selector could work really well). left & right is easy to get wrong (e.g. you must stop to think to distinguish which is your left as you're facing the screen vs the person's left and the left of the image), but if you default it to "unspecified" at least people have to think to use it.

bbernhard commented 3 years ago

Here's a short gif of how the whole thing could look like:

Screencast-2020-07-07T21:02:30

Instead of using the buttons (which is probably a bit tedious) we could add a bunch of hotkeys. e.g: "c" to begin a new "joint connection path" and "enter" to end it. With two additional hotkeys ("l"/"r") we could specify the side (left/right).
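A tiny sketch of how those hotkeys could behave, modelled as a state machine outside of any UI framework. The class, its fields and the "at least two joints" rule are assumptions for illustration, not the actual frontend code.

```python
class JointPathEditor:
    """Hypothetical hotkey handling: 'c' starts a path, 'enter' ends it, 'l'/'r' set the side."""

    def __init__(self):
        self.paths = []      # finished paths: {"side": ..., "joints": [...]}
        self.current = None  # joints of the path being built, or None when idle
        self.side = None     # None stands for "unspecified"

    def key(self, k):
        if k == "c":
            self.current = []  # begin a new joint connection path
        elif k == "enter":
            # Only keep paths that actually connect something (assumed rule).
            if self.current is not None and len(self.current) >= 2:
                self.paths.append({"side": self.side, "joints": self.current})
            self.current = None
        elif k in ("l", "r"):
            self.side = {"l": "left", "r": "right"}[k]

    def click(self, polygon_id):
        if self.current is not None:  # clicks outside a path are ignored here
            self.current.append(polygon_id)
```

For example, pressing "l", "c", clicking shoulder, elbow and wrist, then pressing "enter" would store one left-side path with those three joints.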

I hope that's going into the right direction?

dobkeratops commented 3 years ago

Instead of using the buttons (which is probably a bit tedious) we could add a bunch of hotkeys. e.g: "c" to begin a new "joint connection path" and "enter" to end it. With two additional hotkeys ("l"/"r") we could specify the side (left/right).

hotkeys are always good. one more option to consider here is dragging, (like rubber-band line draw) that can be very intuitive but might not be so nice with a touchscreen. you might be able to reduce it to one hotkey e.g. "pressing C again will close the current path and begin a new one, and just exit this mode to stop altogether". you could also just require clicking twice on the elbow, e.g. 1 click always starts then the next click always closes a joint (and you have to click again to start a new one). with chains of just 2 it might be ok. Yet another option is clicking twice on the endpoint 'cancels'. end on the same joint = do nothing and stop continuing, until you click another one to start.

There's a lot of options here.. hard to say which will be quickest to use (and quickest to learn) without experiment, I'm not sure which my own favourite is here.

bbernhard commented 3 years ago

Many thanks for all the suggestions - very much appreciated! The rubber-band suggestion sounds really interesting, I think I'll give that one a go with the next iteration. But I agree with you, in the current state it's hard to say which of those options is the best. I think that will become more clear once there is a usable prototype out there. I'll keep updating this thread with some gifs and screenshots as the whole thing grows. Hopefully that gives a bit of an early impression of how the first version will look in the end :)