dobkeratops opened 4 years ago
Many thanks for the suggestion, I'll have a look. I think it shouldn't be that hard to implement :)
(At the moment I am working on a data migration script. I recently noticed that there are some duplicate entries in the label_suggestion
table (due to a missing unique constraint in the db). Although the duplicate labels do not have any operational impact, I would like to get rid of them as soon as possible. The actual migration script is pretty easy, but as I am operating on production data I have to be extremely careful not to accidentally delete data while cleaning up the duplicates. So I am adding tons of sanity checks to make sure that everything goes smoothly. I am hoping to get that done by the end of next week at the latest. Then I'll have a look at whether I can improve the search a bit :))
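The cleanup logic being described could be sketched roughly like this (the fields "id", "name" and "image_id" are assumptions for illustration, not the actual table layout):

```python
# Sketch of the duplicate-cleanup idea: per (name, image_id) key, keep the
# oldest row and mark the rest for deletion, with a sanity check up front.

def find_duplicate_ids(rows):
    """Return the ids of all but the oldest row per (name, image_id) key."""
    keep = {}          # (name, image_id) -> lowest id seen so far
    to_delete = []
    for row in sorted(rows, key=lambda r: r["id"]):
        key = (row["name"], row["image_id"])
        if key in keep:
            to_delete.append(row["id"])   # a later duplicate
        else:
            keep[key] = row["id"]
    return to_delete

rows = [
    {"id": 1, "name": "dog", "image_id": 7},
    {"id": 2, "name": "dog", "image_id": 7},   # duplicate of id 1
    {"id": 3, "name": "cat", "image_id": 7},
]
doomed = find_duplicate_ids(rows)

# Sanity check before deleting anything on production data: the rows that
# survive must match the number of distinct keys exactly.
assert len(rows) - len(doomed) == len({(r["name"], r["image_id"]) for r in rows})
```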
Regarding meta labels (food, animal, etc).. whilst they’re not ideal as labels, sometimes it might be better than nothing eg in cases where the precise type isn’t known, you can still narrow it down from anything.
So when I say discourage, I hope you could still allow them in the free labelling mode, and they’ll be available for training (making nets train on the broad categories could help you get features from the whole dataset.. like in your dog outline example, if you pulled in all the quadruped_mammal types it would have more to go on)
Maybe in your label-enabling process (trending..) you could count the number of annotations given? Enabling the ones with the most annotations first would best expand the available training data?
So when I say discourage, I hope you could still allow them in the free labelling mode, and they’ll be available for training (making nets train on the broad categories could help you get features from the whole dataset.. like in your dog outline example, if you pulled in all the quadruped_mammal types it would have more to go on)
Totally agreed!
I think broader labels are especially useful when one doesn't know how to describe the actual object better. The only "problem" I see is that this potentially could result in a lot of duplicated work, e.g. many different labels (animal, mammal, dog, etc.) could have the same underlying polygon. So with our current workflow, one has to draw the same polygon for every label.
But maybe we can use copy/paste for that? (e.g. copy the polygon that was drawn for the animal label and paste it into the dog label). Or even introduce a "link" feature? e.g. one could link the animal label and the dog label together, so that whenever one of the polygons is updated, the other one is too.
Duplicating (copy/paste) seems like a reasonable way to solve the problem of relabelling without work destruction - you keep the original polygon, and they're still consistent. In the case of disagreement, the two polygons become like votes for the pixels (validation for either then increases or decreases the votes)
Regarding a “link“ feature, would a tree structure (like labelme) handle it? I’ve avoided making this suggestion here because it looks like quite complex UI to me (harder to get people to use it right, and more code to maintain). What’s good about imagemonkey is the integration of several aspects.
I’ve been adding a small editor to my current C++ renderer codebase, not sure how far I will take it but I’m trying out the GTK widgets including its tree view (GTK is a bit clunky though and I don’t think I can get that into a web page). One idea I have in mind is that hierarchical annotations could be used for rotoscoping, e.g. explicitly place a skeletal model over a reference image, but it would be hard to make this friendly and stable enough for public consumption. I guess Facebook must have done something like this for their impressive human remapping demo. There’s all the pro 3D packages and Blender out there with very comprehensive feature sets (including support for rotoscoping), but they can be intimidating to use. If I could get something like that working locally I could still share some data (i.e. convert into plain polygons, named “upper_left_arm” etc). As I also have some painted bitmap annotations lying around I was also meaning to ask about options there
Important note - We also can’t currently search for labels with underscores. I like using these where possible because it gives better grouping hints (especially with label blends: foo/bar_baz is clearer than foo/bar baz. The latter could be “a blend of foo and bar baz” or “a baz with a prefix blend of foo and bar”). Some spaces could convert into either / or _ (I think we’ll need a manual list of aliases to be sure about converting them)
In the case of disagreement , the two polygons become like votes for the pixels (validation for either then increases or decreases the votes)
good idea :+1:
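The vote idea could be sketched roughly like this (purely illustrative: pixel sets stand in for rasterized polygons, and all names here are made up):

```python
# Sketch of "polygons as votes": each annotation contributes its weight to
# the pixels it covers; validations would raise or lower that weight.

from collections import Counter

def pixel_votes(annotations):
    """annotations: list of (pixel_set, weight) pairs -> votes per pixel."""
    votes = Counter()
    for pixels, weight in annotations:
        for p in pixels:
            votes[p] += weight
    return votes

# Two slightly disagreeing polygons for the same object:
a = ({(0, 0), (0, 1), (1, 0)}, 1)   # original annotation
b = ({(0, 0), (0, 1), (1, 1)}, 1)   # duplicated + edited copy
votes = pixel_votes([a, b])
# pixels both annotations cover get 2 votes, the disputed ones get 1
```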
Important note - We also can’t currently search for labels with underscores. I like using these where possible because it gives better grouping hints (especially with label blends: foo/bar_baz is clearer than foo/bar baz. The latter could be “a blend of foo and bar baz” or “a baz with a prefix blend of foo and bar”). Some spaces could convert into either / or _ (I think we’ll need a manual list of aliases to be sure about converting them)
Thanks, I'll add the underscore to the parser generator :)
Regarding a “link“ feature, would a tree structure (like labelme) handle it?
I have to admit that I do not know labelme's tree structure that well. Is it "just" a hierarchical representation or is it even more powerful? I guess the question then is also: What should we do with the label graph? Can the label graph and the tree view coexist or will it become obsolete?
Personally I kind of like the idea of keeping the labels and the actual hierarchical representation separated as much as possible (i.e. mostly sticking to "flat" labels, with at most one hierarchy level, e.g. window/car). I think this could later give us the flexibility to represent different "views" with the label graph, e.g. a biologist might be interested in a different label graph than an archeologist.
I’ve been adding a small editor to my current C++ renderer codebase, not sure how far I will take it but I’m trying out the GTK widgets including its tree view (GTK is a bit clunky though and I don’t think I can get that into a web page). One idea I have in mind is that hierarchical annotations could be used for rotoscoping, e.g. explicitly place a skeletal model over a reference image, but it would be hard to make this friendly and stable enough for public consumption. I guess Facebook must have done something like this for their impressive human remapping demo. There’s all the pro 3D packages and Blender out there with very comprehensive feature sets (including support for rotoscoping), but they can be intimidating to use. If I could get something like that working locally I could still share some data (i.e. convert into plain polygons, named “upper_left_arm” etc).
WOW, that sounds really interesting. I don't know if it's possible with GTK, but maybe you could compile it to WebAssembly with emscripten? Qt recently also added support for compiling applications to WebAssembly. Qt's WebAssembly support is still in its early stages, but I really hope that they will stabilize it and fix most of the bugs for Qt 6.0. Writing an application in C++ and then compiling it to WebAssembly would be so awesome.
Right, I think Qt is more popular so its support is ahead. If I keep my UI toolkit use to one side I might be able to switch frameworks. I could try ImGui as well, everyone raves about it and it’ll run anywhere that GL does. I can be a bit stubborn switching... (I might have gone for GTK because the C bindings are usable in Rust as well?)
From the reply about the link feature - did I understand right that you meant linking the polygons, or labels?
I don’t actually know how labelme organises its labels - it starts with free labelling but does seem to have a label database as well, because it tells you if it knew it, and makes suggestions. They might have something like the properties system.
Regarding the polygons, labelme arranges them in a tree. When you draw a new polygon you can specify if it’s a child of the previous (if I remember right) but you can also just drag them around in the tree view to retroactively organise them.
So what I’d imagine is graph-organised labels, and tree-organised polygons.
A polygon tree could map directly to the “node trees” common in 3D graphics .. eg
What might look weird there is saying “the lower arm is a component of the upper arm”, but it makes sense as a motion hierarchy: a rotation of the upper arm propagates down to all its children. 3D artists are skilled people.. explaining all this to a casual user and getting them to use it correctly might be too far-fetched.
But maybe hierarchies could be set up as predefined templates (biped person, car with 4 wheels, quadruped animal with Head,4 legs and tail) , the user wouldn’t have to build the tree, just select the parts and move them into the right places onscreen?
Right, I think Qt is more popular so its support is ahead. If I keep my UI toolkit use to one side I might be able to switch frameworks. I could try ImGui as well, everyone raves about it and it’ll run anywhere that GL does. I can be a bit stubborn switching... (I might have gone for GTK because the C bindings are usable in Rust as well?)
I don't know much about GTK's current state, but I used it once for a freelancing project back in 2013. At that time GTK was in a horrible condition (at least in my opinion), compared to Qt. I used it for about two months before I completely rewrote the whole thing in Qt. Although the rewrite took quite some time, I never regretted it (I moved a lot faster after the rewrite). There also seem to be some Rust bindings for Qt...but I don't know how good those are.
From the reply about the link feature - did I understand right that you meant linking the polygons, or labels?
yes, exactly. I think both should be possible, but I think linking labels should be sufficient in most cases. I would do it on a "per image basis". E.g Imagine this image here
For that image we could add the labels animal, mammal and dog. In the unified mode, we could then add an option which makes it possible to link all those labels together (maybe we can use a "chain icon" to represent that visually). The only thing that the linking does is "mirror" the polygons. So, if you add a polygon to the label dog, it will automatically also be added to the labels animal and mammal. If you update a polygon, all mirrored occurrences will be updated too.
But you have to be careful which labels you link together. So e.g. if we have this image of a cat and a dog, it only makes sense to link animal and mammal together, but not dog.
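A minimal sketch of that per-image linking/mirroring behaviour (the class and method names here are hypothetical, not ImageMonkey's actual code):

```python
# Sketch of linked labels on one image: adding a polygon to any label in a
# link group mirrors it to every other label in that group.

class ImageLabels:
    def __init__(self):
        self.polygons = {}   # label -> list of polygons
        self.links = []      # list of sets of linked labels

    def link(self, *labels):
        self.links.append(set(labels))

    def add_polygon(self, label, polygon):
        targets = {label}
        for group in self.links:
            if label in group:
                targets |= group     # mirror to every linked label
        for t in targets:
            self.polygons.setdefault(t, []).append(polygon)

img = ImageLabels()
img.link("animal", "mammal", "dog")
img.add_polygon("dog", [(0, 0), (4, 0), (4, 3)])
# the polygon now also appears under "animal" and "mammal"
img.add_polygon("cat", [(5, 5), (6, 5), (6, 6)])
# "cat" is not linked, so nothing is mirrored
```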
So what I’d imagine is graph-organised labels, and tree-organised polygons.
aaah, got it. Never thought about it that way, but that could make sense. I'll have to think about it a bit more, if/how we could integrate that (not sure yet if the current database schema would support that).
There’s a big difference in that imagemonkey currently has a label owning all its polygons (perhaps you could say “the label is a name for a group of polygons”), whilst in labelme all the polygons are separate and have their own label; you get the label entry dialog after drawing every polygon. It might be hard to retrofit.
Personally, I don’t think duplicate polygons overlaying new information onto a pixel is a problem. But perhaps the label linking idea would help avoid setting duplicate tasks.
WIP - adapting this editor I'm writing at the minute to image annotation. The primitives are arranged in a hierarchical scene (e.g. here there's a parent 'car' polygon, and child 'wheels', 'windows'; you could then add a separate 'windows' under 'buildings', and so on).
you might remember that earlier attempt i did in JS .. it's similar to that with mousewheel zooming etc but this is being written more as a 3d tool from the outset. (it's really intended as a level editor tool; drawing over reference maps has similarities to annotation).
Not sure if I'll get this to a stage where it's "fit for public consumption".. it's harder to get something like this stable and harder to learn to use, and people who do want to learn something like this usually will go straight to blender. But I would like to port this to emscripten/webassembly eventually.
I'll keep going with these various experiments I have in mind. there might be some ideas you can copy later, and I might still be able to throw data from this at your server?
WOW, that looks AWESOME - really hope that you continue working on it. Looks great!
Just out of interest: Does the code live on github? :)
and I might still be able to throw data from this at your server?
yeah, sure. The APIs are already there - some of those probably need a bit of polishing, as they have "grown" over the years, but apart from that they should be usable. As ImageMonkey is completely dockerized, it should also be easily possible to run an instance locally for testing. :)
(Today I've fixed the duplicate labels in the database. Unfortunately, the issue was more complicated than anticipated. My initial assumption was that the duplicated values were due to a missing unique constraint in the database. But as it turned out, the unique constraint was already there - it just wasn't enforced by PostgreSQL. The reason for that is glibc 2.28. When I moved ImageMonkey from the DigitalOcean cloud to the Hetzner server, I also upgraded from Debian Stretch to Debian Buster. Debian Buster uses glibc 2.28, which introduced some locale changes:
The localization data for ISO 14651 is updated to match the 2016 Edition 4 release of the standard, this matches data provided by Unicode 9.0.0. This update introduces significant improvements to the collation of Unicode characters. […] With the update many locales have been updated to take advantage of the new collation information. The new collation information has increased the size of the compiled locale archive or binary locales. (copied from the glibc release notes)
The consequence of that change is that one must rebuild all indexes immediately after upgrading to glibc 2.28, otherwise indexes could get corrupted. After finally tracking down the issue (many thanks to the folks on the PostgreSQL mailing list), I removed the duplicate entries and rebuilt all the indexes. As I had to "re-order + re-group" the entries in the database in order to remove the duplicate ones, the activity chart on the front page might look a bit different now on some days. But there shouldn't be any data loss due to this - except for the deleted duplicates of course )
Just out of interest: Does the code live on github? :)
it's growing out of the earlier renderer I showed you, the plan is to have that in a viewport. I put that on gitlab earlier this year as part of another potential collaboration (someone wanted to build a game to host & test other types of behavioural AI). I might move it to github eventually. I haven't updated the gitlab repo specifically for the past couple of months.. I probably want a cleanup before I do. Eventually I'd like this to handle the earlier experiments I mentioned as well (training on procedural graphics).
this was the gitlab repo last updated 2 months ago - it doesn't have the editor yet https://gitlab.com/glcoder0/gl-demo
Cool, many thanks for sharing! Please let me know when you update the repo and/or move it to github. Would love to follow your progress :)
regarding the slash/underline in the search: Both characters should now be allowed in the search :)
edit: oh, I completely forgot the graph arrow. Sorry for that - I'll add that one next.
that works (eg searches for man/sitting, woman/standing etc). there's some pending tasks like that. I see that it works with | (e.g. a search for man/sitting|woman/sitting|person|sitting works) - that's quite handy as well. eventually some sort of partial match or wildcard search would let us find all the blends by a search for a component
bit more progress on the tool - the renderer works in the editor viewports, and there's a thumbnail browser. imagine having a palette of labelled lowpoly models, and annotating by dropping the nearest approximation onto a reference image, rotated & scaled into place to specify orientation, size/distance
eventually some sort of partial match or wildcard search would let us find all the blends by a search for a component
agreed, that would be awesome. But as most of the "regex characters" (e.g. *) are already in use, I guess that will be a bit tricky.
bit more progress on the tool - the renderer works in the editor viewports, and there's a thumbnail browser. imagine having a palette of labelled lowpoly models, and annotating by dropping the nearest approximation onto a reference image, rotated & scaled into place to specify orientation, size/distance
looks totally awesome!
I have to admit that I am fascinated with 3d graphics + 3d tools since my teenage years. I really admire those people that can wrap their head around all the math that's required to create renderers and 3d modelling tools. For me 3d graphics was (and still is) a black box. I mean, I have a fairly okayish spatial perception, so with the right tools I can do simple things, but I completely fail when it comes to understanding all the math that's involved.
the graph arrow can now also be used in the search :)
that works, awesome.. i'm able to find "car->sports_car", "food->cooked_food", etc.
Regarding the other ideas - partial matches - how about a wildcard option? Might that be any easier and more generally useful to people?
The database contains some examples like person sitting, sitting person as well as person/sitting (better).
Wildcard searches would be a catch-all; on the other hand, retroactively fitting everything (by corrective renaming scripts?) to the easily parsed "/" and "->" separators might be more useful in the long run (keep the data in a more easily usable form)
What if wildcard searches were the default (e.g. if someone searches for "car", there's no need to exclude car->racing_car, parked/car, vintage car etc; better to show them the variety (and encourage precise labelling) then they can narrow further)
how it could work - "car" becomes "*car*" (for simple matching), or perhaps ".*\bcar\b.*" (for regex)
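One hedged sketch of the partial-match idea: splitting on the "->", "/" and space separators and testing whole components behaves much like the proposed regex. (One caveat worth noting: a bare \bcar\b regex would also skip sports_car, since regex engines count "_" as a word character.)

```python
# Separator-aware partial matching: a search for "car" finds
# "car->racing_car", "parked/car" and "vintage car", but not "sports_car".

import re

def matches(query, label):
    # treat "->", "/" and runs of whitespace as component separators
    components = re.split(r"->|/|\s+", label)
    return query in components

# examples from the discussion above:
assert matches("car", "car->racing_car")
assert matches("car", "parked/car")
assert matches("car", "vintage car")
assert not matches("car", "sports_car")   # underscore keeps it one word
```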
I have to admit that I am fascinated with 3d graphics + 3d tools since my teenage years. I really admire those people that can wrap their head around all the math that's required to create renderers and 3d modelling tools. For me 3d graphics was (and still is) a black box. I mean, I have a fairly okayish spatial perception, so with the right tools I can do simple things, but I completely fail when it comes to understanding all the math that's involved.
if you could find a way to relate specific polygons & vertices between images within imagemonkey, you might be able to do some forms of 3d modelling within its database.. i.e. the machine learning backprop process could figure out 3d coordinates and camera views given enough label-guided correlations between images (exactly like photogrammetry tools) - and tracing outlines of people and marking the joints & extremities (knees, hands, feet etc) might even be enough to figure out animation frames.
just trying to think of ways that could go. Perhaps the collections feature is enough to handle 'a bunch of views of
There's definitely applications of machine learning where the internal information must be 3d, e.g. they've successfully trained nets to render different views of objects; and the best vision systems would need some internal 3d intuition anyway. lots of possibilities here. From what I've seen of GANs, I'm convinced there's going to be a way of doing image-driven 3d rendering from something intermediate between video & NN feature maps (kind of like the angles and frames being multiple time-like dimensions in an image-compression scheme)
one little idea I've just tried: creating a node hierarchy from extra separators, eg.. car{wheel{hub,tire},door{handle},headlight}
creates a tree:
car
  wheel
    hub
    tire
  door
    handle
  headlight
in imagemonkey with a flat label list, perhaps you could just keep the parent each time (i.e. that would make car, wheel/car, hub/wheel/car, tire/wheel/car, door/car, handle/door/car, headlight/car); perhaps separate instances could also be combined with an index (0/car, 0/wheel/car, 1/car, 1/headlight/car etc to allow representing trees, in that example "2 separate cars, the first has a wheel, the second has a headlight")
EDIT: I just realised assuming a/b/c as a hierarchy path would clash with label blending. label blending is probably more useful
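The brace syntax above could be flattened into the parent-suffixed labels like so (a quick illustration, assuming the child/parent ordering from the example; not an actual parser from either project):

```python
# Flatten "car{wheel{hub,tire},...}" into child/parent/... labels by
# walking the string once and keeping a stack of ancestor names.

def flatten(expr):
    labels, stack, name = [], [], ""

    def emit(n):
        # child first, then ancestors from nearest to root
        labels.append("/".join([n] + stack[::-1]))

    for ch in expr:
        if ch == "{":
            emit(name)            # a parent is itself a label
            stack.append(name)
            name = ""
        elif ch in ",}":
            if name:
                emit(name)
                name = ""
            if ch == "}":
                stack.pop()
        else:
            name += ch.strip()    # skip stray whitespace
    if name:
        emit(name)
    return labels

print(flatten("car{wheel{hub,tire},door{handle},headlight}"))
# ['car', 'wheel/car', 'hub/wheel/car', 'tire/wheel/car',
#  'door/car', 'handle/door/car', 'headlight/car']
```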
how it could work - "car" becomes "*car*" (for simple matching), or perhaps ".*\bcar\b.*" (for regex)
That's an interesting idea. I'll check in the backend...maybe that's already easily possible.
creating a node hierarchy from extra separators, eg.. car{wheel{hub,tire},door{handle},headlight} creates a tree
Interesting idea. So, it's basically a textual representation of a tree, that then gets parsed in the backend and is broken up into individual labels, right?
The only thing I imagine a bit hard is to actually keep track of the tree's hierarchy. With all the curly brackets and commas, I guess it could be quite cumbersome to enter a syntactically correct label expression? Maybe we could add some syntax highlighting/auto completion to make that experience a bit more pleasant/less error prone, but I guess at that point we probably could also add a "real" tree representation (with drag & drop) too? Or do you think that the textual input has some advantages over the visual input? (maybe faster?)
right it's the classic expert vs casual user issue .. indeed I think this would be too much for most users to discover and use properly. The current situation is ok, you can still paste in a longer string with all the prefixes applied. I wonder how the idea of connecting polygons would pan out instead. is this easier to explain to casual users than a tree structure? I think most people understand limbs connected by joints, but mapping that to a formal syntax or even UI is harder. Is there a way to add tree support without casual users getting confused or making mistakes (let them bypass it). it could be an "advanced task" to just organise the polys people already made
it could be an "advanced task" to just organise the polys people already made
I think that's probably the best idea.
In my opinion Web UIs are really bad for feature-rich user interfaces. With native desktop applications it's way easier to cram a lot of features/options into a single page (you can heavily use right mouse context menus, make floating menus, use ribbon bars, etc.). But with Web UIs it's way more complicated, as you can't use most features without sacrificing compatibility (e.g: if right mouse context menus are used, people can't use the site with tablets/smartphones; if the site doesn't use a responsive layout - which uses up a lot of precious white space - the site will look like garbage on a smartphone/tablet; if you use some special HTML features you always have to make sure that every major browser renders them correctly, etc.). So, I think the only real possibility is to split up features into multiple views and keep the views relatively lean.
(Recently, I was playing a bit with fusion360 and I have to admit that I am really blown away by its enormous feature set. And the most fascinating thing for me is that they somehow managed to create a relatively "simple" UI for it - of course not all the stuff is self-explanatory, but once you've figured out how something works, the workflow really makes sense.
One UI pattern that I recognized in fusion360 (and also in some other 3D modelling programs) is the separation of features into views. There are certain features that are only available in specific views/workspaces. This reminded me a bit of our separation of the labels view & the annotation view.)
Imagine if the searches (both dataset->explore view and unified->browse) were aware of the general label blending, “raw” part syntax, and graph arrow formats. Edit: also allow searching using underscores.

Eg allow searching:
- car->sportscar (currently it rejects non-alphanumeric characters)
- grass/soil (general blends of any label pair)
- partial match: crane would find examples like crane->tower_crane, crane->gantry_crane via partial match between separators
- vehicle would find examples like vehicle->car, vehicle->aircraft
- person would find the blend syntax used for states (person/sitting, person/walking, person/ridingBicycle, etc) and examples like person->soldier, person->spectator etc
- head would find head/person, head/tiger etc
- fighter_jet would find vehicle->aircraft->fighter_jet
- wooden would find wooden/fence, wooden/box (maybe add a range of material labels accessible to the search)

Perhaps add some “meta labels” for broad searches and even put some shortcuts to them on the intro page to show the breadth of the dataset. I would suggest: food, container, vehicle, animal, tool, furniture, room, machine, device, plant, component, structure.

You’d want to discourage people from annotating with them individually, but as part of a free label graph node combination they’re a handy starting point. There’s examples like container->box, furniture->bedside_cabinet, vehicle->aircraft->hot_air_balloon, tool->chisel, animal->insect->dragonfly etc. Eventually you could demand these as part of free label suggestions (eg the UI could guide the syntax, ask for a known label, material or meta label prefix). You could also get somewhere allowing them with materials (plastic/container, metal/tool, wooden/structure)?

For parsing and partial matches, initially both -> and / could just be treated as separators. It wouldn’t need to understand yet if they’re graph nodes or blends - in both cases it’s valid; the simpler form in a graph node is implied anyway so it’s always consistent. Perhaps you’d need to filter out the display of pending label suggestions, but other users could still benefit from their presence; by adding the meta labels and general part blends you’d open up quite a bit. It would get the ball rolling for broader support. For search purposes again consider opening up a broader part list: wheel, wing, lid, leg, foot. Perhaps between the existing labels, materials, meta labels and additional components all used in blends, you’d have enough safely visible options to start training on them and encourage the use of them for:

Labels with underscores, eg fighter_jet, sports_car, fire_truck, cable_car, battle_tank - I’ve preferred to use these where possible to eliminate the ambiguity of spaces: underscore combines the words into a single label, whilst / means the words are separable blended labels or properties. A fire truck is not a blend of fire/truck; a sports_car is not related to sport (although it is a “car”). The spaces exist because it’s easier to type (and there was a time when unified didn’t allow underscores); ideally we could retroactively convert them? (keep an alias list for the most common ones?)
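The retroactive conversion via a manual alias list might look something like this (the alias entries here are purely illustrative, not an actual ImageMonkey list):

```python
# Sketch of alias-based normalisation: map the ambiguous space-separated
# labels onto the unambiguous "/" (blend) and "_" (single-label) forms.

ALIASES = {
    "person sitting": "person/sitting",
    "sitting person": "person/sitting",
    "fire truck": "fire_truck",
    "sports car": "sports_car",
}

def normalise(label):
    # unknown labels pass through unchanged, so a corrective renaming
    # script can be run safely over the whole dataset
    return ALIASES.get(label, label)

print(normalise("sitting person"))  # person/sitting
print(normalise("dog"))             # dog (unchanged)
```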