ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

mimic LabelMe format? #40

Open dobkeratops opened 6 years ago

dobkeratops commented 6 years ago

Would it be possible to make this tool compatible with the existing LabelMe database, e.g. export and import XML for the annotations in exactly the same format?

Having looked through it a little, this seems fairly straightforward.. it's a bit verbose, e.g. <pt><x>120</x><y>64</y></pt> to specify a 2d point .. but that's just XML for you. It does seem to contain a 'scene label' which it calls <scenedescription>, e.g. "street urban city outdoor" .. which I couldn't find in the interface.

example snippet enclosed:-

    <annotation>
      <filename>p1010098.jpg</filename>
      <folder>static_boston_street_april</folder>
      <source>
        <sourceImage>The MIT-CSAIL database of objects and scenes</sourceImage>
        <sourceAnnotation>The MIT-CSAIL database of objects and scenes</sourceAnnotation>
      </source>
      <scenedescription>street urban city outdoor</scenedescription>
      <object>
        <name>trafficlight</name>
        <deleted>0</deleted>
        <verified>1</verified>
        <date>02-Jan-2005 19:22:19</date>
        <polygon>
          <pt><x>2111</x><y>980</y></pt>
          <pt><x>2117</x><y>1102</y></pt>
          <pt><x>2167</x><y>1103</y></pt>
          <pt><x>2161</x><y>979</y></pt>
          <username>anonymous</username>
        </polygon>
        <viewpoint></viewpoint>
        <id>0</id>
        <parts><hasparts/><ispartof/></parts>
      </object>
    </annotation>

If you could import and export these .. you could just build on their database (i.e. just clone it all and run their annotations through your verification mode).
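To illustrate the import side, here is a minimal sketch in Python that reads the tags visible in the snippet above. It assumes only those tags; the function name `parse_labelme` and the returned dictionary layout are made up for this example and are not part of LabelMe or ImageMonkey.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the snippet above (closing tag added so it parses).
SAMPLE = (
    "<annotation><filename>p1010098.jpg</filename>"
    "<folder>static_boston_street_april</folder>"
    "<scenedescription>street urban city outdoor</scenedescription>"
    "<object><name>trafficlight</name><deleted>0</deleted><verified>1</verified>"
    "<polygon><pt><x>2111</x><y>980</y></pt><pt><x>2117</x><y>1102</y></pt>"
    "<pt><x>2167</x><y>1103</y></pt><pt><x>2161</x><y>979</y></pt>"
    "<username>anonymous</username></polygon></object></annotation>"
)


def parse_labelme(xml_text):
    """Hypothetical helper: turn one LabelMe-style annotation into a plain dict."""
    root = ET.fromstring(xml_text)
    objects = []
    for obj in root.findall("object"):
        # Skip objects flagged as deleted.
        if (obj.findtext("deleted") or "0").strip() == "1":
            continue
        points = [
            (int(float(pt.findtext("x"))), int(float(pt.findtext("y"))))
            for pt in obj.findall("polygon/pt")
        ]
        objects.append({
            "name": (obj.findtext("name") or "").strip(),
            "verified": (obj.findtext("verified") or "0").strip() == "1",
            "points": points,
        })
    return {
        "filename": (root.findtext("filename") or "").strip(),
        "folder": (root.findtext("folder") or "").strip(),
        "scene": (root.findtext("scenedescription") or "").strip(),
        "objects": objects,
    }


annotation = parse_labelme(SAMPLE)
print(annotation["filename"], [o["name"] for o in annotation["objects"]])
```

Real LabelMe files may contain tags not shown in the snippet, so a real importer would need to be checked against the actual dataset.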

One difference is that you have rotated bounding-boxes, whilst they just have aligned bounding boxes or polygons; perhaps you could convert your rotated bounding-boxes into quads. You could just skip the ability to edit other types of polygons to keep your existing UI (which I think is ok) ... but you could still maintain them. (and perhaps still allow scaling/translating/rotating etc through your interface).
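The rotated-box-to-quad conversion could look roughly like this. It is only a sketch: it assumes a rotated box is stored as a center point, a width, a height and an angle in degrees, which may not match how ImageMonkey actually stores them, and `rotated_box_to_quad` is a name invented for the example.

```python
import math


def rotated_box_to_quad(cx, cy, width, height, angle_deg):
    """Return the four corners of a rotated box as (x, y) tuples.

    Assumed representation (hypothetical): center (cx, cy), size
    (width, height), rotation angle_deg about the center.
    """
    angle = math.radians(angle_deg)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    half_w, half_h = width / 2.0, height / 2.0
    # Corners relative to the center, before rotation.
    corners = [(-half_w, -half_h), (half_w, -half_h),
               (half_w, half_h), (-half_w, half_h)]
    # Standard 2D rotation, then translate back to the center.
    return [(cx + dx * cos_a - dy * sin_a, cy + dx * sin_a + dy * cos_a)
            for dx, dy in corners]


quad = rotated_box_to_quad(100, 50, 40, 20, 30)
print([(round(x, 1), round(y, 1)) for x, y in quad])
```

The four returned points could then be written out as four <pt> entries of a LabelMe polygon.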

I think your rotated bounding boxes would be enough , in conjunction with their 'hierarchy' idea (objects are a tree of parts) e.g. "car" -> "wheel, bonnet, headlights" , "dog" -> "head, tail, body, legs" etc etc. Another use of 'parts' is you can mark an area containing many things with a plural label, then mark some individual items (e.g. "crowd" or "people", or "trees"/"bushes" which might recede into the distance, but then you can label the individuals at the front individually) (EDIT: they seem to store bounding boxes as polygons flagged as 'bounding box', so you might still be able to pass them back?)

Are there any other complications or diverging ideas that might prevent this?

(I wonder if it would still be possible to add the idea of 'inter-object labels' ; maybe this could be done in some backwardly compatible way such that you could pass on a suggestion to them to add it)

I suppose another idea might be to get them to add a JSON exporter (which would look more sane) but ultimately as it's all machine processed it doesn't matter

I was curious myself to write something to browse their database (e.g present snippets sorted by label) etc.

I also wonder if the 'parts' idea could be used to prime vision nets that tend to have neurons in their mid layers for specific object recognisers (training in layers? make the middle layers recognise components?)

bbernhard commented 6 years ago

Many thanks for your suggestions - really appreciated!

I haven't looked at the format of the LabelMe output in detail, but I think writing a parser shouldn't be that hard. My only concern is that I am not sure whether that's legally OK and whether it would give us a bad reputation.

I haven't found anything on the LabelMe website which would prohibit that, but I know that some other online services (e.g. Unsplash) explicitly state that one is not allowed to use their API to scrape images with the objective of creating a similar online service. I am also not sure how the community would react if they realized that we collected (some would probably say 'have stolen') data from other services to host it on our service. I am afraid that some people would see ImageMonkey as "just another service that collects data" while it is/should be more than that.

But I am totally with you: We need way more data to get some serious traction and to make people invest time in this service. I really hope that the focus on gamification and the fact that the dataset is totally open pays off in the long run (and maybe gets us some press coverage). In my opinion (I might be totally wrong on this) other services are really good when it comes to the sheer amount of data, but what they lack is the ability to attract people outside of the niche. If we can find a way to attract those people, together with some clever "pre-processing" (using images from Wikimedia, pre-annotating images..), we could position ourselves in a different niche (and maybe attract other user groups). What I want to avoid is that people see ImageMonkey just as a LabelMe clone (with way less data). As LabelMe is also open source (and already has a significant amount of data) I could see people contributing to LabelMe's source code first, as it has a way greater impact. So competing against a "big player" that way is probably a fight we can't win.

What do you think? Does that make sense or am I too idealistic? Would really like to hear your opinion on that one :)

dobkeratops commented 6 years ago

hmm ok; if scraping goes against the spirit of their tool, maybe that's a problem.

I suppose you could even just make an export in the same format, so any users/researchers who already use 'LabelMe' get to integrate your data for no extra effort.. just make it available and see if it inspires them to offer support.
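An export in that format might be sketched as follows. It writes only tags that appear in the example snippet earlier in this thread; the function name `to_labelme_xml` and the input dictionary layout are assumptions for illustration, not ImageMonkey's actual data model.

```python
import xml.etree.ElementTree as ET


def to_labelme_xml(filename, folder, objects):
    """Hypothetical exporter: build a LabelMe-style <annotation> string.

    `objects` is assumed to be a list of dicts with a "name", a list of
    (x, y) "points" and an optional "verified" flag.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    ET.SubElement(root, "folder").text = folder
    for index, obj in enumerate(objects):
        node = ET.SubElement(root, "object")
        ET.SubElement(node, "name").text = obj["name"]
        ET.SubElement(node, "deleted").text = "0"
        ET.SubElement(node, "verified").text = "1" if obj.get("verified") else "0"
        polygon = ET.SubElement(node, "polygon")
        for x, y in obj["points"]:
            pt = ET.SubElement(polygon, "pt")
            ET.SubElement(pt, "x").text = str(int(round(x)))
            ET.SubElement(pt, "y").text = str(int(round(y)))
        ET.SubElement(node, "id").text = str(index)
    return ET.tostring(root, encoding="unicode")


print(to_labelme_xml(
    "p1010098.jpg",
    "static_boston_street_april",
    [{"name": "trafficlight", "verified": True,
      "points": [(2111, 980), (2117, 1102), (2167, 1103), (2161, 979)]}],
))
```

Whether LabelMe's own tools accept files with only this subset of tags would need to be tested against their importer.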

you could contact them and see if they are amenable to making alternate UI that contributes to their database , and just happens to mirror the new additions?

What I want to avoid is that people see ImageMonkey just as a LabelMe clone (with way less data)

fair enough. I think you could differentiate as discussed with the focus on 'low effort/short attentionspan' mobile users - (verification, expanded as 'label refinement') and/or a 'pro-mode' (a more developed interface than they have.. more hotkeys, more polygon editing methods, preset 'parts' assemblies for common objects like humans and cars, label "palettes" , advanced 'inter-object' labels, link to wikipedia, photogrammetry 3D modeller integration - put 2 images side by side and use one as labels for the other - 'these are 2 views of the same building' etc)

you can see how we have an endless feature request list already :)

As LabelMe is also open source (and already has a significant amount of data) I could see people contributing to LabelMe's source code first, as it has a way greater impact

indeed, I would also be considering 'do I help your project or theirs'; - given that I've already spent quite a bit of time now doing annotations in their dataset ; I think i've made 1000's of annotations there by now and the activity starts to tire once there's no more depth to learn in their tool . That's why I'm interested in another layer like 'inter-object labels', or advanced pose annotations (i want 'line' annotations instead of just areas so you could mark the bones), perhaps orientation hints.. "annotate the approximate area of contact between objects and surfaces" perhaps for 3d hints.. or the connecting to wikipedia.

It would be really handy to be able to just browse their labels, e.g. a page for 'car' shows crops of all the 'cars', and you can click on any to take you to that image (images <-> labels)

It's always easier to put new ideas into a new project..

The ideal in my head is that there'd be a standard format: "annotated images", and the world produces many tools on many devices to edit it - just like we have many text editors, many paint programs, many 3d modellers .. each with their own quirks and preferences.

I know there's many ways to organise a workflow; no one tool is the 'be all and end all' .. but they can all interoperate through ASCII text, JPG images, dotOBJ models or whatever

dobkeratops commented 6 years ago

(do you use IRC/freenode by the way, for brainstorming in realtime? there's some channels there dealing with AI/machine learning)

dobkeratops commented 6 years ago

people see ImageMonkey just as a LabelMe clone (with way less data)

One question I have about LabelMe is 'how many people still use it'. Going through it I'm seeing indoor scenes are dated by the focus on CRTs in their offices.

I haven't seen a modern office there, with predominantly flatscreen monitors. In the snippets I've found it seems there's a tonne of labelling they paid for (mechanical turk) approx 10 years ago, and I've very rarely 'bumped into' other contributors.

I would argue that if you make something compatible and with a modern slant, then actually credit them and point people back to it, you might re-invigorate their community. synergy rather than displacement/competition (hence avoiding any 'legal complaints' about scrapers, as you say)

For example if you simply dont include a general purpose polygon editor or hierarchy editor, you just allow placing your presets ("car: headlights, wheels, wing mirrors , license plate"; "person: head, eyes, hands, feet..") - and you tell people ' go to the LabelMe interface to define new hierarchies' ). Then instead of making a competitor, you're just making a refined tool for specific workflows.

bbernhard commented 6 years ago

Thanks for your feedback!

I suppose you could even just make an export in the same format, so any users/researchers who already use 'LabelMe' get to integrate your data for no extra effort.. just make it available and see if it inspires them to offer support.

That's a great idea! Do you think it makes sense to also support exporting to other data formats? I am thinking about Pascal VOC or even some machine-learning-framework-specific formats (e.g. TFRecord). Would you like to see that covered by a library (e.g. a Python library which operates on the public API, downloads the data and converts it locally into the desired format) or more as a server feature? I think a server feature would definitely be more handy, but converting the dataset into the desired output could add some load on the server.
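As a rough idea of what the library variant could do locally, here is a sketch that converts polygon annotations into a Pascal-VOC-style XML with axis-aligned bounding boxes. The tag layout (`size`, `object`, `bndbox` with `xmin`/`ymin`/`xmax`/`ymax`) follows the usual VOC annotation files as far as I remember them and should be checked against the VOC devkit; the function names, the input layout and the image size in the usage line are made up for the example.

```python
import xml.etree.ElementTree as ET


def polygon_to_bndbox(points):
    """Axis-aligned bounding box (xmin, ymin, xmax, ymax) of a polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return min(xs), min(ys), max(xs), max(ys)


def to_voc_xml(filename, width, height, objects):
    """Hypothetical converter: polygons in, VOC-style bounding boxes out."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    # depth=3 assumes an RGB image.
    for tag, value in (("width", width), ("height", height), ("depth", 3)):
        ET.SubElement(size, tag).text = str(value)
    for obj in objects:
        node = ET.SubElement(root, "object")
        ET.SubElement(node, "name").text = obj["name"]
        box = ET.SubElement(node, "bndbox")
        tags = ("xmin", "ymin", "xmax", "ymax")
        for tag, value in zip(tags, polygon_to_bndbox(obj["points"])):
            ET.SubElement(box, tag).text = str(int(round(value)))
    return ET.tostring(root, encoding="unicode")


# The image size here is a placeholder, not the real size of the photo.
print(to_voc_xml("p1010098.jpg", 2560, 1920, [
    {"name": "trafficlight",
     "points": [(2111, 980), (2117, 1102), (2167, 1103), (2161, 979)]},
]))
```

Doing this in a client library keeps the conversion load off the server; the same function could equally run server-side behind an export endpoint.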

you could contact them and see if they are amenable to making alternate UI that contributes to their database , and just happens to mirror the new additions?

Haven't thought about that, but that's definitely worth a try. If they agree, it also wouldn't look as if we were stealing their data :)

fair enough. I think you could differentiate as discussed with the focus on 'low effort/short attentionspan' mobile users - (verification, expanded as 'label refinement') and/or a 'pro-mode' (a more developed interface than they have.. more hotkeys, more polygon editing methods, preset 'parts' assemblies for common objects like humans and cars, label "palettes" , advanced 'inter-object' labels, link to wikipedia, photogrammetry 3D modeller integration - put 2 images side by side and use one as labels for the other - 'these are 2 views of the same building' etc) you can see how we have an endless feature request list already :)

totally agree on that :) btw, as you mentioned polygon editing methods: Do you think it makes sense to integrate other shapes (circle, ellipse) into the annotation tool? I recently thought about that and I think it would already be possible without much implementation effort. Is that something you would use or are other mechanisms (free line drawing tool for sketching a polygon...etc.) more efficient?

indeed, I would also be considering 'do I help your project or theirs'; - given that I've already spent quite a bit of time now doing annotations in their dataset ; I think i've made 1000's of annotations there by now and the activity starts to tire once there's no more depth to learn in their tool . That's why I'm interested in another layer like 'inter-object labels', or advanced pose annotations (i want 'line' annotations instead of just areas so you could mark the bones), perhaps orientation hints.. "annotate the approximate area of contact between objects and surfaces" perhaps for 3d hints.. or the connecting to wikipedia.

also agree on that one. That's definitely something I could see adding in the foreseeable future :)

The ideal in my head is that there'd be a standard format: "annotated images", and the world produces many tools on many devices to edit it - just like we have many text editors, many paint programs, many 3d modellers .. each with their own quirks and preferences.

Couldn't have said that better ;-)

bbernhard commented 6 years ago

(do you use IRC/freenode by the way, for brainstorming in realtime? there's some channels there dealing with AI/machine learning)

not yet, but i'll set something up tomorrow (it's already pretty late here in Austria ;-)). Can you recommend me some good channels?

One question I have about LabelMe is 'how many people still use it'. Going through it I'm seeing indoor scenes are dated by the focus on CRTs in their offices.

That's a good question. A few days ago I wanted to create an account, but due to a server error I couldn't sign up. I have to check whether this error still persists or is already fixed. Looking at their discussion platform (https://www.quicktopic.com/37/H/E4xRZ7fZZhh), it looks like there are still some users (but I'm not sure if they are actually contributing to the dataset or are just using it).

I would argue if you make something compatible and with a modern slant, then actually credit them and point people back to it you might re-invigorate their community. synergy rather than displacement/competition (hence any 'legal complaints' about scrapers as you say)

haven't thought about it that way ;) It's definitely worth a try to contact them and see if they are open to collaboration :)

dobkeratops commented 6 years ago

Do you think it makes sense to integrate other shapes (circle, ellipse) into the annotation tool. I recently thought about that and I think it would already be possible without much implementation effort

Absolutely. this sounds in the same vein as the 'diagonal cut corner' shapes, you could present those in a palette. I definitely thought 'circle'/ellipse labels would suit some use cases better (e.g. 'shoulder', 'knee','eye')

are other mechanisms (free line drawing tool for sketching a polygon...etc.) more efficient?

there's no magic bullet. There's certainly 'a best starting point' (a best tool to implement and teach people first) .. and then a choice 'what expands on that for the least effort'. There's an endless variety of niche situations. I think those 'alternate shapes' you suggest would flow naturally from what you currently have..

I had this discussion with 3d artists over the years about "what features does a 3d editor need". The basic tools handle 90% of the cases, then they have 'niche tools' that are used infrequently. The problem is, without the 'niche tools', those cases take 10x as long to do using the basics .. so they are actually a big boost to productivity. So by having a package 10x as big, you can work 2x as fast. (however that works out). The cost of a bigger package is amortised across hundreds of thousands of users all being double as efficient, and of course it's worth the artist taking time to learn all the features... you might have falsely concluded that those niches 'aren't important'.
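The "10x as big, 2x as fast" estimate can be checked with a few lines of arithmetic. This is only an illustration of the numbers quoted above (90% common cases, niche cases 10x slower with basic tools); the function name is made up.

```python
def relative_time(common_share, niche_slowdown):
    """Average time per task, where common tasks cost 1 unit and niche
    tasks cost `niche_slowdown` units."""
    niche_share = 1.0 - common_share
    return common_share * 1.0 + niche_share * niche_slowdown


# Only basic tools: the 10% niche cases take 10x as long.
basic_only = relative_time(0.9, 10.0)
# With niche tools: every case costs 1 unit.
with_niche_tools = relative_time(0.9, 1.0)

speedup = basic_only / with_niche_tools
print(f"speedup from niche tools: about {speedup:.1f}x")
```

With these numbers the basic-only workflow costs about 0.9 + 0.1 * 10 = 1.9 units per task versus 1.0, so roughly the "2x as fast" mentioned above.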

Of course this is why they end up with the plug-in frameworks, handling an endless variety of special cases. But with the web approach of today, we could have different tools connecting to the same underlying data? (something those 3d packages sometimes do is allow UI layout customisation to suit specific workflows.. choosing which buttons/palettes are visible and so on)

Your existing 'rotated bounding box' idea could combine well with their hierarchies, e.g. to annotate a face , use the same rotated axes for the two eyes , nose , mouth.. by starting with the 'face' rotation and keeping it for the components (I was going to make another suggestion actually, option to rotate the image or coordinate axes as a quick way of doing that..)

I also think about an alternate approach of marking boundaries first (e.g. 'skyline', 'kerb edge', and then you could fill regions between them (tarmac & sidewalk either side of the kerb edge, sky above the skyline, etc)

I am thinking about Pascal VOC or even some machine learning framework specific formats (e.g. TFRecord)

sounds good, I'm not familiar with these but i'm guessing TF is TensorFlow..

dobkeratops commented 6 years ago

not yet, but i'll set something up tomorrow (it's already pretty late here in Austria ;-)). Can you recommend me some good channels?

(i'm in the UK but am sometimes awake weird times)

all on freenode:-

#ai, ##AGI - broad 'chat' about AI, tends to be less technical but it's a great place to talk about high level ideas, even though it sometimes gets less technical people just talking about 'superintelligence' hype and so on. I recommend this for the breadth of topics and best place for 'blue-sky' discussion.

machinelearning - practical advice..detailed talk about ML frameworks and maths, much more focussed than #ai

programming - best/busiest general programmers channel

OpenGL - tends to be the busiest graphics related place

wikimedia-tech - might be the place to talk about interfacing to wikimedia

dobkeratops commented 6 years ago

regarding concern about "scrapers":-

I think you might be ok, because the focus of this academic tool is different to commercial products (like google discouraging youtube scrapers). They do seem to encourage people installing their code on their own servers.

The site already gives you mass downloads :-

http://labelme2.csail.mit.edu/Release3.0/browserTools/php/dataset.php
http://labelme2.csail.mit.edu/Release3.0/browserTools/php/publications.php

I hope you can just credit them where it's due (they seem to talk about what to do for 'citations'), share data back to them, and all will be ok?

So they list 281 'collections', the public interface would require 2813 clicks to manually download everything*. the best 'collections' seem to have 100's of images. some seem to have 1000's of images which are just individual frames of video (might be easy to spot by comparing sequentially..). I think you could grab a lot of value in about 10 strategically chosen downloads (e.g. ones like this http://labelme2.csail.mit.edu/Release3.0/browserTools/php/browse_collections.php?public=true&start=150&username=arandomlabeller&folder=/05june05_static_street_boston http://labelme2.csail.mit.edu/Release3.0/browserTools/php/browse_collections.php?public=true&username=arandomlabeller&folder=/barcelona_static_street etc)
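Spotting the video-frame collections "by comparing sequentially" could be sketched like this. It assumes each image has already been decoded and shrunk to a small grayscale thumbnail, represented here as a flat list of pixel values of equal length; the function names and the threshold of 4.0 are arbitrary choices for illustration.

```python
def mean_abs_diff(a, b):
    """Mean absolute pixel difference between two equally sized thumbnails."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def drop_near_duplicates(frames, threshold=4.0):
    """Return indices of frames to keep, skipping any frame that is
    nearly identical to the last frame that was kept."""
    kept = []
    for index, frame in enumerate(frames):
        if kept and mean_abs_diff(frames[kept[-1]], frame) < threshold:
            continue  # looks like the next frame of the same video shot
        kept.append(index)
    return kept


# Toy thumbnails: frames 0/1 and 2/3 are near-identical pairs.
frames = [
    [0, 0, 0, 0],
    [1, 0, 0, 1],
    [100, 100, 100, 100],
    [101, 100, 99, 100],
]
print(drop_near_duplicates(frames))
```

A collection where most frames get dropped this way is probably a video sequence and adds less variety than its image count suggests.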

bbernhard commented 6 years ago

Many thanks for the great freenode Channel names! It took me a while until I figured out how to connect, but now it works. :)

sounds good, I'm not familiar with these but i'm guessing TF is TensorFlow..

Yep, right.

I hope you can just credit them where it's due (they seem to talk about what to do for 'citations'), share data back to them, and all will be ok?

Thanks for the links, that looks really promising. :)

Absolutely. this sounds in the same vein as the 'diagonal cut corner' shapes, you could present those in a palette. I definitely thought 'circle'/ellipse labels would suit some use cases better (e.g. 'shoulder', 'knee','eye')

Awesome, then I'll create a ticket for that :)

I had this discussion with 3d artists over the years about "what features does a 3d editor need". The basic tools handle 90% of the cases, then they have 'niche tools' that are used infrequently. The problem is, without the 'niche tools', those cases take 10x as long to do using the basics .. so they are actually a big boost to productivity. So by having a package 10x as big, you can work 2x as fast. (however that works out). The cost of a bigger package is amortised across hundreds of thousands of users all being double as efficient, and of course it's worth the artist taking time to learn all the features... you might have falsely concluded that those niches 'aren't important'.

Haven't looked at it that way, but now that I am thinking about it, it makes perfect sense. At the beginning of this project I was only focusing on making everything as simple as possible (just one shape, only simple tasks..) but you convinced me that there is also a need for some more advanced tooling. :)

Of course this is why they end up with the plug-in frameworks, handling an endless variety of special cases. But with the web approach of today, we could have different tools connecting to the same underlying data? (something those 3d packages sometimes do is allow UI layout customisation to suit specific workflows.. choosing which buttons/palettes are visible and so on)

Yeah right, that's exactly what I would have in mind. I think there is a need for easy tooling as well as for more complex annotation/labeling tools.

An idea that came to my mind recently: Would it be possible to create an annotation/labeling tool that doesn't require using your mouse? I am a big fan of vi(m) and the fact that I am way more productive than with any other IDE/text editor out there. So I was thinking whether it's possible to use the "keyboard only" concept also for annotating/labeling? I was thinking about a small marker that you can move around with your keyboard to mark some points which then forms a polygon. But would a "keyboard only" mode be faster or more expressive than mouse + keyboard? Or is the mouse + keyboard (hotkeys) approach already the best?

dobkeratops commented 6 years ago

Would it be possible to create an annotation/labeling tool that doesn't require using your mouse?

I think so, and this would be great for laptops. And yes I agree: alternating between mouse and keyboard is a big drain IMO. The other two extremes (hotkeys+mouse, or 100% keyboard) feel faster.

I think it would be ok if you could alternate between batches of mouse/keyboard work, i.e. use the mouse (+hotkeys) to mark areas, then put your hands on the keyboard, and write the labels. ((i) allow marking regions as unlabelled, perhaps just 'object' .. (ii) just have a hotkey to select 'previous poly, next poly', so you can do selection of existing regions on the keyboard?). In the 'mostly mouse' workflow you could still have hotkeys to select between a palette of a few labels.

Yet another idea is that those could go through another pass, 'verify/refine labels' where you ask multiple choice questions "Q:what kind of 'object' is this: (not an object) (animal) (building) (plant) .." "Q:what kind of animal is this: (not an animal) (vertebrate) (invertebrate) ..." etc
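The multiple-choice refinement pass could be driven by a small label tree. The sketch below uses only the example labels from the comment above; the `TAXONOMY` table and both function names are hypothetical.

```python
# Hypothetical label tree: each label maps to its more specific children.
TAXONOMY = {
    "object": ["animal", "building", "plant"],
    "animal": ["vertebrate", "invertebrate"],
}


def refinement_question(label):
    """Build the next multiple-choice question for a label, or None if
    the label has no more specific children."""
    children = TAXONOMY.get(label)
    if not children:
        return None
    article = "an" if label[0] in "aeiou" else "a"
    return {
        "question": f"what kind of '{label}' is this?",
        "choices": [f"not {article} {label}"] + children,
    }


def apply_answer(label, answer):
    """Return the refined label, or None if the annotator rejected it."""
    if answer in TAXONOMY.get(label, []):
        return answer
    return None


print(refinement_question("object"))
print(apply_answer("object", "animal"))
```

Each accepted answer becomes the label for the next round, so a region marked just 'object' can be narrowed step by step by different users.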

I was thinking about a small marker that you can move around with your keyboard to mark some points which then forms a polygon

another idea for really 100% keyboard: I saw some tools that just use a grid (smartphone.. 'tap the squares that contain..'). So maybe you could use the cursor keys to move between grid cells? In later passes, the tool could zoom in & subdivide the grid cells where the labels change, for refinement.
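A keyboard-driven grid like that might be modelled as follows. This is a sketch only: the class, the key names and the rule for picking cells to subdivide (any cell whose marked state differs from a direct neighbour) are assumptions made for the example.

```python
class GridCursor:
    """Cursor that moves between grid cells with the arrow keys."""

    MOVES = {"left": (-1, 0), "right": (1, 0), "up": (0, -1), "down": (0, 1)}

    def __init__(self, columns, rows):
        self.columns, self.rows = columns, rows
        self.x = self.y = 0
        self.marked = set()

    def move(self, key):
        dx, dy = self.MOVES[key]
        # Clamp so the cursor cannot leave the grid.
        self.x = min(max(self.x + dx, 0), self.columns - 1)
        self.y = min(max(self.y + dy, 0), self.rows - 1)

    def toggle(self):
        """Mark or unmark the cell under the cursor."""
        self.marked ^= {(self.x, self.y)}


def boundary_cells(marked, columns, rows):
    """Cells whose marked state differs from a 4-neighbour: candidates
    for zooming in and subdividing in a later pass."""
    result = set()
    for y in range(rows):
        for x in range(columns):
            here = (x, y) in marked
            for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                inside = 0 <= nx < columns and 0 <= ny < rows
                if inside and ((nx, ny) in marked) != here:
                    result.add((x, y))
    return result


cursor = GridCursor(3, 3)
cursor.move("right")
cursor.move("down")
cursor.toggle()
print(sorted(boundary_cells(cursor.marked, 3, 3)))
```

Cells fully inside or fully outside a labelled area never show up as boundary cells, so later passes only spend effort where the label changes.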

bbernhard commented 6 years ago

I think it would be ok if you could alternate between batches of mouse/keyboard work, i.e. use the mouse (+hotkeys) to mark areas, then put your hands on the keyboard

That's also a good idea :)

another idea for really 100% keyboard: I saw some tools that just use a grid (smartphone.. 'tap the squares that contain..'). So maybe you could use the cursor keys to move between grid cells?

That's a really cool idea, thanks for the suggestion! Just out of interest: Have you seen that in an annotation tool or in an image editing software?

As you already mentioned, that could be really interesting for smartphone users. It probably needs some tweaking to find the appropriate grid size, but I could imagine that with such a grid you could then also use a "swipe/draw" gesture to mark all the cells while you are moving your finger over the screen. With the possibility to zoom in & subdivide and maybe some simple UI elements (to hide already annotated objects) it would maybe be possible to do some simple work even on the smartphone.

May I ask you a favor? Could you maybe record your screen when you are annotating some (more complex) scenes? It would be really interesting to see how you are working, which tools you are using and how you are structuring the whole process.

If you are busy with other stuff and have no time for it, it's also no problem. :)

dobkeratops commented 6 years ago

"Could you maybe record your screen when you are annotating some (more complex) scenes? "

it's an interesting idea but I think you won't learn anything that I can't report verbally already.. and you'll have your own common-sense insights when you test your own software. (maybe you can collect analytics.. but it's probably overkill). The best way to develop a tool IMO is "dogfooding".. a developer who actually uses it; you'll be able to make a judgement between (i) how complex a feature is to implement and (ii) how much benefit it will bring.

what I can report is that I'm using a laptop .. I tend to use bounding boxes more , which lets one produce a higher volume of labels over time. The polygon tool would probably be much easier on a desktop PC with mouse. It's just a relaxed setup for 'killing time' in a mildly constructive way .. not being sat at a desk.

I've not touched LabelMe's mask tool, so there's just the bounding-box and poly tool.

One thing you won't get from raw measurements is motivation.. I think you'll gain more insight from discussion.

You might gain useful insight from looking into photoshop selection tools (probably more thoroughly researched), which is a similar 'use-case' .. although it has a much bigger 'precision requirement'

dobkeratops commented 6 years ago

http://labelme2.csail.mit.edu/Release3.0/browserTools/php/browse_collections.php?public=true&username=arandomlabeller&folder=/valladolid_static_street

here's the collection of street scenes where I've done most labels -mostly bounding boxes (i've done others scattered elsewhere too) .. you can see where I get bored and just mark a region 'cars' instead of wanting to click each one etc.

one compromise I make is trying to mark a few cars with polygon outlines, then mark the components (wheels etc) with just bounding boxes.

elsewhere in their data, I think paid labellers had more patience and drew outlines more often