ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

Project planning/Roadmap (2019) #221

Open bbernhard opened 5 years ago

bbernhard commented 5 years ago

As already mentioned before, I would like to talk a bit about the future of ImageMonkey. The main goal of this brainstorming/discussion is to identify pain points and areas of improvement (these can be small improvements as well as bigger ones). I found last year's "20k image goal" really encouraging and motivating, so hopefully we can end up with a similarly encouraging goal this year ;)

Open questions

Todos: Currently, those topics are on my personal todo list for 2019:

Brainstorming: Not really todos, but more like my personal brainstorming topics for 2019:

But there is one particular thought that's been on my mind for quite some time: Can we use microtransactions to reward power users? I think people generally do want to support FOSS projects, but not everybody has the time to do so. E.g.: a father of three children probably has less time than a student; he may still want to play a bit with neural-network-based image detection in his free time, but doesn't have enough spare time to do the labeling himself. I am wondering if those people would be willing to pay a few bucks to reward the community? There are already a few services out there that focus on microtransactions (https://www.buymeacoffee.com/, https://www.patreon.com/, https://liberapay.com/), but I'm not sure if such a thing works for a dataset too.

dobkeratops commented 5 years ago

Are there any "must-have-features" missing?

I was going to mention the singular/plural flag as a long-standing issue. Perhaps a few aliases for some common cases ("crowd", "group of people" = explicitly multiple persons, and perhaps "1 man", "1 woman", "1 car" etc. as the opposite) could be a temporary workaround.

Perhaps you could display "(s)" to make a default indeterminate single/plural status clear. 'unoccluded' and 'whole' would help too, e.g. "1 whole unoccluded person", "1 whole unoccluded car" etc., because they give you a complete reference image for these objects.

(flags in brackets afterward? not sure:
person (s)
person (1, occluded)
person (unoccluded)
person (1)
person (group of)
person (group of, occluded)
person (1, unoccluded)
?)

Beyond that I suppose the unified mode might change your options on how to solve this.

Another way to solve that could be a separate way to mark 'instance boundaries'? I've seen some training images that show the label and object boundaries separately (e.g. black outlines, or a colour-coded individual object map, in parallel with a simple colour-coded image which just shows the object type).

Can we use microtransactions to reward power users?

Interesting idea. You're probably right: microtransactions could incentivise the general public more.

Another idea (see also the pen support ideas for pixel-perfect masks) might be dual purpose: artist's reference. The search modes you have will help with all that. (The unified mode would help too: focus on picking out interesting objects.)

In the past I imagined (but never started) a more general game content site that encourages barter: do some rating/sorting work or upload submissions to earn downloads. It would adjust its own 'exchange rate' as time goes on. But microtransactions, of course, 'barter' with the whole world.

How can we get more users to contribute?

I've always hoped that keeping it active helps a bit (hence I try to contribute regularly). Perhaps also reaching out to AI researchers: 'if you're busy labelling something, try it here'.

improve statistics page (add tooltip with a short explanation of the various metrics)

I was going to suggest a "Label Browser" view to complement the stats page.

Imagine a browser view showing one example image of each label, with a line of text: the name, number of annotations, number of labels. Click the image to enter search mode primed with that label, and you can refine the search or jump straight into a task.

A page like this might be a nice starting point: it would be graphically inviting, you'd be just 'two clicks' away from performing an interesting task, and it would instantly show the breadth of the dataset.

(No idea how your code is structured; I wonder if the browser view could be generalised to implement this.)

What are the pain points at the moment?

One minor hiccup that might be easy to fix: closing a polygon can be fiddly (e.g. it's hard to click the first vertex, so you sometimes accidentally add a bunch of vertices around the end). Perhaps the first vertex could be made a bit larger (50%+), and make sure that its hit detection overrides other vertices.
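The suggested fix could be sketched roughly like this (a Python sketch with made-up names and radii; the actual annotation tool's code may look quite different):

```python
import math

BASE_RADIUS = 6.0          # assumed normal vertex hit radius, in pixels
FIRST_VERTEX_SCALE = 1.5   # "a bit larger (50%+)" for the closing vertex

def hit_vertex(vertices, x, y):
    """Return the index of the vertex under (x, y), or None.

    The first vertex gets an enlarged radius and is tested first, so a
    click near it closes the polygon instead of snapping to a later
    vertex or adding a new point.
    """
    if not vertices:
        return None
    fx, fy = vertices[0]
    if math.hypot(x - fx, y - fy) <= BASE_RADIUS * FIRST_VERTEX_SCALE:
        return 0
    for i, (vx, vy) in enumerate(vertices[1:], start=1):
        if math.hypot(x - vx, y - vy) <= BASE_RADIUS:
            return i
    return None
```

The key point is only the ordering: the enlarged first vertex is checked before any other vertex, so it wins even when a later vertex is closer to the click.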

The tool is generally in a good state now and just regularly increasing the label list keeps it fresh. All those recent part additions have really helped.
The unified mode will add a new dimension to it.

Besides that, more labels will always help; although it's much better than it used to be, it's going to take thousands to describe everything you can see accurately. It's a classic case of "the last 10% takes 90% of the effort": e.g. 'road, pavement' might describe most of an urban area, but there are dozens of other words needed to completely describe the ground in a city or town (and everything a self-driving car or delivery bot might need to reason about).

Is there a particular direction we wanna go with ImageMonkey?

The nvidia 'labelled sketch to image' use case is my dream application. That just requires 'as many labels as possible' including adjective prefixes to refine descriptions. I imagine most other uses of AI would focus more on specific domains (like foodstuffs for calorie estimation, or road-layout/signs for cars). The other related use I hope for is finding common objects in environment scans (game art is more efficient than raw 3d scans because you'd just have a few example trees repeated with instancing , instead of attempting to record every unique branch).. again that just requires many labels to describe enough objects and surface textures.

One little suggestion I made with the image descriptions was labels in the description with nested square brackets, e.g. "[[[a [man]] riding [an [elephant]]] in a [jungle]]". That would generate the labels "man", "elephant", "jungle", and possibly "man riding an elephant" for the pairing (see also 'person riding bike', 'person in car' etc.). I imagined this could lead into annotating relations between objects and deeper scene understanding.
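A minimal sketch of how such a nested-bracket description could be parsed into labels (purely illustrative; no such parser exists in ImageMonkey, and the exact label set it should emit is still an open design question):

```python
def extract_labels(description):
    """Extract every bracketed phrase from a nested-bracket description.

    Each [...] span becomes a candidate label; an inner phrase is folded
    back into its parent with the brackets stripped, so outer spans read
    as plain text (e.g. "a man riding an elephant").
    """
    stack = [[]]   # stack[0] collects top-level text outside any bracket
    labels = []
    for ch in description:
        if ch == '[':
            stack.append([])
        elif ch == ']' and len(stack) > 1:
            phrase = ''.join(stack.pop()).strip()
            labels.append(phrase)
            stack[-1].append(phrase)  # inner text stays part of the outer phrase
        else:
            stack[-1].append(ch)
    return labels
```

Running it on the example above yields "man", "a man", "elephant", "an elephant", "a man riding an elephant", "jungle", and the full sentence; a real implementation would probably filter that list (e.g. drop articles, keep only labels known to the system).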

bbernhard commented 5 years ago

Thanks a lot for all the suggestions, very much appreciated!

I was going to mention the singular/plural flag as a long standing issue.

You are totally right; I completely forgot about that in the list.

Perhaps a few aliases for some common cases ("crowd", "group of people" = explicitly multiple persons, and perhaps "1 man", "1 woman", "1 car" etc. as the opposite) could be a temporary workaround.

Perhaps you could display "(s)" to make a default indeterminate single/plural status clear. 'unoccluded' and 'whole' would help too, e.g. "1 whole unoccluded person", "1 whole unoccluded car" etc., because they give you a complete reference image for these objects.

You are right, aliases would definitely be the easiest solution. I am not sure though if it's a good idea to store plural labels in the database, or if we should always store labels in their singular form and just use a flag to mark plurality. I think the latter makes it easier to group the data independently of singular/plural (not sure though if there is ever a need for that).
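The singular-plus-flag idea combined with the aliases could look roughly like this (a sketch with a hypothetical alias table and function name, not actual ImageMonkey code):

```python
# Hypothetical alias table: raw user-facing label -> (canonical singular, is_plural)
ALIASES = {
    "crowd": ("person", True),
    "group of people": ("person", True),
    "people": ("person", True),
    "1 man": ("man", False),
    "1 woman": ("woman", False),
    "1 car": ("car", False),
}

def normalize_label(raw):
    """Map a raw label onto its canonical singular form plus a plural flag.

    Only the singular form would be stored in the database; the flag
    preserves plurality, so data can still be grouped independently of
    singular/plural.
    """
    key = raw.strip().lower()
    if key in ALIASES:
        return ALIASES[key]
    # Unknown labels pass through as-is; plurality stays indeterminate
    # here and would be resolved elsewhere (e.g. via the "(s)" display).
    return (key, False)
```

A query for all person-like annotations could then simply group by the canonical label and ignore (or filter on) the flag.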

Besides that.. more labels will always help;

Totally agreed. There are still a lot of labels in the trending labels list that I would like to make productive. Unfortunately, there's still one big open question that sometimes prevents me from making a label productive, and that's: how should we deal with labels like woman or metal fence? Personally, I would really like to see those expressed with the properties system. E.g.: if you want to annotate a woman, use the base label person and then attach the gender property female to the bounding box.

But that's a lot more work (at least at the moment; I hope that the unified mode can make that a bit easier), so I can totally understand that it feels more natural to label an image with woman and then draw a bounding box around it, instead of fiddling with the properties system.

The other extreme would be to kill the properties system and always use concrete labels. But if we go that route, we have to be careful that we don't end up with super-specific labels (red ceramic bowl, blue ceramic bowl, etc.).

It's just my personal opinion, but I wouldn't mix it. So either we kill the properties system and use concrete labels, or we use the properties system and avoid concrete labels. If we allow both, I am a bit worried that new users will be completely lost ("should I use a concrete label or should I use a property for that?").
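For concreteness, the two alternatives could be sketched as data shapes like these (purely hypothetical field names, not the actual ImageMonkey schema):

```python
# Properties approach: a generic base label, with attributes attached
# to the bounding box.
properties_style = {
    "label": "person",
    "bbox": {"x": 120, "y": 45, "w": 80, "h": 200},
    "properties": {"gender": "female"},
}

# Concrete-label approach: the attribute is baked into the label itself,
# and no separate properties are stored.
concrete_style = {
    "label": "woman",
    "bbox": {"x": 120, "y": 45, "w": 80, "h": 200},
}
```

The properties shape keeps the label vocabulary small and composable (person + gender, person + age, ...), at the cost of a more involved annotation workflow; the concrete shape is simpler to annotate but multiplies the label list.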

The nvidia 'labelled sketch to image' use case is my dream application. That just requires 'as many labels as possible' including adjective prefixes to refine descriptions. I imagine most other uses of AI would focus more on specific domains (like foodstuffs for calorie estimation, or road-layout/signs for cars). The other related use I hope for is finding common objects in environment scans (game art is more efficient than raw 3d scans because you'd just have a few example trees repeated with instancing , instead of attempting to record every unique branch).. again that just requires many labels to describe enough objects and surface textures.

That pretty much aligns with the vision I have for the future of ImageMonkey - great to hear! :)

I was going to suggest a "Label Browser" view to complement the stats page.

Imagine a browser view showing one example image of each label, with a line of text: the name, number of annotations, number of labels. Click the image to enter search mode primed with that label, and you can refine the search or jump straight into a task.

A page like this might be a nice starting point: it would be graphically inviting, you'd be just 'two clicks' away from performing an interesting task, and it would instantly show the breadth of the dataset.

Awesome idea! I think that shouldn't be that hard to add :)

One minor hiccup that might be easy to fix: closing a polygon can be fiddly (e.g. it's hard to click the first vertex, so you sometimes accidentally add a bunch of vertices around the end). Perhaps the first vertex could be made a bit larger (50%+), and make sure that its hit detection overrides other vertices.

Thanks - that sounds like a really good improvement to me!