ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

motivation: database stats #28

Open dobkeratops opened 7 years ago

dobkeratops commented 7 years ago

rich statistics could help guide people to do labelling that increases the breadth of the database, e.g. if you show which words have fewest labels, you know where it needs improvement
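
A rough sketch of the "fewest labels" idea, just to make it concrete. It assumes the annotations are available as a flat list of label names and that there is a known vocabulary of labels; `least_labelled`, `annotations` and `vocabulary` are made-up names for illustration, not anything from the ImageMonkey code base.

```python
from collections import Counter


def least_labelled(annotations, vocabulary, n=3):
    """Return the n labels with the fewest annotations, rarest first.

    annotations: iterable of label names, one entry per annotation (hypothetical input).
    vocabulary:  every label the database knows about, so that labels
                 with zero annotations are reported too.
    """
    counts = Counter(annotations)
    # Sort by count, then alphabetically so ties come out in a stable order.
    ranked = sorted(vocabulary, key=lambda label: (counts[label], label))
    return [(label, counts[label]) for label in ranked[:n]]


annotations = ["dog", "cat", "dog", "car", "dog", "cat"]
vocabulary = ["dog", "cat", "car", "tree"]
print(least_labelled(annotations, vocabulary))
# [('tree', 0), ('car', 1), ('cat', 2)]
```

Passing the vocabulary separately matters here: a label nobody has used yet would never show up if you only counted existing annotations.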

bbernhard commented 7 years ago

Great idea!

There is currently the "Explore" tab, but I am not 100% happy with its representation. I'm not sure whether the "bubbles" are a good way to present the data in the first place, or whether plain text or graphs would be a better form of representation.

Do you have any suggestions for what information we could expose here? Just the count? Any suggestions regarding the representation method are also welcome. At the moment I am not really sure what the best way to illustrate that is.

dobkeratops commented 7 years ago

i suppose you might be able to collect 'error rate', to highlight what is tricky; and of course user contributions so you can see how far you've got.

'LabelMe' does ok here, it shows the current labels in an image collection as a bar chart, and the total contributions per user

bbernhard commented 7 years ago

Many thanks!

Will also have a look at 'LabelMe', to get some inspiration :)

I'll put that ticket on the "Project Planning TODO List"

dobkeratops commented 7 years ago

Another idea for motivation, much more complex, would be to try to find synergy with a 'manually-guided photogrammetry application', i.e. using multiple photos of a region to build a 3D model, summarised with a top-down map. Imagine actually displaying the map and seeing it grow as you do more labelling (specific labelling, i.e. you mark out 'a building' or 'a tree' in 2 images, but then you go through and say which ones correspond). Applications like '123D Catch' do this by asking for a large enough number of photos to compute the model automatically, but that relies on making a unique model everywhere.

Games, by contrast, could use repeatable textures, repeated 'instanced' trees etc. that just happen to be scaled to about the right size and so on; that's where the labelling would come in (the labelling would need to be more precise).

This would work for static objects (i.e. trees, specific buildings), but would need some estimate of depth for moving objects (of which you might only have one view at one moment in time). But if you'd identified the ground, all you have to do is specify the point where the object touches the ground, which by default would be the centre of the lower edge of its bounding box (of course messier when occluded, but if you actually display this, the user will know).
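
To illustrate the default ground point, a minimal sketch. It assumes a bounding box given as `(x_min, y_min, x_max, y_max)` in image pixels with y growing downwards (the usual image convention); the function name and box layout are assumptions for the example only.

```python
def ground_contact_point(box):
    """Default guess for where an object touches the ground.

    box: (x_min, y_min, x_max, y_max) in image pixels, with y growing
         downwards, so y_max is the lower edge of the box.
    Returns the centre of the lower edge as (x, y).
    """
    x_min, y_min, x_max, y_max = box
    return ((x_min + x_max) / 2, y_max)


print(ground_contact_point((10, 20, 50, 100)))
# (30.0, 100)
```

This is only the default; for an occluded object the user would drag the point to the right place, which is why displaying it matters.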

Accumulating snapshots of where objects were at some moment in time (and hence where they can potentially be) would be an amazing dataset to have alongside a 3D model, i.e. positions of traffic and pedestrians in street scenes, or where all the items might appear on tables inside a home. You could show 'heat maps' (in the map view) of where objects were.
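
The heat-map part could be as simple as binning positions into a grid. A minimal sketch, assuming object positions have already been projected into top-down map coordinates (the hard part, not shown); `accumulate_heatmap` and its parameters are hypothetical names.

```python
import math


def accumulate_heatmap(positions, width, height, cell):
    """Count how many observed positions fall into each grid cell.

    positions: iterable of (x, y) map coordinates (assumed already
               projected onto the top-down map).
    width, height: extent of the map in the same units.
    cell: side length of one square grid cell.
    Returns a list of rows, each a list of counts.
    """
    cols = math.ceil(width / cell)
    rows = math.ceil(height / cell)
    grid = [[0] * cols for _ in range(rows)]
    for x, y in positions:
        # Ignore observations that fall outside the map.
        if 0 <= x < width and 0 <= y < height:
            grid[int(y // cell)][int(x // cell)] += 1
    return grid


positions = [(1, 1), (2, 3), (9, 9), (12, 0)]
print(accumulate_heatmap(positions, width=10, height=10, cell=5))
# [[2, 0], [0, 1]]
```

Keeping one grid per label (cars, pedestrians, cups, ...) would give a separate heat map for each kind of object.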

I realise I've just described a far more complex application.

I wonder if there are any short-cuts to this, or a way to break this up into a separate service that links to the pure labelling app (keep one browser window open doing the labelling, while another shows you what it manages to compute from your labels).

bbernhard commented 7 years ago

Awesome idea!

But as you already said, it is much more complex. Nevertheless, I think it makes sense to create a new ticket for that idea so that we don't lose it, and to close this one as soon as we are happy with the new statistics tab :)

btw: I just pushed some changes to production, which can be seen here: https://imagemonkey.io/explore

I also added some "per country" statistics, as I thought it might be motivating for people to "push" their country. What do you think?