ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

observation/brainstorming - label incidence.. "long tail"? #162

Open dobkeratops opened 6 years ago

dobkeratops commented 6 years ago

I suspect what we might have with labels is something like a 'long tail' or '80/20' rule: a few labels that have huge numbers of examples, then a large number of labels with only a few examples each.

e.g. most of the traffic is indeed Car, but then you have Van, Truck, Bus, Motorbike. Most of the urban ground is pavement/road, but then you have bits of gravel, grass, driveway, parking space, concrete, soil.
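One rough way to check the long-tail impression is to count annotations per label and see how few labels account for most of them. The sketch below is only an illustration: `labels_covering` and the sample data are made up for this comment, and it assumes the annotations are available as a flat list of label names, which is not necessarily how ImageMonkey stores them.

```python
from collections import Counter


def labels_covering(annotation_labels, share=0.8):
    """Return (head, counts): the most frequent labels that together reach
    at least `share` of all annotations, plus the per-label counts."""
    counts = Counter(annotation_labels)
    total = sum(counts.values())
    if total == 0:
        return [], counts

    head = []
    covered = 0
    # most_common() yields labels from most to least frequent.
    for label, n in counts.most_common():
        if covered / total >= share:
            break
        head.append(label)
        covered += n
    return head, counts


# Hypothetical sample: one label name per annotation.
sample = (["car"] * 70 + ["van"] * 15 + ["truck"] * 8
          + ["bus"] * 5 + ["motorbike"] * 2)
head, counts = labels_covering(sample)
print(head, "cover at least 80% of", sum(counts.values()), "annotations")
```

If `head` is much shorter than the full label list, the distribution is head-heavy in the sense described above.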

I'm getting this impression with recent productive-label additions - e.g. after getting the browse mode it was possible to do a big surge of pavement/road, then as these examples became exhausted, we get another surge of possible annotations with each new label.

I guess it might be interesting to see the total number of annotations by label somewhere (I'm not sure I can find this in the stats screen, although it's comprehensive in other ways).

What I always imagined is there would indeed be sparse examples of labels on the 'edge' of the graph, but this would almost be like "red links in Wikipedia" - the frontier of a gradually expanding space of examples (the real potential of this tool is with continual extension, rather than the set challenge of ImageNet, CIFAR etc).

Also I hope that the graph means that even if individual labels have sparse examples, they will still contribute to earlier 'abstract labels' - like the "passenger vehicle" idea (what's in common between airliners, cruise ships & buses), etc. A lot of machines share common functional components like exhausts, hydraulics, control panels etc.
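To make the 'abstract label' idea concrete, here is a minimal sketch of rolling sparse per-label counts up a label graph. Everything in it is an assumption for illustration: the `parents` mapping, the label names and the counts are invented, and this is not how ImageMonkey's label graph is actually stored or queried.

```python
def ancestors(label, parents):
    """Collect every abstract label reachable from `label` via parent links."""
    seen = set()
    stack = list(parents.get(label, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, ()))
    return seen


def rolled_up_counts(direct_counts, parents):
    """Add each label's direct annotation count to all of its ancestors.

    Using a set of ancestors means a label that reaches the same ancestor
    by two different paths is still counted only once.
    """
    totals = dict(direct_counts)
    for label, n in direct_counts.items():
        for ancestor in ancestors(label, parents):
            totals[ancestor] = totals.get(ancestor, 0) + n
    return totals


# Hypothetical graph: child label -> list of more abstract labels.
parents = {
    "airliner": ["passenger vehicle", "aircraft"],
    "cruise ship": ["passenger vehicle"],
    "bus": ["passenger vehicle"],
    "car": ["vehicle"],
    "passenger vehicle": ["vehicle"],
    "aircraft": ["vehicle"],
}
direct = {"airliner": 3, "cruise ship": 2, "bus": 5, "car": 40}
print(rolled_up_counts(direct, parents))
```

Here the three sparse labels give "passenger vehicle" 10 examples between them, even though none of them has many on its own.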

What I've started to do with building annotations is make a point of splitting them (so they could go through refinement later: building->house, building->shop, building->skyscraper...), although this is difficult with the 'accidental close' problem (it will even close when you click on vertices of other polys, so you can't place vertices of 2 adjacent buildings nearby).
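For what it's worth, the behaviour I'd expect is roughly the following: a click should only close the polygon currently being drawn when it lands near that polygon's own first vertex, never a vertex of another polygon. This is only a sketch of the rule, not the actual annotation tool code; `should_close`, the pixel radius and the coordinates are all made up.

```python
import math


def should_close(current_polygon, click, radius=8.0):
    """Decide whether a click closes the polygon being drawn.

    current_polygon: list of (x, y) vertices placed so far.
    click: (x, y) position of the new click.
    Only the first vertex of the polygon in progress is tested, so vertices
    of other polygons can never trigger a close.
    """
    if len(current_polygon) < 3:
        return False  # a closed polygon needs at least three vertices
    first_x, first_y = current_polygon[0]
    return math.hypot(click[0] - first_x, click[1] - first_y) <= radius


in_progress = [(0, 0), (100, 0), (100, 100)]
print(should_close(in_progress, (3, 4)))    # near its own first vertex
print(should_close(in_progress, (200, 0)))  # e.g. a vertex of a neighbouring building
```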

bbernhard commented 6 years ago

> I guess it might be interesting to see the total number of annotations by label somewhere (I'm not sure I can find this in the stats screen, although it's comprehensive in other ways).

You are right, that's missing at the moment. I am also no longer entirely happy with the statistics screen - I think the current representation doesn't work very well with a lot of labels.

Maybe we can keep the current statistics page as a general overview, but also give the users the possibility to fetch detailed statistics for individual labels. I am thinking about a search text field here, where you can enter a label, and it gives you the statistics for that label.
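As a rough sketch of what such a lookup could do behind the search field: match the entered text against label names and return the numbers for each match. The function name, the two statistics shown and the sample data are placeholders for illustration only, not an existing ImageMonkey API.

```python
def label_stats(query, annotation_counts, validation_counts):
    """Return statistics for every label whose name contains `query`.

    annotation_counts / validation_counts: dicts mapping label -> count.
    Matching is a case-insensitive substring match; results are keyed by
    label in alphabetical order.
    """
    needle = query.strip().lower()
    return {
        label: {
            "annotations": n,
            "validations": validation_counts.get(label, 0),
        }
        for label, n in sorted(annotation_counts.items())
        if needle in label.lower()
    }


# Hypothetical numbers.
annotations = {"road": 900, "pavement": 750, "gravel": 12, "grass": 40}
validations = {"road": 1200, "pavement": 800, "grass": 55}
print(label_stats("gra", annotations, validations))
```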

> ... with the 'accidental close' problem (it will even close when you click on vertices of other polys, so you can't place vertices of 2 adjacent buildings nearby)

Good point; that bug is already on my todo list - I'll look into it as soon as the annotation rework feature is done :)