observation/brainstorming - label incidence.. "long tail"?

I suspect what we have with labels is something like a 'long tail' or '80/20' rule: a few labels with huge numbers of examples, then a large number of labels with only a few examples each.

E.g. most of the traffic is indeed Car, but then you have Van, Truck, Bus, Motorbike. Most of the urban ground is pavement/road, but then you have bits of gravel, grass, driveway, parking space, concrete, soil...
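One rough way to check the '80/20' hunch would be to measure what share of all annotations belongs to the most frequent fifth of labels. The sketch below assumes per-label counts are already available as a `Counter`; `head_share` is a hypothetical helper, and the traffic numbers are made up for illustration, not real ImageMonkey data.

```python
import math
from collections import Counter


def head_share(counts: Counter, head_fraction: float = 0.2) -> float:
    """Share of all annotations held by the most frequent `head_fraction`
    of labels (0.2 -> top 20%). Values around 0.8 or higher hint at an
    80/20-shaped distribution."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Number of "head" labels, rounded to the nearest integer but at least one.
    n_head = max(1, round(len(counts) * head_fraction))
    head_total = sum(count for _, count in counts.most_common(n_head))
    return head_total / total


# Made-up numbers for illustration only.
traffic = Counter({"car": 900, "van": 30, "truck": 25, "bus": 20, "motorbike": 25})
print(head_share(traffic))  # 0.9
```

Using a fraction of labels rather than a fixed top-N keeps the measure comparable as new labels are added over time.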

I'm getting this impression from recent productive label additions: e.g. once browse mode was available it was possible to do a big surge of pavement/road annotations; then, as those examples became exhausted, each new label brought another surge of possible annotations.

I guess it would be interesting to see the total number of annotations per label somewhere (I'm not sure I can find this in the stats screen, although it's comprehensive in other ways).
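A per-label total is essentially a group-by-and-count over the annotations. A minimal sketch, assuming each annotation record is a mapping with a `"label"` field (a hypothetical shape; the real export format may differ):

```python
from collections import Counter
from typing import Iterable, Mapping


def annotations_per_label(annotations: Iterable[Mapping[str, str]]) -> list[tuple[str, int]]:
    """Count annotations by their "label" field, most frequent first.
    Ties are broken alphabetically so the listing is stable."""
    counts = Counter(a["label"] for a in annotations)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical records for illustration.
records = [{"label": "road"}, {"label": "pavement"}, {"label": "road"}, {"label": "gravel"}]
for label, n in annotations_per_label(records):
    print(f"{label:<10} {n}")
```

Sorting by descending count puts the 'head' labels first and leaves the long tail at the bottom of the listing.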

What I always imagined is that there would indeed be sparse examples of labels on the 'edge' of the graph, but this would be almost like "red links" in Wikipedia: the frontier of a gradually expanding space of examples (the real potential of this tool lies in continual extension, rather than in a fixed challenge like ImageNet, CIFAR, etc.).
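The "red links" frontier could be listed as the labels that exist in the label graph but have few or no annotations yet. A sketch under assumptions: `frontier_labels` and the `min_examples` threshold are hypothetical, and the counts are made up.

```python
from typing import Iterable, Mapping


def frontier_labels(known_labels: Iterable[str],
                    counts: Mapping[str, int],
                    min_examples: int = 10) -> list[str]:
    """Labels present in the graph with fewer than `min_examples`
    annotations, including labels that have none at all yet."""
    return sorted(label for label in known_labels if counts.get(label, 0) < min_examples)


known = ["road", "pavement", "gravel", "driveway", "soil"]
counts = {"road": 500, "pavement": 400, "gravel": 3}
print(frontier_labels(known, counts))  # ['driveway', 'gravel', 'soil']
```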

Also, I hope the graph means that even if individual labels have sparse examples, they will still contribute to higher-level 'abstract labels', like the "passenger vehicle" idea (what's in common between airliners, cruise ships and buses). A lot of machines share common functional components like exhausts, hydraulics, control panels, etc.
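One way sparse labels could still feed abstract labels is to roll counts up the graph, crediting each annotation to every ancestor of its label. The sketch assumes a hypothetical child-to-parents mapping (a label may have several parents); the graph fragment and counts are invented for illustration.

```python
from collections import Counter
from typing import Mapping, Sequence


def ancestors(label: str, parents: Mapping[str, Sequence[str]]) -> set[str]:
    """All labels reachable upward from `label`, excluding the label itself."""
    seen: set[str] = set()
    stack = list(parents.get(label, ()))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.add(parent)
            stack.extend(parents.get(parent, ()))
    return seen


def rolled_up_counts(counts: Mapping[str, int],
                     parents: Mapping[str, Sequence[str]]) -> Counter:
    """Each label's own count plus the counts of all its descendants.
    An annotation is credited once per ancestor, even when several
    paths in the graph lead to that ancestor."""
    totals: Counter = Counter()
    for label, n in counts.items():
        totals[label] += n
        for ancestor in ancestors(label, parents):
            totals[ancestor] += n
    return totals


# Hypothetical fragment of a label graph (child -> parents).
parents = {
    "bus": ["passenger vehicle", "road vehicle"],
    "cruise ship": ["passenger vehicle"],
    "airliner": ["passenger vehicle", "aircraft"],
    "car": ["road vehicle"],
    "passenger vehicle": ["vehicle"],
    "road vehicle": ["vehicle"],
}
counts = {"bus": 12, "cruise ship": 2, "airliner": 1, "car": 900}
totals = rolled_up_counts(counts, parents)
print(totals["passenger vehicle"], totals["vehicle"])  # 15 915
```

Collecting ancestors into a set is what stops "bus" from being counted twice under "vehicle" via both "passenger vehicle" and "road vehicle".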

What I've started to do with building annotations is make a point of splitting them (so they could go through refinement later: building->house, building->shop, building->skyscraper). This is difficult, though, because of the 'accidental close' problem: the polygon will even close when you click on vertices of other polys, so you can't place vertices of two adjacent buildings near each other.

> I guess it would be interesting to see the total number of annotations per label somewhere (I'm not sure I can find this in the stats screen, although it's comprehensive in other ways).

You are right, that's missing at the moment. I'm also no longer entirely happy with the statistics screen; I don't think the current representation works very well with a lot of labels.

Maybe we can keep the current statistics page as a general overview, but also give users the possibility to fetch detailed statistics for individual labels. I am thinking about a search text field here, where you can enter a label and get the statistics for that label:

- number of annotations
- number of validations
- error rate
- ...
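The proposed per-label lookup could be sketched as a function that takes the search-field text and returns those three numbers. Everything here is an assumption: the record shapes (`"label"`, `"valid"`), the case-insensitive matching, and especially the definition of error rate as the share of validations that rejected the label.

```python
from dataclasses import dataclass
from typing import Iterable, Mapping, Optional


@dataclass
class LabelStats:
    label: str
    annotations: int
    validations: int
    error_rate: Optional[float]  # None while a label has no validations yet


def stats_for_label(query: str,
                    annotations: Iterable[Mapping],
                    validations: Iterable[Mapping]) -> LabelStats:
    """Statistics for the label typed into a (hypothetical) search field.
    Matching ignores case and surrounding whitespace."""
    key = query.strip().lower()
    n_annotations = sum(1 for a in annotations if a["label"].lower() == key)
    votes = [v["valid"] for v in validations if v["label"].lower() == key]
    # Assumed definition: share of validations that rejected the label.
    error_rate = votes.count(False) / len(votes) if votes else None
    return LabelStats(key, n_annotations, len(votes), error_rate)


annotations = [{"label": "building"}, {"label": "building"}, {"label": "house"}]
validations = [
    {"label": "building", "valid": True},
    {"label": "building", "valid": True},
    {"label": "building", "valid": False},
    {"label": "building", "valid": True},
]
print(stats_for_label("  Building ", annotations, validations))
```

Returning `None` instead of 0.0 for unvalidated labels keeps "no data yet" distinct from "no errors", which matters for the sparse labels at the edge of the graph.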

> with the 'accidental close' problem (it will even close when you click on vertices of other polys, so you can't place vertices of 2 adjacent buildings nearby)

Good point; that bug is already on my to-do list. I'll look into it as soon as the annotation rework feature is done :)

ImageMonkey / imagemonkey-core

observation/brainstorming - label incidence.. "long tail"? #162