ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

New Search options - un-annotated, <N=1 labels #288

Open dobkeratops opened 3 years ago

dobkeratops commented 3 years ago

The “un labelled” search has been very useful, but these images are drying up, i.e. those remaining are often plain landscapes, parks, beaches etc. with few (or no) obvious objects for annotation.

2 useful ideas for new search options:

Of these, “unannotated” would probably be the most useful.

bbernhard commented 3 years ago

I think this should be do-able in a couple of weekends, I'll have a look!

dobkeratops commented 3 years ago

One more idea: a true pure random browse option would be useful, if it doesn’t already exist. (I know there’s the random label search button already; that can be interesting when you’re stuck for ideas, but it tends to show very few possibilities per press because most label suggestions have few instances.) This would also be the best default to fill the label search with.

I would speculate most images have 1 label, or a few labels but no annotation (ballpark 170k labelled objects and 100k images, probably most with 1, 25% with 5-10, 25% unlabelled?). Browsing pure random you’d still mostly see open work, because it’s easier to submit photos than it is to fully annotate.

Seeing a mix of open work and ready annotated images for reference might give people hints on how to work (also imagine if you could verify from unified mode..)

If you have limited time: I’m not sure which of these 3 ideas is the most useful, so I’d suggest adding whichever is easiest.

bbernhard commented 3 years ago

I've just updated the version on the server. It's now possible to query the dataset with the image.num_labels keyword.

e.g:

The query language also supports the > and < operators, so it's possible to do something like this: image.num_labels < 10. But I just realized that there's probably a database query that isn't optimized for this yet, as this results in a browser freeze. I'll need to look into that. But maybe it's already useful that way :)
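As a rough illustration of how a numeric query such as image.num_labels < 10 could be interpreted, here is a minimal Python sketch. It is not ImageMonkey's actual parser: the function names, the in-memory records, and the support for <= and >= are assumptions made for the example; only the image.num_labels keyword and the =, < and > operators come from the comment above.

```python
import operator
import re

# Operators mentioned in the thread are =, < and >.
# Supporting <= and >= as well is an assumption of this sketch.
_OPS = {
    "<=": operator.le,
    ">=": operator.ge,
    "=": operator.eq,
    "<": operator.lt,
    ">": operator.gt,
}

# Matches expressions of the form "image.<property> <op> <integer>".
_QUERY_RE = re.compile(r"^\s*image\.(\w+)\s*(<=|>=|=|<|>)\s*(\d+)\s*$")


def parse_query(query):
    """Split 'image.num_labels < 10' into (property, comparison function, value)."""
    match = _QUERY_RE.match(query)
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    prop, op, value = match.groups()
    return prop, _OPS[op], int(value)


def filter_images(images, query):
    """Return the records whose numeric property satisfies the query."""
    prop, compare, value = parse_query(query)
    return [img for img in images if compare(img[prop], value)]


# Hypothetical records; the real data lives in the server's database.
images = [
    {"id": "a", "num_labels": 1},
    {"id": "b", "num_labels": 4},
    {"id": "c", "num_labels": 12},
]

single_label = filter_images(images, "image.num_labels = 1")
under_ten = filter_images(images, "image.num_labels < 10")
```

A production version would translate the expression into a database query rather than filter in memory, which is presumably where the optimization mentioned above comes in.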

dobkeratops commented 3 years ago

Nice general solution.. I’ll try it out. EDIT: works great.

num_labels = 1 gets lots of fresh, varied images with plenty left to do (e.g. lots of those need either the main object or scene label added). As it stands this is now the best search mode.

One unexpected finding is that it seems to count pending tasks rather than the actual number of labels. It’s not a problem because num = 1 gets 90% unannotated images (but also a few that are “mostly annotated with one remaining task”.. usually the one that’s too difficult to do).

I also note a lot of these single label uploads are low res. If you do get any more time, a work area tweak might help with that (e.g. could they be scaled to fill as much of the window as possible? It’s like it defaults to showing them at 1:1, so if you’ve got 800 pixels of available space but a 320x200 image, the work area is always just 320 pixels wide. Imagine if it always stretched it to the available 800. It’ll be blurry but you’ll be able to draw more precisely.)

[image]

The other search suggestions would still be useful to complement this, but right now this is a nice enough boost, delivering 90%. (My biggest next request would be the work area tweak.)
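The work area tweak boils down to a scale-to-fit calculation. The sketch below shows the arithmetic for the 320x200 example; the 600 pixel available height and both function names are assumptions for illustration, not part of the ImageMonkey frontend.

```python
def fit_scale(image_w, image_h, area_w, area_h):
    """Largest uniform scale at which the image still fits inside the work area."""
    return min(area_w / image_w, area_h / image_h)


def to_image_coords(x, y, scale):
    """Map a point drawn on the scaled view back to original image pixels."""
    return x / scale, y / scale


# 320x200 image, 800 pixels of available width (from the comment above);
# the 600 pixel available height is an assumed value.
scale = fit_scale(320, 200, 800, 600)            # min(800/320, 600/200) = 2.5
displayed_size = (320 * scale, 200 * scale)      # 800 x 500 on screen
centre = to_image_coords(400, 250, scale)        # centre of the view maps to (160, 100)
```

Annotations would still need to be stored in original image coordinates, which is why the inverse mapping is included.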

dobkeratops commented 3 years ago

[image]

Ok, after a bit more use, this is definitely the best search now, but it’s more like a 60/40 mix of “open work” and “saturated images” (the last few open tasks are usually too difficult to do; we can enhance the database much faster by focusing on the first, most obvious object. We can always train on this.)

So I would still prioritise the request for an “un-annotated” search: this will be the best way to find the most productive images (for annotation). (The best images to work on would currently be some blend of unannotated and “just one label”, not “1 open task”.)

(The above image also explains why “pure random” would be the best default: regardless of the shifting state, it would give the most variety and a high chance of finding currently productive images.)

bbernhard commented 3 years ago

One unexpected finding is that it seems to count pending tasks rather than the actual number of labels.

Oh, that's definitely a bug, many thanks for the info! I'll have a look!

Nice graphics! Your style of drawing reminds me a bit of the famous xkcd comics :grinning:

You are totally right with the "un-annotated" search, that's why I already started working on that one today. :) I'll let you know as soon as a first version is online to play with.

dobkeratops commented 3 years ago

I notice you can upload images with multiple labels now, as well as it putting them into a donations collection. Thanks for this tweak! This is really handy, e.g. being able to upload as “field,farm,tractor” or “street,car”, combining a scene label and a main object label. (The reason for searching for num labels = 1 is that there are existing images where the single label isn’t the best.)

Some of the scene/theme labels don’t seem to be available, e.g. I can use “street”, but not “military”, “railway station”, “countryside” or “living room” (even though I see these in the main label list in the stats screen)... does it try to filter those out? (It would definitely be useful to have all those available at upload; they’re great for summarising and grouping related images.)

bbernhard commented 3 years ago

does it try to filter those out? (It would definitely be useful to have all those available at upload; they’re great for summarising and grouping related images.)

Oh, that smells like a bug. Thanks for the info!

I am still busy with the "query for unannotated" feature, but if everything works out, I think I should be able to push that to production end of this week/beginning of next week. I'll have a look at it then :)

bbernhard commented 3 years ago

I've just added a new query option: image.num_open_annotation_tasks > 10 lets you query for images that have more than 10 open annotation tasks. I hope that's useful to you :)

It still returns all labels per image, but we could highlight those labels that already have an annotation if that helps.

dobkeratops commented 3 years ago

Thanks, could be interesting, sure. So basically the issue is that most of the labels aren’t worth bothering with (e.g. images with perspective mean you’ve got lots of fiddly details in the distance, and that’s why the automatic tasks mode is so painful to use).

When “adding labels” you want to confirm (comprehensively) what is in the image, because we can train on the entire label list without annotation, but you don’t necessarily want to actually annotate all of them.

This is why “un-annotated” would be useful: these images always have their high priority work remaining.. the labels that are prominent, good examples.. the main focuses of the image.

The reason for “num labels = 1” is that one label doesn’t usually describe the whole image well (so the “train on whole image -> label list” idea breaks). Adding the scene labels, or the most prominent object if it started as a scene, makes that more viable... I think for “whole image training” I will want to look for images with “num labels > 1” to be safer.

bbernhard commented 3 years ago

Oh, damn. I actually had that before, but during development I thought that I could make the new query option even more powerful and reworked it a bunch of times. But by reworking it, I accidentally removed the one use case you were after.

In my head, image.num_open_annotation_tasks = 0 was equal to "un-annotated", but that's actually complete bullshit.

Sorry about that!

So it would actually be something like image.num_annotations = 0 or image.is_unannotated = true, right? Is there one syntax you would prefer?

Do you think the other query option that I've added is somehow useful? If not, I would remove it again with the next update. (It's not a big problem to leave it in there, but every database query option that gets added makes the search potentially a bit slower. So if there's no need for it, I guess it's better to remove it again.)
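To make the difference between the candidate properties concrete, here is a small Python sketch. The Image class and its fields are hypothetical and assume one annotation task per label, which may not match the real data model; only the property names are taken from the discussion above.

```python
from dataclasses import dataclass, field


@dataclass
class Image:
    """Hypothetical record: one open annotation task per label that lacks an annotation."""
    labels: list
    annotated: set = field(default_factory=set)  # labels that already have an annotation

    @property
    def num_annotations(self):
        return len(self.annotated)

    @property
    def num_open_annotation_tasks(self):
        return sum(1 for label in self.labels if label not in self.annotated)

    @property
    def is_unannotated(self):
        return self.num_annotations == 0


fresh = Image(labels=["street", "car"])
partial = Image(labels=["street", "car", "tree"], annotated={"car"})
saturated = Image(labels=["street", "car"], annotated={"street", "car"})

# num_open_annotation_tasks = 0 selects the saturated image,
# which is the opposite of "un-annotated".
no_open_tasks = [img for img in (fresh, partial, saturated)
                 if img.num_open_annotation_tasks == 0]

# num_annotations = 0 (or is_unannotated = true) selects the fresh image.
unannotated = [img for img in (fresh, partial, saturated) if img.is_unannotated]
```

Under these assumptions the two numeric properties are independent of each other: a fresh image has many open tasks and no annotations, while a saturated one has the reverse.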

dobkeratops commented 3 years ago

I’d say choose whichever is easiest to implement. “is_annotated=true/false” would suffice for the important use cases, but if you’ve already got the numerical query logic, “num_annotations = 0, > 0 ... etc” might have interesting use cases (“let’s see a selection of images with multiple annotations..”).

dobkeratops commented 3 years ago

Thinking ahead .. “image.num_annotations=0” gets the best tasks now, then as these eventually dry up “num_annotations=1” would find the next best and so on.

I was also thinking it would be interesting when browsing if there was an option for showing all existing annotation outlines (you can see some when browsing for “rework” and in “explore”). This would show you what is left to do. (A “thoroughly annotated” image would be obscured in lines.. a fresh image would be empty, ripe for annotating.. and in the middle ground you’d see where images still have a few obvious objects left.) This would be great in conjunction with a “pure random” search: it would give you an overview of the whole state of the database.

dobkeratops commented 3 years ago

Idea for tweaked browse mode. Perhaps it sounds too heavy.. but random search might be less logic. Perhaps the “show all annotations” feature is available from the rework mode. Something like this could show newcomers a broad overview of the state of the dataset, whilst also being a good way to find “tasks” (..just browse and look for gaps). Even without the zoom controls, if it could recalculate the number of images to show based on window width, we could use browser zooming to do this..

[image]

bbernhard commented 3 years ago

I think I've tried something like this a while ago. If I remember correctly, the problem back then was that the rendering of all those annotations resulted in a browser freeze. For each image a new canvas is created and fabric.js paints all the annotations on the canvas. If there are images with a lot of annotations this will be pretty slow (especially if these are complex annotations with a lot of poly points). I haven't looked into that problem in detail, but maybe there are some things I could do to speed up the rendering.

Another possibility would be to paint the annotations directly on the image and serve static images instead. So, whenever someone adds an annotation, we would kick off a background job that takes the original image and paints all the annotations on top of it. When "show all annotations" is selected, those pre-painted images are served. The problem with that is that this will all be done asynchronously, so the data might not always be up to date.

What I would find even nicer is a way to somehow calculate the annotation coverage. I've tried that a while back and there's even a PoC available, but I never finished it so that it's actually usable. The idea was to create a convex hull over all the annotations and then calculate the percentage of pixels that it covers (e.g. 70% of the image is covered by annotations). But if there are big objects in an image the metric becomes pretty useless. In that case you would get that 90% of the image is already covered by annotations, but actually it's just a big bounding rect that's covering 90% of the image.
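The convex hull idea can be sketched in a few lines of Python. This is not the PoC mentioned above: it is an independent illustration using Andrew's monotone chain for the hull and the shoelace formula for the area, and all function names and sample coordinates are made up for the example.

```python
def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(poly):
    """Shoelace formula; a degenerate polygon (fewer than 3 vertices) has area 0."""
    total = 0.0
    for i in range(len(poly)):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % len(poly)]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def hull_coverage(annotations, width, height):
    """Fraction of the image covered by the convex hull of all annotation points."""
    points = [p for poly in annotations for p in poly]
    return polygon_area(convex_hull(points)) / (width * height)


# One big bounding rect on a 100x100 image: the metric reports about 90% coverage,
# even though only a single object has been annotated.
big_rect = [[(0, 0), (100, 0), (100, 90), (0, 90)]]
rect_coverage = hull_coverage(big_rect, 100, 100)

# Two small 10x10 annotations in opposite corners cover 2% of the image,
# but the hull spanning both of them covers far more.
corners = [
    [(0, 0), (10, 0), (10, 10), (0, 10)],
    [(90, 90), (100, 90), (100, 100), (90, 100)],
]
corner_coverage = hull_coverage(corners, 100, 100)
```

Both samples show the weakness described above: the hull measures the extent of the annotations, not how many objects remain to be annotated.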

dobkeratops commented 3 years ago

Ok, seems it’s not trivial. I’d recommend the refined search options over fiddling around with rendering optimisations there.. I guess it works ok when it’s just showing one outline, hence the current views.

I wonder if there’s enough api/querying to support an external tool doing more advanced browsing... at some point I would like to grab the data and render these kinds of overviews myself (the sky is the limit for ways of sorting and showing it).

I was also going to suggest a url parameter for kicking off a search. One could keep some shortcuts around and use it when sharing links to the site (someone in a forum asks for “a dataset of cars”... paste an imagemonkey url that with one click shows a page of car annotations, e.g. “imagemonkey.io/annotate?mode=browse&view=unified&search=car&rework=true”).
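A shareable link of that shape could be generated like this. The parameter names simply mirror the example URL in the suggestion; whether the site accepts them is an assumption, and search_link is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Base URL taken from the example in the suggestion above.
BASE_URL = "https://imagemonkey.io/annotate"


def search_link(search, mode="browse", view="unified", rework=True):
    """Build a link that would open the site with a search already filled in."""
    params = {
        "mode": mode,
        "view": view,
        "search": search,
        "rework": str(rework).lower(),  # "true" / "false"
    }
    # urlencode percent-escapes characters such as "=" and spaces in the query text.
    return BASE_URL + "?" + urlencode(params)


car_link = search_link("car")
query_link = search_link("image.num_labels = 1", rework=False)

# Decoding the query string recovers the original search text.
decoded = parse_qs(urlsplit(query_link).query)["search"][0]
```

Escaping matters once the search text is a query expression rather than a plain label, since characters like = and & would otherwise be read as part of the URL syntax.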