ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

unlabelled object segmentation #70

Open dobkeratops opened 6 years ago

dobkeratops commented 6 years ago

How about an option to draw bounding boxes, including nested components, but without a label?

e.g. there are images which have multiple objects in them, but the labels aren't there yet.. these could be applied retroactively.

bbernhard commented 6 years ago

interesting idea, thanks for bringing that up!

e.g. there are images which have multiple objects in them, but the labels aren't there yet.. these could be applied retroactively.

I definitely like the idea, but I'm unsure whether people would segment the image in a way that lets us apply labels to the segmented pieces later. I might be totally wrong, but my gut feeling is that we'd either end up with very finely segmented images or with pretty broadly segmented images. But that's just an assumption, so I might be totally wrong on that ;) Nevertheless, even if we can't label the bounding boxes afterwards, it would still be useful information that can be used for various tasks, as you already mentioned.

Another possibility would be to create a mode similar to LabelMe (maybe just for signed-up users?), where you can freely label and annotate objects in images. The only thing is that we might end up in a similar situation as LabelMe: the dataset gets completely cluttered, as there are a lot of (subtly) different labels in place. In that case we'd probably need some "post-labeling pipeline" in place, which analyzes the labels (based on WordNet?) to group similar ones together.

dobkeratops commented 6 years ago

(maybe just for signed up users?),

Right, maybe you could build up trust with the system by doing a number of verifications first.

as there are a lot of (subtly) different labels in place.

I think that's just the nature of reality.. maybe ambiguity is best handled by allowing overlapping, fuzzy labels (the labelling being more like alpha channels than absolutes)?

Aside - I've just been messing with Raspberry Pi cameras; I've got an itch to take some stereo footage from bike rides. I've got 2 cameras now, and got as far as getting some bike footage from one.. I just need to figure out a decent mount (or just give up and get GoPros like a normal person, hah). I imagine mounting them on the ends of the handlebars to get a very wide parallax.

.. I might post another suggestion for 'stereo camera support'.. but maybe that's best done as a separate pipeline yielding auto-labelling (stereo + video would have more hints for existing object segmentation/depth, although I know that sort of thing can be very noisy..). That would of course be object bounding boxes without actual labels.

One thing I note with LabelMe is they have stills from video sequences, but a lot of those images aren't so interesting (you'll see many similar images, and a few with interesting objects to actually label), so I guess there's something to think about re: how to take and use video footage.

dobkeratops commented 6 years ago

(Further tweak to that suggestion, re: hierarchical labelling - you could have common 'component' labels, e.g. 'eye mouth ear nose tail neck hand foot paw leg arm joint wheel door handle window roof engine cabin rudder ...', then you could label parts for 'any animal, any vehicle, any building', without knowing exactly what animal, vehicle or building it was.)

bbernhard commented 6 years ago

I've just been messing with Raspberry Pi cameras; I've got an itch to take some stereo footage from bike rides. I've got 2 cameras now, and got as far as getting some bike footage from one.. I just need to figure out a decent mount (or just give up and get GoPros like a normal person, hah). I imagine mounting them on the ends of the handlebars to get a very wide parallax.

That's awesome! Let me know when you take your first photos/videos! :)

I might post another suggestion for 'stereo camera support'

Sounds good...looking forward to that!

One thing I note with LabelMe is they have stills from video sequences, but a lot of those images aren't so interesting (you'll see many similar images, and a few with interesting objects to actually label), so I guess there's something to think about re: how to take and use video footage.

A few months ago, I experimented a little bit with the automated extraction of frames from videos. My naive approach was built around ffmpeg (capturing a still image every x seconds). While I got pretty decent results, I think it still leaves a lot of room for improvement.
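The naive approach described here can be sketched with ffmpeg's `fps` filter (one frame every x seconds); the file names here are just placeholders:

```python
import subprocess

def ffmpeg_still_cmd(video_path, out_pattern, every_s=5):
    """Build an ffmpeg command that grabs one still every `every_s` seconds."""
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"fps=1/{every_s}",   # fps filter: 1 frame per every_s seconds
        out_pattern,                 # e.g. "stills/frame_%04d.jpg"
    ]

# subprocess.run(ffmpeg_still_cmd("ride.mp4", "stills/frame_%04d.jpg"), check=True)
```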

dobkeratops commented 6 years ago

Gave in and picked up a cheapo action-cam - pleasantly surprised.. it seems to be better than the Raspberry Pis (although I still see other people posting better example footage from those, so I won't ditch them just yet, and I still want stereo vision).

[attached images: ride_4, ride_1, ride_3, ride_7, ride_12, ride_13]

Just getting 'trip footage' has the hazard that you have a lot of frames of just 'road' until you actually pass something, so it's probably worth manually selecting stills (then I suppose it might be possible to extrapolate labels through the video around those..).

dobkeratops commented 6 years ago

Still find the 'upload' options a bit too restrictive, e.g. if I want to donate the above pictures, there's no good answer to the question posed by the UI.

The photos don't focus on a single object, but they have plenty to label (car, person, building, signposts, trees, bushes, traffic lights, bins, pavement vs road, river, railings, ...).

My original suggestion for dealing with this was 'scene' labels, e.g. the above images would all be "urban scenes". Another way would be multiple tagging at the point of upload (an 'urban scene' could prime it with 'car + person + building + road + pavement' at 50% probability or whatever, going on for further refinement). Some objects are still difficult re: bounding boxes (e.g. the 'road' and 'buildings' above aren't easily bound), but you could still use the fact that the image contains them as a training label.

I suppose initially I could just filter the stills that 'just have cars' and upload those to the car category, but the most interesting ones are a mix of many objects, and I still think that's a slight mis-representation.

Another simple suggestion for the issue - maybe a few more simple labels: 'road', 'furniture', 'carpet', 'door' would catch the 'urban' and 'domestic' scenes (which will be the most common images any internet user will have at hand).

But then look at the 3rd image down. It's not a 'pavement' or 'road', but it contains 'bird', 'person', 'lamp post', 'river'.. It still seems to me the real problem is asking to categorise a scene with the name of a single object. You could have labels for 'objects' like "park", "city", "kitchen" which would cover what I call 'scenes', but then you'll have the problem of asking 'annotate all the cities in this image...', which doesn't make sense when the image is inside the object - the entire image 'is' the annotation, and really you need to drill down into components (e.g. 'landmass' has components 'forest, city, ..'; 'city' has components 'building, road, car, ...' etc).

bbernhard commented 6 years ago

Awesome photos! Just out of interest, which camera are you using? :)

Still find the 'upload' options a bit too restrictive, e.g. if I want to donate the above pictures I don't have a clear category. The photos don't focus on a single object, but they have plenty to label (car, person, building, signposts, trees, bushes, traffic lights, bins, pavement vs road, river, railings, ...).

Totally agreed - the whole labeling structure is indeed something that needs more work. I've postponed work on that part multiple times, as I'm a bit short on good ideas for how to avoid the dataset getting scattered with multiple ambiguous labels.

While ambiguity is not necessarily a problem during the labeling process, it makes it a nightmare when you want to export data with a specific label. That's also my main criticism of LabelMe: the dataset is really impressive, but it's so hard to get the data you are interested in. There are quite a few ambiguous labels and sometimes also misspelled ones, which almost always requires you to run some post-processing on the labels.

At the moment I'm a bit torn between the two extremes: "add free labeling support" and "restrict the labels". Ideally I would strive for the first option, but I think in order to make that work we would definitely need a post-processing step which groups ambiguous labels together.

I'm not sure if it's a good idea, but what about "free labeling with restrictions"? So you can add any label you want to the picture, but the label is only "internally used" for now. That means the label wouldn't pop up when you want to export data, and neither would it be possible to annotate objects with that label. After a significant number (50?) of pictures have that label, a background task could automatically create a GitHub ticket (e.g. "label 'door' is trending"). We could then review the GitHub tickets regularly and make good labels "permanent" ones. After a label transitions from "internal" to "permanent" it can be fully used (i.e. it's available in the export functionality, the annotation tool, ...). Does that make sense, or is that a bad idea?
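The "trending labels" background task could be sketched roughly like this, assuming (purely for illustration) that labels still in the "internal" state are stored as (image_id, label) pairs:

```python
from collections import Counter

def trending_labels(internal_labels, threshold=50):
    """internal_labels: iterable of (image_id, label) pairs for labels
    not yet promoted to "permanent". Returns the labels that crossed
    the threshold, i.e. candidates for an auto-filed GitHub ticket."""
    counts = Counter(label for _, label in internal_labels)
    return [label for label, n in counts.items() if n >= threshold]
```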

What I personally like about the idea:

Of course, we could also adapt that to the scenes concept.

dobkeratops commented 6 years ago

I'm not sure if it's a good idea, but what about "free labeling with restrictions"?

re: the 'mis-spellings'/'spam' etc - I can see why you went the route that you did; I know you'd have to spend a lot of time categorising all the naming conventions people used in LabelMe (camel casing like 'carSide', etc).

As a middle ground, I would suggest 'general labels' (such as 'animal (non-human)', 'plant', 'vehicle', 'tool', 'furniture', 'person', 'building', 'machine', 'food'.. - I bet you could pick 10 words that would usefully separate most images, allowing you to accumulate a very broad dataset) - and then rely on another UI to refine labels ("what kind of animal.."). If you had those on tick-boxes you could still ensure a user provides useful hints on upload. 10 general x 10 narrow each = 100 labels with easy navigation.
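The two-level scheme (a handful of general labels, each with its own refinement list) could look something like this; the concrete label lists are just illustrative:

```python
# Hypothetical two-level label structure: a small set of general labels,
# each with its own refinement choices ("what kind of animal..").
GENERAL_LABELS = {
    "animal (non-human)": ["cat", "dog", "bird", "horse", "fish"],
    "vehicle": ["car", "bicycle", "bus", "boat", "train"],
    "building": ["house", "church", "skyscraper", "shed"],
}

def refinement_choices(general_label):
    """Return the narrower labels offered once a general label is picked."""
    return GENERAL_LABELS.get(general_label, [])
```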

I suppose you could just add those general labels to the flat list you already have, but I don't know if you'd worry about 2 choices being applicable (if someone sees 'animal' first in the list, they might not scroll down far enough to see 'cat, dog..').

I remember we also talked about a completely general graph (and I did a little experiment to build one), but you could start with a tree structure or even fixed layout, like your original 'metalabel/label' idea.

Once you have such a structure, I think you could go back over it and refine your labels in a controlled way. I suppose you could have an 'unknown' or 'other' that people can select (.. which you could consider a 'request for refinement').

Maybe as a stopgap you could think of a handful of new labels that would at least separate these common photo types.. dash-cam and domestic pictures.

One suggestion is 'road' and 'pavement' for the outdoor scenes - I could select either depending on what's in the centre. Then you've got a way of separating pedestrian footage from car or road-cycling footage. (I think segmenting 'drivable road' would be very important for self-driving car vision..)

I guess for the time being I can upload most of what I have by selecting 'person' or 'car' based on what there's more of in the particular image (I've done about 10 stills that way and that worked out ok; I can try again with a shorter interval to do more..).

Yet another suggestion for labels is adjectives and verbs... I saw many people making single labels for that in LabelMe ("personSitting", "personWalking" etc) -> just have a 'verb' label: "driving", "walking", "sitting", "eating".. then you have the ability to refine information by permuting a small set of preset choices.
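Keeping nouns and verbs as separate axes (instead of minting one-off labels like 'personSitting') means a few presets cover many combinations; a minimal illustration with made-up preset lists:

```python
from itertools import product

# Hypothetical preset axes; 3 nouns x 4 verbs already yield 12 combinations.
NOUNS = ["person", "car", "dog"]
VERBS = ["sitting", "walking", "driving", "eating"]

def label_pairs(nouns=NOUNS, verbs=VERBS):
    """Combine the two axes on demand rather than storing compound labels."""
    return [f"{noun} ({verb})" for noun, verb in product(nouns, verbs)]
```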

bbernhard commented 6 years ago

As a middle ground, I would suggest 'general labels' (such as 'animal (non-human)', 'plant', 'vehicle', 'tool', 'furniture', 'person', 'building', 'machine', 'food'..

Agreed, that's also a possible approach and might be a good starting point for now. I think label growth, however, is a factor that we shouldn't neglect here. If we can make sure that our "base label set" stays small, I think it can definitely work that way. If we reach a certain number of labels, it probably gets more and more difficult to pick the right one. Naturally, you would probably just type in the thing that comes to your mind first, but as you are limited by the number of choices you might pick a label that's not necessarily correct but "fits best" (instead of creating a pull request to add the label).

The idea behind the "trending labels" is to get a feeling for what's going on in the dataset and make some small corrections as the dataset grows. I think it's not necessarily bad to have very specific label donations (e.g. "red car"), even if they don't fit in our label graph.

e.g.: if we got a bulk donation of 100 images all labeled with "red car" (don't know, but maybe there is someone out there who collects images of red cars), we would detect the trending label "red car", rename it to "car" and add "red" as a color property of 'car'. That way we could retain all the detailed information, but restructure it in a way that fits our label hierarchy.

At least that's the theory... not sure if that works in practice, or if there is even a need for it.
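The "red car" -> car + colour-property restructuring could be sketched like this; the known-label and known-property sets here are made up purely for illustration:

```python
# Hypothetical vocabulary; in practice this would come from the label graph.
KNOWN_LABELS = {"car", "person", "dog", "building"}
KNOWN_PROPERTIES = {"red", "blue", "green", "small", "large"}

def normalise_label(raw):
    """Split a compound donated label like "red car" into a base label
    that exists in the graph plus a list of recognised properties.
    Unrecognised labels are passed through untouched."""
    words = raw.lower().split()
    props = [w for w in words if w in KNOWN_PROPERTIES]
    bases = [w for w in words if w in KNOWN_LABELS]
    if bases:
        return bases[-1], props
    return raw, []
```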

I guess for the time-being I can upload most of what I have by selecting 'person' or 'car' based on what there's more of in the particular image (I've done about 10 stills that way, that worked out ok , I can try again with a shorter interval to do more..)

Awesome!

In case you need a script... I think I have a few code snippets lying around which could help you with the uploading. The scripts automatically shrink the images (to at most 1000px) and use the folder name as the label. So you could just move your images to the appropriate folders ('person'/'car') and the script pushes all the images in the folder. I think the script is either in Python or Go, but in case you want to script it yourself in your favourite language, I can also provide you with the API endpoint details.

Yet another suggestion for labels is adjectives and verbs... I saw many people making single labels for that in LabelMe ("personSitting", "personWalking" etc) -> just have a 'verb' label: "driving", "walking", "sitting", "eating".. then you have the ability to refine information by permuting a small set of preset choices.

Cool idea! Would it also be possible to use the refinement UI for this? ("What position is the person in?")

dobkeratops commented 6 years ago

"In case you need a script...I think I have a few code snippets lying around which could help you with the uploading. "

Ah nice, I just posted an issue (after doing a bigger upload from a finer subdivision of my sequences). Seems you already thought of this. I was indeed using directories to categorise.

Python should be fine.

bbernhard commented 6 years ago

Ah nice, I just posted an issue (after doing a bigger upload from a finer subdivision of my sequences). Seems you already thought of this. I was indeed using directories to categorise.

Python should be fine.

I just looked at the Python script and it's almost complete. I think I should be able to finish it tomorrow/Wednesday.

btw: don't worry if your images don't show up yet.. I have to manually unlock them. I once thought about removing the manual unlocking step, but as I got a "dick pic" once, I'll probably keep it in place for now.

Another useful(?) addition would probably be to link donated images to signed-up users, so that one can easily keep track of donations ("how many labels/validations/annotations do my pictures have?").

dobkeratops commented 6 years ago

Experiment re: offline labelling - indexed-colour scribbles. I reduced the image to 128 colours, leaving the remaining palette entries for pixel values representing annotations. Done in an indexed-colour paint program (a D-Paint clone... such things are niche tools for nostalgic purposes). The idea would be to generate masks by filling out from the scribbles, up until the major edges in the underlying image. However, the reduction to 128 colours in the package I was using doesn't dither, so it loses a lot of gradients. These would be best submitted alongside their originals. Ideally one would need a way of associating a label with the palette indices, because you'd quickly run out of colour codes. I suppose you could write them in an 8x8 pixel font in the corner to keep all the information in the image file, but that would mean more faffing around to read them.

[attached image: street_annotated]
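Reading such a scribble image back into per-label seed masks could start like this, assuming the convention that palette indices from 128 up are reserved for annotations (growing the seeds out to the image edges would be a separate step):

```python
import numpy as np

# Assumed convention: palette indices 0-127 hold the reduced image,
# indices 128-255 are reserved for annotation scribbles.
ANNOTATION_BASE = 128

def scribble_masks(indexed, label_names):
    """indexed: 2-D uint8 array of palette indices (one entry per pixel).
    label_names: dict mapping a reserved palette index -> label string
    (the index->label association the comment says is still needed).
    Returns {label: boolean seed mask} for later region growing."""
    return {name: (indexed == idx) for idx, name in label_names.items()}
```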

bbernhard commented 6 years ago

Here is the Python script: https://github.com/bbernhard/imagemonkey-libs/blob/master/python/snippets/donate.py

Just place the script in the directory containing your image folders (i.e. 'person', 'car', ..) and adapt the LABEL variable accordingly. It will iterate through all the images in that folder, resize each image to 1000px (in case it is larger, while keeping its aspect ratio) and push it to the server with the folder's name as the label. I only tested it on Windows, but I think it should work on other operating systems as well.
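The resize rule (shrink so neither side exceeds 1000px, keeping aspect ratio, and leave smaller images untouched) boils down to something like this; `capped_size` is a made-up helper name, the real logic lives in donate.py:

```python
def capped_size(width, height, max_px=1000):
    """New (width, height) such that neither side exceeds max_px,
    preserving aspect ratio. Images already small enough are unchanged."""
    scale = min(1.0, max_px / max(width, height))
    return round(width * scale), round(height * scale)
```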

Regarding your scribbles experiment: Looks really cool and I have to admit it's a pretty interesting way of annotating images.

The idea would be to generate masks by filling out from those, up until the major edges in the underlying image.

Would you rather generate the masks on the server side (after the scribbles get pushed to the server), or is that something the client-side app should do? I could imagine that a fully automated mechanism (based on edge detection?) in the backend might not always produce the desired result, so it could be good to have some kind of visual feedback and the possibility to refine the scribbles?

dobkeratops commented 6 years ago

Here is the Python script:

Awesome, thanks! Should be really easy now to just grab footage of traffic or crowds and submit many examples of 'car' and 'person'.

dobkeratops commented 6 years ago

Grabbed a second camera of the same type and tried recording stereoscopic footage. Let me figure out the best way to share the raw video too.

I need to figure out how to sync/calibrate orientation etc. (I'm just manually pressing record, so there's a time offset.. I suspect it will need sub-frame sampling to get more precision). I also ran into issues like the cameras running out of battery/storage at different times, so manually clipping/associating the files is needed. I put the cameras on the edges of the handlebars for maximum parallax effect, rather than a natural human-eye separation.
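One common way to recover the record-button offset between two cameras is to cross-correlate their audio tracks; a rough NumPy sketch (whole-sample precision only - the sub-frame refinement mentioned above would need interpolation around the peak):

```python
import numpy as np

def offset_samples(a, b):
    """Estimate how many samples track `b` lags behind track `a`,
    via the peak of their cross-correlation. Divide by the audio
    sample rate to convert to seconds."""
    a = a - a.mean()
    b = b - b.mean()
    corr = np.correlate(a, b, mode="full")
    lag = corr.argmax() - (len(b) - 1)   # lag k where sum a[n+k]*b[n] peaks
    return -lag                          # b delayed by d -> peak at k = -d
```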

[attached screenshots: 2018-03-29 19:01:27, 19:11:24, 19:17:39]