ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

discussion .. COCO dataset #52

Open dobkeratops opened 6 years ago

dobkeratops commented 6 years ago

I wasn't aware of this before, but found it through your links to those offline annotation tools:

- Seems to be set up as a formal challenge like ImageNet (but going beyond that, because it's pixel-level segmentation).
- Seems this is more recent than LabelMe.
- They have a label hierarchy (~200 labels, it seems).
- They have a nice presentation for browsing the existing data (better than LabelMe), e.g. they can show cutout bitmaps etc.
- Seems they use 'flickr' hosted images?
- They have an 'iscrowd' flag on annotations (I guess this might be the generalisation of person/people, tree/forest, cow/herd etc.?)
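To make the crowd flag concrete, here is a minimal sketch of what two COCO-style annotation records look like and how one might separate single instances from crowd regions. The field names follow the published COCO annotation format; all IDs, coordinates, and the toy run-length mask are made up for illustration, and `split_by_crowd` is a hypothetical helper, not part of any COCO tooling.

```python
# Two COCO-style annotation records. Values are invented for illustration.
annotations = [
    {
        "id": 1,
        "image_id": 100,
        "category_id": 1,
        # Single instance: a list of polygons, each a flat [x1, y1, x2, y2, ...] list.
        "segmentation": [[10.0, 10.0, 60.0, 10.0, 60.0, 80.0, 10.0, 80.0]],
        "bbox": [10.0, 10.0, 50.0, 70.0],  # [x, y, width, height]
        "area": 3500.0,
        "iscrowd": 0,
    },
    {
        "id": 2,
        "image_id": 101,
        "category_id": 1,
        # Crowd region: a run-length-encoded mask over a toy 4x5 image
        # (6 background pixels, 8 foreground, 6 background, column-major).
        "segmentation": {"size": [4, 5], "counts": [6, 8, 6]},
        "bbox": [1, 0, 3, 4],
        "area": 8,
        "iscrowd": 1,
    },
]


def split_by_crowd(anns):
    """Return (single-instance annotations, crowd annotations)."""
    singles = [a for a in anns if a.get("iscrowd", 0) == 0]
    crowds = [a for a in anns if a.get("iscrowd", 0) == 1]
    return singles, crowds


singles, crowds = split_by_crowd(annotations)
print(len(singles), "single instance(s),", len(crowds), "crowd region(s)")
```

A tool that wants to treat "people" differently from "person" could branch on this flag rather than on separate label names.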

Another option for compatibility: I'll try and absorb their classes.

http://cocodataset.org/#explore

I think the LabelMe dataset is still bigger (600k annotations?), and I'm certain there are still things that can be done to complement whatever else is out there.

bbernhard commented 6 years ago

Wasn't aware of that either...but it looks really interesting. Their way of presenting the data is pretty nice - it definitely looks more fresh than LabelMe.

> Seems they use 'flickr' hosted images?

That's also something I am thinking about at the moment: does it make sense to include differently licensed photos (i.e. not CC0 licensed)?

For me personally it wouldn't be a problem to add the possibility to upload differently licensed photos (some Creative Commons derivatives), but I wonder what the impact for users would be. CC0 licensed images/data are great, as they allow you to do all kinds of things without giving attribution (basically "do the f*ck you want with it"). For other Creative Commons derivatives, attribution is required, which could make the data less attractive for end users?

I think there are still a lot of CC0 licensed pictures out there which we could import, so I think we don't need to go that way now. But I wonder if adding different licenses would be a good idea? (And maybe it would even enable us to use other data sources?)

> Another option for compatibility: I'll try and absorb their classes.

Awesome!

btw: Do you know how they annotated the images? Do they have a dedicated tool for that?

dobkeratops commented 6 years ago

> btw: Do you know how they annotated the images? Do they have a dedicated tool for that?

I don't know; I suspect it was a sponsored mass edit, i.e. producing a curated dataset for a challenge; but I do see screenshots of some app they seemed to use.

Looking into their format, I think that's also tied into Flickr; Flickr in turn needs some sort of registration to access the API (you give a registration ID that combines with image IDs).

So I have a little indecision on a primary annotation format to use (re: my tool). Also, LabelMe's export gives local filenames. Currently I was writing the annotations out with image URLs (I still want to do a LabelMe-compatible import/export though).

bbernhard commented 6 years ago

> but I do see screenshots of some app they seemed to use.

I just found this tool [1], which seems to be designed for working with the COCO dataset. Unfortunately it seems that the tool is not actively developed anymore (at least I couldn't find a download link anywhere).

> So I have a little indecision on a primary annotation format to use (re: my tool). Also, LabelMe's export gives local filenames. Currently I was writing the annotations out with image URLs (I still want to do a LabelMe-compatible import/export though).

This again shows that there is definitely a need for a common format. :) My hope is that if there is a clean API, the format that's used (internally) doesn't matter that much, as long as it's flexible enough, because it's always possible to write custom converters to convert the data into another format.
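As a rough illustration of the "custom converter" idea, here is a minimal sketch that turns URL-keyed polygon records into a COCO-style dictionary. The input layout (`image_url`, `width`, `height`, `objects`, `label`, `polygon`) is purely hypothetical and not the format of any tool mentioned in this thread. Storing an arbitrary URL under `coco_url` is also an assumption made here for convenience.

```python
def polygon_area(points):
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def to_coco(records):
    """Convert hypothetical URL-keyed records into a COCO-style dict."""
    images, annotations, category_ids = [], [], {}
    for image_id, rec in enumerate(records, start=1):
        images.append({
            "id": image_id,
            "coco_url": rec["image_url"],  # assumption: reuse this field for any URL
            "width": rec["width"],
            "height": rec["height"],
        })
        for obj in rec["objects"]:
            # Assign category IDs in order of first appearance.
            cat_id = category_ids.setdefault(obj["label"], len(category_ids) + 1)
            points = [tuple(p) for p in obj["polygon"]]
            xs = [x for x, _ in points]
            ys = [y for _, y in points]
            annotations.append({
                "id": len(annotations) + 1,
                "image_id": image_id,
                "category_id": cat_id,
                "segmentation": [[c for p in points for c in p]],
                "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
                "area": polygon_area(points),
                "iscrowd": 0,
            })
    categories = [{"id": i, "name": n} for n, i in category_ids.items()]
    return {"images": images, "annotations": annotations, "categories": categories}


# Example input in the hypothetical URL-keyed layout.
records = [{
    "image_url": "https://example.org/a.jpg",
    "width": 100,
    "height": 100,
    "objects": [{"label": "car", "polygon": [[10, 10], [60, 10], [60, 80], [10, 80]]}],
}]
coco = to_coco(records)
```

Going the other way (e.g. towards LabelMe-style local filenames) would be a second small function of the same shape, which is why the internal format matters less than having clean access to the data.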

[1] http://siliconmountain.jp/en/annotation-tool/

dobkeratops commented 6 years ago

Seems they have the same idea as you of asking the user to annotate a specific class, which is interesting. They also mention it in relation to crowdsourcing (i.e. it knows who's doing which images and which classes, so even if two people see the same image they could be focussing on different classes).

dobkeratops commented 6 years ago

(I've just tried the site again and tried the 'quiz' refinement mode with the car brand/dog size attributes... it was interesting to see this.)