ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

upload, should be unlabelled by default? #4

Closed dobkeratops closed 7 years ago

dobkeratops commented 7 years ago

On the 'donate' (image upload) page, it asks for images representing a specific label.

it would be useful to allow uploading images without it telling you the label first, i.e. you upload images, then people select the labels based on whatever they see in it.

It also asks for a single label; what about images which contain multiple objects? Scene data is useful (what objects appear together, etc.).

e.g. "dog"? No, it's a "dog, (sitting in a) window", etc.

EDIT: a better idea, raised elsewhere: give broad scene labels, e.g. 'urban, domestic, industrial, nature, ...', and make people choose at least one of these on upload. See https://github.com/bbernhard/imagemonkey-core/issues/18

bbernhard commented 7 years ago

Hi,

thanks for opening a ticket for that - really appreciated! I am also not 100% happy with the solution as it is right now. The main intention behind that was:

But I really like the idea of adding multiple labels to it - that could improve the quality of the dataset a lot.

What is your opinion on that? Would love to hear it ;)

Thanks, Bernhard

dobkeratops commented 7 years ago

Re unlabelled images: if users are browsing, perhaps 'unlabelled' can appear as a specific filter option (e.g. 'show me +dogs, -cats', 'show me unlabelled', etc.), and by default you assume 'don't show me unlabelled'. If you have these options in place, keeping photos around awaiting labelling should just help the dataset grow, IMO.

conversely, in a detailed labelling mode (e.g. submit labels with refined categories etc), you could show the unlabelled images first
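The filtering idea above could look roughly like the following Go sketch. Everything here is an assumption for illustration: the `Image` and `Filter` types, the `ParseFilter` and `Apply` helpers, and the query syntax are hypothetical and not taken from the imagemonkey-core source. Matching is exact apart from letter case, so plurals such as "dogs" are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// Image is a hypothetical record: an ID plus the labels it has so far.
type Image struct {
	ID     string
	Labels []string
}

// Filter is a hypothetical parsed browse query.
type Filter struct {
	Include        []string // labels that must be present ("+dog")
	Exclude        []string // labels that must be absent ("-cat")
	OnlyUnlabelled bool     // set by the keyword "unlabelled"
}

// ParseFilter reads queries such as "+dog, -cat" or "unlabelled".
// Commas are treated as separators; unknown tokens are ignored.
func ParseFilter(query string) Filter {
	var f Filter
	for _, tok := range strings.Fields(strings.ReplaceAll(query, ",", " ")) {
		switch {
		case tok == "unlabelled":
			f.OnlyUnlabelled = true
		case strings.HasPrefix(tok, "+") && len(tok) > 1:
			f.Include = append(f.Include, tok[1:])
		case strings.HasPrefix(tok, "-") && len(tok) > 1:
			f.Exclude = append(f.Exclude, tok[1:])
		}
	}
	return f
}

// hasLabel reports whether img carries label, ignoring letter case.
func hasLabel(img Image, label string) bool {
	for _, l := range img.Labels {
		if strings.EqualFold(l, label) {
			return true
		}
	}
	return false
}

// Matches applies the filter. Unlabelled images are hidden by default
// and shown only when explicitly requested.
func (f Filter) Matches(img Image) bool {
	if f.OnlyUnlabelled {
		return len(img.Labels) == 0
	}
	if len(img.Labels) == 0 {
		return false
	}
	for _, want := range f.Include {
		if !hasLabel(img, want) {
			return false
		}
	}
	for _, avoid := range f.Exclude {
		if hasLabel(img, avoid) {
			return false
		}
	}
	return true
}

// Apply returns the IDs of all images that pass the filter, in input order.
func Apply(f Filter, imgs []Image) []string {
	var ids []string
	for _, img := range imgs {
		if f.Matches(img) {
			ids = append(ids, img.ID)
		}
	}
	return ids
}

func main() {
	imgs := []Image{
		{ID: "a", Labels: []string{"dog"}},
		{ID: "b", Labels: []string{"dog", "cat"}},
		{ID: "c"}, // still awaiting labels
	}
	fmt.Println(Apply(ParseFilter("+dog, -cat"), imgs))
	fmt.Println(Apply(ParseFilter("unlabelled"), imgs))
}
```

A detailed labelling mode could reuse the same filter with `OnlyUnlabelled` set, so that images still awaiting labels come first.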

dobkeratops commented 7 years ago

I like your general idea of 'easy clicks' (rapid validation without fiddly UI use). The question is whether detailed information can still be specified if you want to do so, e.g. a hierarchical or trait-like classification of labels: 'individual breeds of dog' as well as 'dog', 'marques of car' as well as 'car', etc.

I imagine being able to present crops of images rather than the whole image could help, e.g. you could submit photos of busy street/domestic scenes, then highlight the objects with rectangles; other users may then be presented with the cropped regions and asked a simpler question ('is this a car', 'is this a dog', etc.).
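A minimal Go sketch of the cropping step, using only the standard library's `image` package. The `Annotation` type and the `NewAnnotation`, `blankScene`, and `Crop` helpers are hypothetical names for illustration; a real upload would be decoded with `image.Decode` and type-asserted to something that offers `SubImage`.

```go
package main

import (
	"fmt"
	"image"
)

// Annotation is a hypothetical record: a rectangle drawn by one user plus
// the simple yes/no question to put to other users.
type Annotation struct {
	Region   image.Rectangle
	Question string
}

// NewAnnotation builds an Annotation from corner coordinates.
func NewAnnotation(x0, y0, x1, y1 int, question string) Annotation {
	return Annotation{Region: image.Rect(x0, y0, x1, y1), Question: question}
}

// subImager is satisfied by the standard library's concrete image types,
// such as *image.RGBA, *image.NRGBA and *image.YCbCr.
type subImager interface {
	image.Image
	SubImage(r image.Rectangle) image.Image
}

// Crop returns the annotated region clipped to the image bounds.
// ok is false when the rectangle lies entirely outside the image.
func Crop(img subImager, a Annotation) (crop image.Image, ok bool) {
	r := a.Region.Intersect(img.Bounds())
	if r.Empty() {
		return nil, false
	}
	return img.SubImage(r), true
}

// blankScene is a stand-in for a decoded upload.
func blankScene(w, h int) *image.RGBA {
	return image.NewRGBA(image.Rect(0, 0, w, h))
}

func main() {
	scene := blankScene(640, 480)
	a := NewAnnotation(100, 120, 300, 260, "is this a car?")
	if crop, ok := Crop(scene, a); ok {
		b := crop.Bounds()
		fmt.Println(a.Question, b.Dx(), "x", b.Dy())
	}
}
```

`SubImage` shares pixel memory with the original, so producing many crops from one scene stays cheap; the crop would need to be encoded (for example with `image/jpeg`) before being served.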

bbernhard commented 7 years ago

I really like your idea of a "hierarchical classification". I can totally see that one as detailed information that can be added optionally. I think that could also go well together with your gamification idea on reddit ('guess what this is' (from a snippet) - multiple choice).

But I think such a fine-grained classification only makes sense if the "base label" (e.g. dog) is already specified. Otherwise we could end up with a lot of different fine-grained labels without a common base label.

dobkeratops commented 7 years ago

"But I think such a fine-grained classification only makes sense if the "base label" (e.g. dog) is already specified"

that also makes sense, I guess you could expect most labels to have 'base labels' ('dog : animal' etc), and perhaps a few universal 'bases' are hard coded ('animal , mineral , vegetable' or whatever)
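One way to picture base labels is a child-to-parent map with a small hard-coded set of universal bases. The Go sketch below is an assumption for illustration only: the `Hierarchy` type, its methods, and the "labrador" and "porsche" entries are hypothetical, not the project's schema.

```go
package main

import "fmt"

// Hierarchy maps each label to its base label ("dog" -> "animal").
type Hierarchy map[string]string

// universalBases are the few hard-coded roots suggested in the thread.
var universalBases = map[string]bool{
	"animal": true, "mineral": true, "vegetable": true,
}

// Chain returns label followed by its successive base labels.
// A seen-set guards against accidental cycles in user-supplied data.
func (h Hierarchy) Chain(label string) []string {
	chain := []string{label}
	seen := map[string]bool{label: true}
	for {
		parent, ok := h[label]
		if !ok || seen[parent] {
			return chain
		}
		chain = append(chain, parent)
		seen[parent] = true
		label = parent
	}
}

// IsA reports whether base appears anywhere in label's chain.
func (h Hierarchy) IsA(label, base string) bool {
	for _, l := range h.Chain(label) {
		if l == base {
			return true
		}
	}
	return false
}

// Rooted reports whether label's chain ends in a universal base, i.e. the
// fine-grained label is not floating without a common base label.
func (h Hierarchy) Rooted(label string) bool {
	chain := h.Chain(label)
	return universalBases[chain[len(chain)-1]]
}

func main() {
	h := Hierarchy{"labrador": "dog", "dog": "animal"}
	fmt.Println(h.Chain("labrador"))
	fmt.Println(h.IsA("labrador", "animal"), h.Rooted("porsche"))
}
```

A check like `Rooted` could be run when a new fine-grained label is submitted, so that labels without a base are flagged rather than silently accepted.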

dobkeratops commented 7 years ago

Re the 'gamification idea': a 'guess what this is' game (from a small, obscure detail) could actually be used to hint at good features, i.e. details which allow more users to correctly identify what something is are good candidates for features to use in SIFT / 'visual words' (classic machine vision vs. neural nets).

dobkeratops commented 7 years ago

I'm tempted to offer to help. This tool interests me and I have a lot of ideas for such a thing, but I don't know my way around Go (yet); I'm a C++ rather than web person. I'll try taking a look at the source.

bbernhard commented 7 years ago

Coool...help is always welcome! ;)

No worries, I am far from a Go/web expert. To be honest, I started with Go three weeks ago, so I am definitely still in the learning (making mistakes) phase myself. The cool thing about Go is that the learning phase is quite steep. I usually also develop in C++ and was sceptical at first, but Go has some really cool concepts and is fun to use.

dobkeratops commented 7 years ago

"There is for sure good material in there, but probably also some snapshots that can't be used in any useful way."

OK, instead of 'unlabelled', a better idea might be to force the user to pick something, and give options for scene labels (domestic, urban, industrial, nature, etc.) and/or place names. See #18. These categories should be broad enough to apply to just about any picture.

These should be pretty easy for submitters to sift through, and of course could filter further suggestions for detailed labels ("furniture : domestic", "table : furniture", "lamp post : urban", etc.). Of course you can still have any label anywhere (a table in a garden, etc.), but these could be used to guide initial suggestions.
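As a rough Go sketch of that flow: an upload is rejected unless at least one broad scene label is chosen, and the chosen scene then drives the initial suggestions for detailed labels. The `sceneLabels` and `parentOf` tables and the `ValidateScenes` and `Suggest` functions are hypothetical; the entries are just the examples from this comment.

```go
package main

import (
	"errors"
	"fmt"
	"sort"
)

// sceneLabels is the hypothetical fixed set of broad scene categories.
var sceneLabels = map[string]bool{
	"domestic": true, "urban": true, "industrial": true, "nature": true,
}

// parentOf holds the "label : base" pairs used to guide suggestions.
var parentOf = map[string]string{
	"furniture": "domestic",
	"table":     "furniture",
	"lamp post": "urban",
}

// ValidateScenes enforces "choose at least one scene label on upload".
func ValidateScenes(chosen []string) error {
	if len(chosen) == 0 {
		return errors.New("pick at least one scene label")
	}
	for _, s := range chosen {
		if !sceneLabels[s] {
			return fmt.Errorf("unknown scene label %q", s)
		}
	}
	return nil
}

// underScene walks up from label and reports whether it reaches scene.
// The loop is bounded so a malformed, cyclic table cannot hang it.
func underScene(label, scene string) bool {
	for i := 0; i <= len(parentOf); i++ {
		parent, ok := parentOf[label]
		if !ok {
			return false
		}
		if parent == scene {
			return true
		}
		label = parent
	}
	return false
}

// Suggest lists, in sorted order, the detailed labels filed under scene.
// These are only initial suggestions; any label may still be used anywhere.
func Suggest(scene string) []string {
	var out []string
	for label := range parentOf {
		if underScene(label, scene) {
			out = append(out, label)
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	if err := ValidateScenes(nil); err != nil {
		fmt.Println("upload rejected:", err)
	}
	fmt.Println(Suggest("domestic"))
}
```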

dobkeratops commented 7 years ago

"it's easier to validate: My biggest concern with the "labeling after uploading" approach is, that it takes too long and users aren't interested in participating (anymore)."

If you separate upload and labelling, people can contribute in 2 easy steps. One person might not know all the labels in their scene; another person looking at it may give insight.

Also, each step is easier. By forcing labelling before upload, I think you front-load more work, which will actually discourage contribution. For a useful dataset you're going to want millions of images eventually, and all the 'unlabelled parts' can still be used later in 'gamification'.

Unlabelled images can also be used for negative examples, e.g. "is this a Porsche 911", "is this gothic architecture" (for completely random photos, the answer to any specific label is more likely to be 'no').

'more data is always better'.

dobkeratops commented 7 years ago

Closing: superseded by the suggestion for 'scene labels' as a catch-all, so you always label something; you just have much broader options, as per #18.