ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

20k image goal #120

Closed bbernhard closed 5 years ago

bbernhard commented 6 years ago

As we have now reached the first 10k milestone (thanks to @dobkeratops for the many, many contributions to the image dataset) I would like to brainstorm a bit about how we can further increase the number of images in the dataset. There is still quite a significant number of images in my "to be unlocked" queue, but I am unlocking images almost every day - so the queue is getting smaller and smaller.

I guess for the 20k milestone it probably won't matter which type of images we focus on, but nevertheless I think it could be beneficial if we find at least one (completely different) type of image that we could add. Is there anything out there that could be interesting to collect and annotate? Pictures of games, artworks, paintings, NASA images...?

The main reason I am bringing this up is that we are still in the early stages with a relatively small number of images (compared to other datasets), so we are still pretty flexible and can make changes to the software design/architecture in case we need to. The more images we collect, the harder it probably gets to make those changes.

If we add another (completely different) type of resource in the early stages, it's probably a good test of how flexible our design is. It probably also helps us answer some important questions:

Maybe we come to the conclusion that photos are all we need and there is no need to add support for other types of resources - which would also be fine. But I thought it might be a good idea to bring that topic up early.

edit: I guess it's better if we concentrate on public domain resources - with all the EU regulations in place and the upcoming GDPR I want to play it safe for now.

dobkeratops commented 6 years ago

beneficial if we find at least one set of (completely different) type of images that we could add

wholeheartedly agree, and this is why I've paused on submissions; I think you'd benefit from, at the very least, urban scenes from other countries and cities, and better still, different classes of image altogether (domestic, industrial, agriculture..)

How can we separate those resources nicely?

not sure but 'fish-eye'/lens type is another obvious demand for separation

with all the EU regulations in place and the upcoming GDPR

agree, and this sounds sensible for the 'spirit of the law' beyond any specific regulations

I guess its better if we concentrate on public domain resources

pointing at wikimedia commons might be a way to go, but you were right that those images can change. I can't remember the details but there might have been a way to refer to them by a specific ID. Or (by virtue of being CC) you could copy them?

Does the label graph concept work with two completely different types of resources? I guess that could be a great way to put the label graph concept to the test.

I would guess it will: the graph allows multiple ways of categorizing the same label, or vice versa, getting as specific as you need. But you're right, you'll know for sure once it's put to the test..

bbernhard commented 6 years ago

wholeheartedly agree and this is why I've paused on submissions;

hehe..I already guessed that ;)

I think you'd benefit from, at the very least, urban scenes from other countries and cities, and better still, different classes of image altogether (domestic, industrial, agriculture..)

totally agreed. I guess as long as it's still a "normal" photo (urban scene, still image...) I think we can get away with the label (scene) concept. But I am not really sure if that holds true for other types of images (paintings, comics, 3d images...) that clearly stand out. I could imagine that those types of images might also require a different set of annotation tools? Even the fish-eye images are a bit too "normal" in my opinion to really stand out... if you hadn't mentioned it, I probably wouldn't have noticed. (I have to admit I am not a photographer... ;))

But at the moment I am struggling a bit to find a domain where it makes sense to collect data. I mean, we could collect images from games that are public domain/open source, label them with info like "release date" and "publisher", and annotate the main characters. But what's the purpose of that? Will anyone use the data? Is there a use case for it?

pointing at wikimedia commons might be a way to go, but you were right that those images can change. I can't remember the details but there might have been a way to refer to them by a specific ID. Or (by virtue of being CC) you could copy them?

I think it shouldn't be that hard to write a Wikimedia Commons crawler that grabs public domain images with specific tags and imports them into the dataset. The only thing I noticed is that there is a significant number of images that are only Creative Commons licensed.
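Something along these lines could work as a starting point (just a sketch against the standard MediaWiki API on Commons; the category name and helper names are illustrative, and the license filter only keys off the `LicenseShortName` field in `extmetadata`):

```python
from urllib.parse import urlencode

# Real endpoint for the MediaWiki API on Wikimedia Commons.
COMMONS_API = "https://commons.wikimedia.org/w/api.php"

def commons_query_url(category: str, limit: int = 50) -> str:
    """Build a query URL listing files in a Commons category together
    with their URL and license metadata (iiprop=url|extmetadata)."""
    params = {
        "action": "query",
        "generator": "categorymembers",
        "gcmtitle": f"Category:{category}",
        "gcmtype": "file",
        "gcmlimit": limit,
        "prop": "imageinfo",
        "iiprop": "url|extmetadata",
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)

def public_domain_files(api_response: dict) -> list:
    """From a parsed API response, keep only files whose license short
    name indicates public domain (skipping CC-BY etc.)."""
    urls = []
    for page in api_response.get("query", {}).get("pages", {}).values():
        for info in page.get("imageinfo", []):
            meta = info.get("extmetadata", {})
            license_name = meta.get("LicenseShortName", {}).get("value", "")
            if "public domain" in license_name.lower():
                urls.append(info.get("url"))
    return urls
```

A real crawler would additionally have to handle API pagination (`gcmcontinue`) and rate limiting, but the filtering logic would stay the same.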

Theoretically we could also use them if we attribute the author of the image, but this would require us to add license information to every image. We would probably then also need to change the export API, so that one can search for images with specific licenses. Users would also need to be aware that some images are more restrictive than others... so they might not be allowed to use them for everything they want. So I think that would be pretty messy and complicated to handle :/

bbernhard commented 6 years ago

I just found this blog post from the dropbox creators, which is actually pretty interesting to read: https://blogs.dropbox.com/tech/2017/04/creating-a-modern-ocr-pipeline-using-computer-vision-and-deep-learning/

Especially the part where they are talking about their own annotation tool (called DropTurk UI) is pretty cool. Thinking about it...a document level/image word dataset is also something that would be really cool to have. I still have a few boxes in the attic with a lot of handwritten documents from my schooldays, which would be great material to annotate. I guess a lot of people also have handwritten recipes from their mother/grandmother which would also be great material...

If there are some nice annotation tools in place, I think it could actually be fun ;)

dobkeratops commented 6 years ago

I just took some kitchen photos, mostly food.

Then it struck me: for arranged indoor photos, it might be possible to automatically generate masks, e.g. put a camera on a mount; take the background; add an object; take object+background. The difference would give the object mask. This could still be submitted for annotation of parts.
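Roughly, the differencing step could look like this (a minimal sketch, assuming two aligned uint8 RGB frames from a fixed camera; the threshold and names are illustrative):

```python
import numpy as np

def diff_mask(background: np.ndarray, scene: np.ndarray,
              threshold: int = 25) -> np.ndarray:
    """Binary object mask from an empty-background shot and an
    object+background shot taken from the same mounted camera.

    Both images are HxWx3 uint8 arrays; a pixel is marked as object
    if its difference exceeds `threshold` in any colour channel.
    """
    # widen to int16 so the subtraction cannot wrap around
    diff = np.abs(background.astype(np.int16) - scene.astype(np.int16))
    return (diff.max(axis=2) > threshold).astype(np.uint8)
```

On real photos you would also need a denoising/morphological cleanup pass, since sensor noise and soft shadows make the raw difference mask ragged.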

another idea is to weigh the objects first.. could you train a net to gain intuition about mass, or for 'a plate of stuff' it would help quantify ("plate of 50g of lettuce.." ... "plate of 200g of lettuce" etc)

I could try to use my r-pi's to improvise a cheap 3d scanner (just 8 cameras, not 256+ high-speed like the pro 'lightstage') - the icing on the cake would be automating different lighting conditions, i.e. place an object in the middle and instantly get 64 training samples (=8 angles x 8 light sources). I can imagine that even without labels such data would be useful for neural nets to absorb, e.g. to gain intuition for 3d shapes generally.. (I've heard of them successfully being able to guess rotated views of objects).

bbernhard commented 6 years ago

I just took some kitchen photos, mostly food.

awesome!

Then it struck me: for arranged indoor photos, it might be possible to automatically generate masks, e.g. put a camera on a mount; take the background; add an object; take object+background. The difference would give the object mask. This could still be submitted for annotation of parts.

Hahaha, that's a nice one! :D I am wondering if we could maybe even use existing code for that. Because the "smart annotation" mode also works with masks (behind the scenes it's basically using the grabcut algorithm). So maybe it would be possible to add an option to upload those masks (via API call?) and feed it to the system to get back the contour/poly points.

another idea is to weigh the objects first.. could you train a net to gain intuition about mass, or for 'a plate of stuff' it would help quantify ("plate of 50g of lettuce.." ... "plate of 200g of lettuce" etc)

I REALLY like that idea. Would be interesting to see if one could use that data to train a calorie tracking app. :D

I could try to use my r-pi's to improvise a cheap 3d scanner (just 8 cameras, not 256+ high-speed like the pro 'lightstage')

awesome idea!

  • the icing on the cake would be automating different lighting conditions, i.e. place an object in the middle and instantly get 64 training samples (=8 angles x 8 light sources).

The only problem I see at the moment is the duplicate detection mechanism. As those images are probably pretty similar, I could imagine that the image hashing algorithm would detect them as duplicates and prevent you from uploading them.
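For illustration (this is not the hashing code ImageMonkey actually uses - just a generic difference-hash sketch that shows why a series of near-identical shots of the same object would collide):

```python
import numpy as np

def dhash(gray: np.ndarray, hash_size: int = 8) -> int:
    """Difference hash of a 2-D grayscale image: shrink to
    hash_size x (hash_size+1) by block averaging, then emit one bit
    per horizontal pixel pair, set if the right neighbour is brighter."""
    h, w = gray.shape
    rows = np.array_split(np.arange(h), hash_size)
    cols = np.array_split(np.arange(w), hash_size + 1)
    # crude block-average downscale
    small = np.array([[gray[np.ix_(r, c)].mean() for c in cols] for r in rows])
    bits = small[:, 1:] > small[:, :-1]
    return int("".join("1" if b else "0" for b in bits.flatten()), 2)

def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hashes - small distance
    means 'probably a duplicate'."""
    return bin(a ^ b).count("1")
```

Because the hash only encodes coarse brightness gradients, the 64 shots of one object under slightly different lighting would land within a few bits of each other - exactly the behaviour that would trip a duplicate filter.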

But I think it makes even more sense to handle those images differently in the first place. Maybe we can tag them somehow, so that we know that they all belong to the same series of photos?

dobkeratops commented 6 years ago

But I think it makes even more sense to handle those images differently in the first place. Maybe we can tag them somehow, so that we know that they all belong to the same series of photos?

indeed that would be useful for anyone using the data (i.e. do they want to train something to use angled images together)

yet another option would be to submit those pasted together into a grid (if we're going to filter out fish-eye lens, merged images could be another category at that level)

Either way, perhaps information can be added through "un-annotatable labels" to signify what's going on

So maybe it would be possible to add an option to upload those masks (via API call?) and feed it to the system to get back the contour/poly points.

it might be possible to submit the background, and perhaps again make an un-annotatable label that both the empty background and added objects have in common

Would be interesting to see, if one could use that data to train a calories tracking app. :D

right. I've heard of similar apps already, but haven't heard of an open dataset that does it. I've also wondered if machine vision could make a better supermarket checkout

dobkeratops commented 6 years ago

https://en.wikipedia.org/wiki/List_of_films_in_the_public_domain_in_the_United_States#Date_of_publication ... would any of these be a source of images? Downsides: they tend to be old films, so they might be low quality (most are even monochrome) and dated by artefacts.

bbernhard commented 6 years ago

Very cool, thanks for sharing!

I think videos could be a really great source of images, even if they are of low quality. It would be interesting to know whether it's possible to extract "steady" frames from a video in a scripted manner.

A while ago, I experimented a bit with ffmpeg and the possibility of training a neural net that can detect NSFW content. I looked for slideshow videos on youtube and some p*rn sites, fed those videos into ffmpeg, and periodically extracted a frame (as the source was a slideshow, there was less motion and the chance of capturing a steady frame was higher). All frames were then fed into an image classifier to train a neural net. This actually worked pretty well (the sourcecode is here: https://github.com/bbernhard/imagemonkey-playground/blob/master/scripts/train_nsfw.py)

I am wondering if it's possible to extract (more or less) steady frames from any video? If so, we could actively start looking for suitable public domain videos. I think it would make sense to tag the images that were extracted from videos/movies appropriately (e.g. movie, vintage movie, movie title, etc.), so that we can query and filter them (that also helps in case we, contrary to expectations, need to remove something again due to copyright infringement)
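A naive "steadiness" check could simply threshold the frame-to-frame difference (a sketch assuming the frames are already decoded to equally sized grayscale arrays; the threshold and names are illustrative):

```python
import numpy as np

def steady_frames(frames, motion_threshold: float = 2.0) -> list:
    """Indices of frames whose mean absolute difference to the
    previous frame is below `motion_threshold` (on a 0-255 scale).

    `frames` is a sequence of equally sized grayscale uint8 arrays,
    e.g. decoded from a video beforehand.
    """
    steady = []
    prev = None
    for i, frame in enumerate(frames):
        cur = frame.astype(np.float32)
        if prev is not None and np.abs(cur - prev).mean() < motion_threshold:
            steady.append(i)
        prev = cur
    return steady
```

In practice one would first dump candidate frames with something like `ffmpeg -i input.mp4 -vf fps=1 frame_%04d.png` and then score consecutive frames; only the low-motion ones would go into the upload queue.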

In my opinion, video clips and movies could be a great image source. We probably just need to be careful in selecting the "right" footage. I guess there are some pictures out there that make people react emotionally (e.g. a picture of Donald Trump speaking at a press conference, or images of WWII), and I am a bit worried that people would tag them with labels like a**hole etc.

So I would suggest that we either skip those "hot topics" or (in my opinion even better) restrict contributions to those types of images. I.e. everybody can access the data (to train a neural net), but only people with a certain credibility score can label/annotate them. I think images from US gov. press conferences and state visits of gov. representatives could be great data material.

bbernhard commented 5 years ago

done :)