ImageMonkey / imagemonkey-core

ImageMonkey is an attempt to create a free, public open source image dataset.
https://imagemonkey.io

detailed labels (pt 2): wikipedia links? #11

Open dobkeratops opened 6 years ago

dobkeratops commented 6 years ago

imagine if the 'detailed labels' were associated with URLs, e.g. links to Wikipedia articles: "knife" -> "butter knife : knife", "carving knife : knife", etc.

if you could format or associate URLs from this, e.g.

https://en.wikipedia.org/wiki/Kitchen_knife#Carving
https://en.wikipedia.org/wiki/Kitchen_knife#Butter
etc.

hardcoding Wikipedia as the target would solve the problem of policing for spam links.
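A rough sketch of how this could look, in Python. The `LABEL_TO_ARTICLE` table, the function names, and the choice of English Wikipedia as the only allowed host are all assumptions made for illustration; nothing here reflects how imagemonkey-core actually stores labels.

```python
from urllib.parse import quote, urlsplit

# Assumption: only English Wikipedia is accepted as a link target.
WIKIPEDIA_HOST = "en.wikipedia.org"

# Hypothetical mapping from a detailed label to (article title, section anchor).
LABEL_TO_ARTICLE = {
    "carving knife": ("Kitchen_knife", "Carving"),
    "butter knife": ("Kitchen_knife", "Butter"),
}


def wikipedia_url(label):
    """Return the Wikipedia URL for a known detailed label, or None."""
    entry = LABEL_TO_ARTICLE.get(label.strip().lower())
    if entry is None:
        return None
    title, anchor = entry
    url = f"https://{WIKIPEDIA_HOST}/wiki/{quote(title)}"
    if anchor:
        url += f"#{quote(anchor)}"
    return url


def is_wikipedia_link(url):
    """Accept only https links to article pages on the hardcoded host."""
    parts = urlsplit(url)
    return (
        parts.scheme == "https"
        and parts.hostname == WIKIPEDIA_HOST
        and parts.path.startswith("/wiki/")
    )
```

Because the host is fixed, a user-submitted link can be rejected with a simple comparison rather than any kind of spam filtering, e.g. `is_wikipedia_link("https://en.wikipedia.org.evil.com/wiki/Knife")` is false.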

you could use the annotated images in a browsing mode as a means of exploring wikipedia (and perhaps inspire others to make crossover tools ... find all the relevant image annotations from a wikipedia page).

Of course there's Wikimedia Commons. It would be awesome if those people would integrate an image labelling tool; they have annotations, but they're not usable to the extent that the machine-learning community needs. Maybe something can be done with the APIs (e.g. reference the Wikimedia Commons images in this tool directly, perhaps try to read their metadata).
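As a sketch of the "read their metadata" part: Wikimedia Commons exposes the standard MediaWiki `api.php` endpoint, and an `imageinfo` query can return the file URL and licence metadata. The function names, the `User-Agent` string, and the choice of fields to keep are assumptions for illustration, and the exact response layout should be checked against the API documentation before relying on it.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

COMMONS_API = "https://commons.wikimedia.org/w/api.php"


def commons_query_url(file_title):
    """Build an imageinfo query URL for one file, e.g. 'File:Example.jpg'."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        "iiprop": "url|extmetadata",
        "titles": file_title,
    }
    return COMMONS_API + "?" + urlencode(params)


def extract_image_info(response):
    """Pull title, image URL and licence name out of a decoded JSON response.

    Assumes the default JSON layout, where query.pages is keyed by page id.
    """
    results = []
    for page in response.get("query", {}).get("pages", {}).values():
        for info in page.get("imageinfo", []):
            meta = info.get("extmetadata", {})
            results.append({
                "title": page.get("title"),
                "url": info.get("url"),
                "license": meta.get("LicenseShortName", {}).get("value"),
            })
    return results


def fetch_image_info(file_title):
    """Perform the network request (the User-Agent value is a placeholder)."""
    req = Request(
        commons_query_url(file_title),
        headers={"User-Agent": "imagemonkey-sketch/0.1 (example)"},
    )
    with urlopen(req, timeout=10) as resp:
        return extract_image_info(json.load(resp))
```

Keeping the licence name alongside the URL would let the tool record, per image, that the source says it is free to reuse.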

wikipedia is the most obvious target: associating images with a large hyperlinked text database.

see also gamification ideas.. 'what would motivate people to label' ?

see also #21, suggestion to rely on wikimedia commons as a source of curated safe images.

bbernhard commented 6 years ago

Cool idea!

Integrating other services is a pretty interesting topic. I recently thought about adding pictures from other public stock photo sites such as Pexels or Unsplash. But in order to do all the data aggregation (and to avoid ending up with dead links), it's probably not enough to just store a link to the image... so we would need to scrape those sites and download the images. But I am not sure if such sites are comfortable with scraping images...

dobkeratops commented 6 years ago

"But I am not sure if such sites are comfortable with scraping images..."

... best assume not, because their value is in their data and they probably go to great lengths to protect it, e.g. YouTube's small print says video downloaders go against their 'terms of service'.

Wikimedia Commons, however, explicitly says the images are free; I would hope they could be encouraged to integrate such a service.