SashwatAnagolum / Easy-ICD

Code and instructions on how to run Easy-ICD to generate new image classification datasets.
MIT License

Add ability to scrape from Pinterest and / or Tumblr and / or Instagram #8

Open SashwatAnagolum opened 2 years ago

SashwatAnagolum commented 2 years ago

Currently, we can only scrape images from Flickr. If we could also scrape from Pinterest, Tumblr, and/or Instagram, we could potentially get many more images per class/keyword and download high-quality images for a wider variety of class names (e.g., some search terms might not have many associated images on Flickr but have plenty on Tumblr).
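One way to support several sites is to treat each one as an interchangeable scraper and fill each class from them in turn. The sketch below assumes a hypothetical `Scraper` callable type and a hypothetical `collect_image_urls` helper (neither is part of Easy-ICD as far as this issue shows). It queries sources in order and deduplicates URLs until a per-class target is reached, so a keyword that is sparse on Flickr can be topped up from another site.

```python
from typing import Callable, Iterable, List, Sequence

# Hypothetical per-site scraper signature: given a keyword and the number of
# images still needed, return candidate image URLs from that site.
Scraper = Callable[[str, int], Iterable[str]]


def collect_image_urls(keyword: str, scrapers: Sequence[Scraper], target: int) -> List[str]:
    """Query each scraper in order until `target` unique URLs are collected."""
    seen = set()
    urls: List[str] = []
    for scrape in scrapers:
        remaining = target - len(urls)
        if remaining <= 0:
            break  # Earlier sources already filled this class.
        for url in scrape(keyword, remaining):
            if url in seen:
                continue  # Same image found on an earlier source; skip it.
            seen.add(url)
            urls.append(url)
            if len(urls) >= target:
                break
    return urls
```

Ordering the scrapers (e.g. Flickr first) keeps the existing behavior as the default and only falls back to other sites when a keyword runs short.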

See this link for Tumblr.
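For Tumblr, a minimal sketch against the public API v2 `/v2/tagged` endpoint might look like the following. It assumes API-key authentication via the `api_key` query parameter and the legacy post format, in which photo posts have `type == "photo"` and a `photos` list whose entries carry `original_size.url`. The helper names are hypothetical, and the response shape should be checked against Tumblr's current API documentation before relying on it.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

TUMBLR_TAGGED_ENDPOINT = "https://api.tumblr.com/v2/tagged"


def build_tagged_url(tag: str, api_key: str, before: Optional[int] = None) -> str:
    """Build a request URL for the /v2/tagged endpoint (assumed API-key auth)."""
    params = {"tag": tag, "api_key": api_key}
    if before is not None:
        # Unix timestamp: only posts published before this time are returned.
        params["before"] = str(before)
    return f"{TUMBLR_TAGGED_ENDPOINT}?{urllib.parse.urlencode(params)}"


def extract_photo_urls(payload: dict) -> List[str]:
    """Collect original-size image URLs from legacy-format photo posts."""
    urls = []
    for post in payload.get("response", []):
        if post.get("type") != "photo":
            continue  # Text, link, etc. posts carry no photo list.
        for photo in post.get("photos", []):
            url = photo.get("original_size", {}).get("url")
            if url:
                urls.append(url)
    return urls


def fetch_tagged_photo_urls(tag: str, api_key: str) -> List[str]:
    """Fetch one page of tagged posts (requires network access and a real key)."""
    with urllib.request.urlopen(build_tagged_url(tag, api_key), timeout=30) as resp:
        return extract_photo_urls(json.load(resp))
```

Each request returns a single page of posts; to go further back, one could pass the `timestamp` of the oldest post received as `before` on the next call.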

reidrm commented 1 year ago

Pinterest: The site does have its own API, here, but the Terms of Service are pretty restrictive, so I'm not sure how useful it would be to us. Among other things, they prohibit sharing any part of the API with anyone else, which effectively prevents us from using it unless every user of the system creates a Pinterest business account to get API access. I found this GitHub repo where someone made a scraper that doesn't use the API, but it basically goes through Google to find images matching the keyword with "pinterest" in the URL, which sounds like it would be taxing on the system.
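The Google-based workaround described above ultimately comes down to keeping only results whose URL points at Pinterest. That filtering step alone might be sketched as below. The host list (`pinterest.com` plus its image CDN `pinimg.com`) is an assumption, country-specific domains such as `pinterest.co.uk` are not covered, and the search step itself is not shown because it would need a separate search API.

```python
from typing import Iterable, List
from urllib.parse import urlparse

# Assumed Pinterest hosts: the main site and its image CDN.
PINTEREST_DOMAINS = ("pinterest.com", "pinimg.com")


def is_pinterest_url(url: str) -> bool:
    """True if the URL's host is a Pinterest domain or one of its subdomains."""
    host = (urlparse(url).hostname or "").lower()
    # Exact match or a true subdomain; rejects look-alikes like notpinterest.com.
    return any(host == d or host.endswith("." + d) for d in PINTEREST_DOMAINS)


def filter_pinterest_results(result_urls: Iterable[str]) -> List[str]:
    """Keep only search-result URLs that point at Pinterest pages or images."""
    return [u for u in result_urls if is_pinterest_url(u)]
```

Matching on the parsed host rather than a substring check avoids false positives such as `pinterest.com.example.org`, but every result still has to come from an external search first, which is the overhead the comment above is worried about.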

Instagram: This seems like a bust. There are two APIs, here, and the one we would want is the Basic Display API, since it is meant for consumers, whereas the Graph API is for commercial use. However, the Basic Display API requires access approval from each account you are getting images from. This would bog down our system and render it fairly unusable. Further, it looks like scraping is against Instagram's Terms of Service, and all of the homebrew scrapers I found looked super shady.