Open: metaprime opened this issue 7 years ago
Comment by Crapmo Wednesday Apr 12, 2017 at 09:25 GMT
https://imgur.com/r/pics/top/page/1/miss?scrolled https://imgur.com/r/pics/top/page/6/miss?scrolled https://imgur.com/r/pics/top/page/11/miss?scrolled https://imgur.com/r/pics/top/page/16/miss?scrolled ...
These pages repeat the same content.
You need a new way to scrape Imgur subreddit galleries.
Comment by metaprime Tuesday Apr 25, 2017 at 07:33 GMT
If the problem is on Imgur's side, there might not be anything we can do here.
Comment by Crapmo Wednesday May 03, 2017 at 05:51 GMT
Allow us to use Imgur API keys and scrape through the Imgur API instead. As far as I know, the API only returns 100 or so pages, but that's still a lot of images to play with!
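A minimal sketch of what API-based scraping could look like (in Python for brevity, although ripme itself is written in Java), assuming Imgur's v3 subreddit gallery endpoint template `/3/gallery/r/{subreddit}/{sort}/{window}/{page}` and anonymous `Client-ID` authorization. The helper name `subreddit_gallery_request` and its defaults are hypothetical, and the endpoint shape should be checked against Imgur's current API documentation. The function only builds the request; sending it and parsing the JSON response are left out.

```python
from urllib.parse import quote
from urllib.request import Request

# Base URL of Imgur's public v3 API.
API_BASE = "https://api.imgur.com/3"


def subreddit_gallery_request(client_id: str, subreddit: str,
                              sort: str = "time", window: str = "week",
                              page: int = 0) -> Request:
    """Build (but do not send) a request for one page of a subreddit gallery.

    Hypothetical helper. Assumes the endpoint template
    /gallery/r/{subreddit}/{sort}/{window}/{page} and "Client-ID" auth.
    """
    if sort not in ("time", "top"):
        raise ValueError("sort must be 'time' or 'top'")
    if page < 0:
        raise ValueError("page must be non-negative")
    # The window (day, week, month, year, all) is assumed to matter only
    # when sort == "top"; it is always included to match the template.
    url = "/".join([API_BASE, "gallery", "r", quote(subreddit, safe=""),
                    sort, window, str(page)])
    return Request(url, headers={"Authorization": f"Client-ID {client_id}"})


# Usage: the third page (page=2) of r/pics, sorted by all-time top.
req = subreddit_gallery_request("YOUR_CLIENT_ID", "pics", sort="top",
                                window="all", page=2)
print(req.full_url)  # https://api.imgur.com/3/gallery/r/pics/top/all/2
```

Keeping request construction separate from sending makes it easy to plug in whatever HTTP client and rate limiting the ripper already uses.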
Comment by Numberphile Wednesday Jun 21, 2017 at 05:16 GMT
I had this exact same bug, both here and with another ripper. I just investigated and can confirm that Imgur galleries repeat after the first 300 pictures. I suggest ripping from reddit instead.
Comment by metaprime Saturday Aug 12, 2017 at 10:53 GMT
@Numberphile What do you mean by ripping from reddit instead? Reddit's expando for Imgur albums uses JS to fetch the images from Imgur (probably through the API, with an API key specific to the app). We can do the same, as @Crapmo suggested.
Comment by Numberphile Saturday Aug 12, 2017 at 14:43 GMT
@metaprime I was able to get more images by ripping from reddit.com/r/subreddit instead of imgur.com/r/subreddit. This was a while back, so I don't remember whether I was getting more of the Imgur pictures or whether the count was bolstered by other hosting sites. My understanding here is purely experimental; my theoretical knowledge of how this thing actually works is pretty limited :/
Also, I seem to remember getting many more images when I first used this tool a few years ago (though that may have been a different ripper).
Comment by metaprime Monday Aug 14, 2017 at 07:08 GMT
@Numberphile Okay, I see. That's a totally different thing. Imgur has a section for each subreddit, but a user has to post there intentionally. The way Imgur renders images on those pages is very different from how reddit does it. The reddit ripper goes through all of the links submitted to the subreddit and rips every link for which it has a ripper.
Reddit gets its data from whatever links users submit, but Imgur does not build its subreddit pages from Imgur links submitted to reddit; it only has information about content submitted to Imgur (and how it was submitted).
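To illustrate the reddit-side approach, here is a minimal sketch of reading one page of a reddit listing (the JSON returned by endpoints such as `/r/{subreddit}/new.json`) and choosing a ripper for each submitted link by hostname. The host table, the ripper names in it, and both function names are hypothetical and illustrative; this is not ripme's actual dispatch code.

```python
from urllib.parse import urlparse

# Illustrative host -> ripper table (hypothetical; ripme's real dispatch
# differs). Hosts are stored without a leading "www.".
RIPPERS_BY_HOST = {
    "imgur.com": "ImgurRipper",
    "i.imgur.com": "ImgurRipper",
    "gfycat.com": "GfycatRipper",
}


def links_from_listing(listing):
    """Return (submitted link URLs, 'after' cursor) for one listing page."""
    data = listing.get("data", {})
    urls = [child["data"]["url"]
            for child in data.get("children", [])
            if "url" in child.get("data", {})]
    # 'after' names the last post on this page; pass it back as ?after=...
    # to request the next page, and stop when it is None.
    return urls, data.get("after")


def plan_rips(urls):
    """Pair each URL with a ripper chosen by hostname; skip unknown hosts."""
    plan = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if host.startswith("www."):
            host = host[len("www."):]
        ripper = RIPPERS_BY_HOST.get(host)
        if ripper is not None:
            plan.append((ripper, url))
    return plan


# Usage with a tiny hand-written listing payload.
listing = {"data": {"after": "t3_abc", "children": [
    {"kind": "t3", "data": {"url": "https://imgur.com/a/xyz"}},
    {"kind": "t3", "data": {"url": "https://www.example.com/article"}},
]}}
urls, after = links_from_listing(listing)
print(plan_rips(urls))  # [('ImgurRipper', 'https://imgur.com/a/xyz')]
```

Because dispatch is per link, a subreddit rip can collect images from any host with a ripper, which fits Numberphile's observation that ripping via reddit produced more images.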
Since this is an issue with the ImgurRipper (and ripping subreddits works fine), let's keep this conversation focused on that issue.
Issue by Crapmo Tuesday Apr 11, 2017 at 18:11 GMT
Originally opened as https://github.com/4pr0n/ripme/issues/504
After 300 images, the program hangs, and I get the following output in the console:
[!] Skipping link -- already attempted: /var/www/html/folder/file.ext
Retrieving https://imgur.com/r/*subreddit*/new/page/10/miss?scrolled
The problem is on Imgur's side: the /page/ results after page 5 or 6 are the same ones from pages 1 through 5 or 6, just repeated.
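One possible mitigation, sketched under stated assumptions: stop paginating as soon as a page contributes no links that have not already been seen, instead of continuing to request pages whose links are all skipped as already attempted. `collect_until_repeat` and its `fetch_page` callback are hypothetical stand-ins for the ripper's page-fetching logic, page numbering from 0 is an assumption, and `max_pages` is an arbitrary safety cap.

```python
from typing import Callable, Iterable, List


def collect_until_repeat(fetch_page: Callable[[int], Iterable[str]],
                         max_pages: int = 100) -> List[str]:
    """Collect links page by page, stopping at the first page with no new ones.

    fetch_page(n) is a hypothetical callback returning the image links on
    page n. If the site starts serving earlier pages again, the first
    repeated page adds nothing new and ends the loop.
    """
    seen = set()
    ordered: List[str] = []
    for page in range(max_pages):
        added = 0
        for link in fetch_page(page):
            if link not in seen:  # also drops duplicates within one page
                seen.add(link)
                ordered.append(link)
                added += 1
        if added == 0:
            break  # empty or fully repeated page: stop paginating
    return ordered


# Usage: a fake site whose pages repeat with a period of 5.
fake_pages = {n: [f"img{n % 5}_{i}" for i in range(3)] for n in range(20)}
links = collect_until_repeat(lambda n: fake_pages.get(n, []))
print(len(links))  # 15
```

This does not recover the images Imgur never serves; it only turns the apparent hang into a clean stop once the listing starts repeating.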