RipMeApp / ripme

Downloads albums in bulk
MIT License
3.73k stars 629 forks

Ripme Hangs When Processing Select Reddit Accounts #845

Closed TjohAGq6VQWLt7gKMo closed 6 years ago

TjohAGq6VQWLt7gKMo commented 6 years ago

v1.7.60

Ubuntu 16.0.5 x86_64

https://www.reddit.com/user/mrsmeeseeks

Expected Behavior

Ripme to rip the Reddit user.

Actual Behavior

Ripme begins ripping the Reddit account. However, after processing ~143 URLs, no further output is printed to the console and Ripme never progresses.

The below snippet is the last of the lines printed to the console.

[!] Non-retriable status code 403 while downloading from https://scontent-iad3-1.xx.fbcdn.net/hphotos-xtp1/v/t1.0-9/12227616_965447620210688_8012212240488106371_n.jpg?oh=3ca96c406a5a96ea929c73b66f74a337&oe=57235655
[!] Non-retriable status code 403 while downloading from https://scontent-iad3-1.xx.fbcdn.net/hphotos-xpl1/v/t1.0-0/p480x480/12742484_996603870419113_3901029855628818223_n.jpg?oh=9532f6eb3a14ac284f8dcca3875f9bed&oe=5753F8C0
[!] Non-retriable status code 403 while downloading from https://scontent-iad3-1.xx.fbcdn.net/hphotos-xft1/v/t1.0-9/12745678_965803786841738_2357677817198035332_n.jpg?oh=1503d00347a3e3ae16fdc2a6003442f5&oe=575251B5
[!] Non-retriable status code 403 while downloading from https://scontent-iad3-1.xx.fbcdn.net/hphotos-xpl1/v/t1.0-9/12744592_965414420214008_7546147867789908906_n.jpg?oh=be5fc4a49d7cb1678dbe257514c31fbd&oe=57258C07
[!] Unable to rip URL: http://www.bbc.com/news/uk-11492867
[!] Unable to rip URL: https://www.youtube.com/watch?v=3eDRhcUZtWw&t=14m48s
[!] Unable to rip URL: http://www.breitbart.com/big-government/2015/10/21/democrats-support-black-lives-matter-presidential-town-hall/
[!] Unable to rip URL: https://www.reddit.com/r/democrats/comments/3ps7vx/the_blacklivesmatter_network_urges_the_democratic/).
[!] Unable to rip URL: https://www.facebook.com/BlackLivesMatter/posts/504279263076657
[!] Unable to rip URL: https://www.reddit.com/r/democrats/comments/3ps7vx/the_blacklivesmatter_network_urges_the_democratic/).
[!] Unable to rip URL: https://www.facebook.com/BlackLivesMatter/posts/504279263076657
[!] Unable to rip URL: http://bit.ly/blmdebate
[!] Unable to rip URL: https://www.facebook.com/BlackLivesMatter/photos/a.180522288785691.1073741827.180212755483311/503942139777036/?type=3&theater
[!] Unable to rip URL: http://reuters.com/article/newsOne/idUSKCN0SF2AK20151021
[!] Unable to rip URL: http://abcnews.go.com/US/donald-trump-jeb-bush-embarrassed-happening/story?id=34620387
[!] Unable to rip URL: https://www.reddit.com/user/mrsmeeseeks/m/trump
[!] Unable to rip URL: http://www.cnn.com/2015/10/18/politics/donald-trump-jeb-bush-9-11/index.html
[!] Unable to rip URL: http://www.foxnews.com/us/2015/04/24/union-cancels-boeing-vote-says-gun-toting-workers-told-it-to-take-off/
[!] Unable to rip URL: http://www.wafb.com/story/28801897/lafayette-business-ceo-wanted-on-racketeering-charges-turns-himself-in
[!] Unable to rip URL: http://www.reddit.com/r/worldnews/comments/31jqb3/museum_exhibits_evidence_of_japanese_vivisection/cq2vp9l?context=4
[!] Unable to rip URL: https://www.youtube.com/watch?v=4AdzjsZJebs
[!] Unable to rip URL: http://www.reuters.com/article/idUSBRE98210L20130903?irpc=932
[!] Non-retriable status code 403 while downloading from https://pbs.twimg.com/media/CRL-RM-UkAAq2e6.png
[!] Unable to rip URL: http://www.local12.com/news/features/top-stories/stories/flu-vaccine-not-working-well-only-23-percent-effective-23636.shtml
[!] Unable to rip URL: http://www.reddit.com/r/news/comments/2j0lgc/texas_healthcare_worker_tests_positive_for_ebola/cl7ajcz).
[!] Unable to rip URL: http://www.itv.com/news/update/2013-09-11/london-arms-fair-ejects-two-firms-over-torture-weapons/
[!] Unable to rip URL: http://np.reddit.com/r/conspiracy/comments/26lqyw/just_a_reminder_regarding_one_of_the_best_posts/chs8qnw
[!] Unable to rip URL: http://www.reddit.com/r/Seattle/comments/26at66/two_posts_announcing_gn_grld_town_hall_appearance/chpbecy
[!] Unable to rip URL: http://www.reddit.com/r/conspiracy/comments/269r4c/two_posts_announcing_glenn_greenwalds_town_hall/),
[!] Unable to rip URL: http://www.reddit.com/user/charlesgrodinfan/m/seattle
[!] Unable to rip URL: http://capitolcommentary.com/2014/02/22/revolution-in-ukraine-shows-why-firearm-ownership-is-important/
[!] Unable to rip URL: http://www.reddit.com/r/ronpaul/comments/jnyvn/why_ron_paul_has_already_won/
[!] Unable to rip URL: http://www.reddit.com/r/occupywallstreet/comments/1a112x/why_was_hugo_chavez_so_popular_a_look_at_the/

cyian-1756 commented 6 years ago

This is a network issue. To avoid it in the future, set download.timeout in rip.properties to a smaller value (around 5000 is fine).

TjohAGq6VQWLt7gKMo commented 6 years ago

Hello,

Here is my rip.properties config file:

# Download threads to use per ripper
threads.size = 5

# Overwrite existing files
file.overwrite = false

# Number of retries on failed downloads
download.retries = 5

# File download timeout (in milliseconds)
download.timeout = 10000

# Page download timeout (in milliseconds)
page.timeout = 5000

# Maximum size of downloaded files in bytes (required)
download.max_size = 104857600

# Don't retry on 404 errors
error.skip404 = true

# API creds
twitter.auth = VW9Ybjdjb1pkd2J0U3kwTUh2VXVnOm9GTzVQVzNqM29LQU1xVGhnS3pFZzhKbGVqbXU0c2lHQ3JrUFNNZm8=
tumblr.auth = JFNLu3CbINQjRdUvZibXW9VpSEVYYtiPJ86o8YmvgLZIoKyuNX
gw.api = gonewild

twitter.max_requests = 10

clipboard.autorip = false

download.save_order = false
album_titles.save = false
remember.url_history = false
window.position = false
descriptions.save = false
prefer.mp4 = true
auto.update = false
log.level = Log level: Error
play.sound = false
download.show_popup = false
log.save = false
urls_only.save = false
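
For reference, here is a minimal sketch of how a Java program could read the two timeout values from a file in this format. It uses the standard `java.util.Properties` class, which may not be how ripme itself loads its configuration; `ConfigSketch`, `parse`, and `getInt` are hypothetical names used only for illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

class ConfigSketch {
    // Hypothetical helper: returns the integer value stored under key, or
    // fallback when the key is missing or its value is not a valid integer.
    static int getInt(Properties props, String key, int fallback) {
        String raw = props.getProperty(key);
        if (raw == null) {
            return fallback;
        }
        try {
            return Integer.parseInt(raw.trim());
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    // Parses "key = value" lines; lines starting with '#' are treated as comments.
    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        try (Reader reader = new StringReader(text)) {
            props.load(reader);
        }
        return props;
    }

    public static void main(String[] args) throws IOException {
        Properties props = parse("download.timeout = 10000\npage.timeout = 5000\n");
        // Both values are in milliseconds, as the comments in the config state.
        System.out.println("download.timeout = " + getInt(props, "download.timeout", 60000));
        System.out.println("page.timeout = " + getInt(props, "page.timeout", 60000));
    }
}
```

The fallback of 60000 is an arbitrary placeholder, not a documented ripme default.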

When processing the Reddit account provided, Ripme hangs indefinitely. Ripme doesn't appear to be timing out after 5000 or 10000 milliseconds per my config. When I grabbed the above snippet of console output, Ripme had been sitting at the same place for over an hour.

Maybe I am misunderstanding the issue, but shouldn't Ripme try each URL for up to 10000 milliseconds and then, once the configured number of retries is exhausted, proceed to the next URL?

cyian-1756 commented 6 years ago

> Ripme doesn't appear to be timing out after 5000 or 10000 milliseconds per my config. When I grabbed the above snippet of console output, Ripme had been sitting at the same place for over an hour.

That shouldn't happen.

> Maybe I am misunderstanding the issue, but shouldn't Ripme try each URL for up to 10000 milliseconds and then, once the configured number of retries is exhausted, proceed to the next URL?

Yes, it should. Testing with ripme 1.7.60, however, it looks like the timeout is being ignored.
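
The expected behavior can be sketched roughly as follows. This is an illustration, not ripme's actual download loop: `RetrySketch` and `downloadWithRetries` are hypothetical names, and the sketch assumes that `download.retries` counts additional attempts after the first one.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

class RetrySketch {
    // Hypothetical helper: runs one download attempt up to (1 + retries) times.
    // Returns true as soon as an attempt succeeds, or false once every attempt
    // has timed out, so the caller can move on to the next URL.
    static boolean downloadWithRetries(Callable<Void> attempt, int retries) throws Exception {
        for (int tries = 0; tries <= retries; tries++) {
            try {
                attempt.call();
                return true;
            } catch (SocketTimeoutException e) {
                // This attempt timed out; fall through and try again.
            }
        }
        return false;
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated download that always stalls until the timeout fires.
        boolean ok = downloadWithRetries(() -> {
            calls[0]++;
            throw new SocketTimeoutException("simulated stall");
        }, 5);
        System.out.println("succeeded=" + ok + ", attempts=" + calls[0]);
    }
}
```

The loop only terminates if each attempt is itself bounded by a timeout; an attempt that never throws and never returns would block it forever, which matches the hang reported here.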

TjohAGq6VQWLt7gKMo commented 6 years ago

Thank you for taking the time to check, much appreciated.

Is it safe to assume there is an issue with Ripme, or am I doing something incorrectly?

cyian-1756 commented 6 years ago

After a bit of reading, it looks like the timeout isn't working here because ripme calls setConnectTimeout() but not setReadTimeout(). This means the timeout only fires if ripme can't connect to the server before the timeout expires. Once the connection is established, a read that stalls can block indefinitely.
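
A minimal sketch of the difference, using the standard `java.net.HttpURLConnection` API. This is not ripme's actual code or the eventual fix: `TimeoutSketch` and `openWithTimeouts` are hypothetical names, and the URL is a placeholder.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

class TimeoutSketch {
    // Hypothetical helper: prepares a connection with both timeouts set.
    static HttpURLConnection openWithTimeouts(URL url, int timeoutMs) throws IOException {
        // openConnection() only creates the connection object; no network
        // traffic happens until connect() or getInputStream() is called.
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Limits how long establishing the connection may take.
        conn.setConnectTimeout(timeoutMs);
        // Limits how long a single read may block once connected. Without it
        // the read timeout stays at 0, which means "wait forever".
        conn.setReadTimeout(timeoutMs);
        return conn;
    }

    public static void main(String[] args) throws IOException {
        URL url = URI.create("https://example.com/image.jpg").toURL();
        HttpURLConnection conn = openWithTimeouts(url, 10000);
        System.out.println("connect timeout: " + conn.getConnectTimeout() + " ms");
        System.out.println("read timeout: " + conn.getReadTimeout() + " ms");
    }
}
```

With both values set, a stalled read raises `java.net.SocketTimeoutException` instead of hanging, which gives the retry logic a chance to run.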

> Is it safe to assume there is an issue with Ripme, or am I doing something incorrectly?

This is a bug in ripme.

TjohAGq6VQWLt7gKMo commented 6 years ago

Thank you very much!

cyian-1756 commented 6 years ago

I've written a fix for this. It will be in the next ripme release, which will be out in 3 days.