elvisyjlin / media-scraper

Scrapes all photos and videos in a web page / Instagram / Twitter / Tumblr / Reddit / pixiv / TikTok
MIT License
371 stars, 49 forks

Error: Remote end closed connection without response #12

Open: a-ion314 opened this issue 5 years ago

a-ion314 commented 5 years ago

When running the following command: python3 -m mediascraper.twitter nerdcity

I get the following error:

Starting PhantomJS web driver...
./webdriver/phantomjsdriver_2.1.1_linux64/phantomjs
/home/User/.local/lib/python3.6/site-packages/selenium/webdriver/phantomjs/webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
Traceback (most recent call last):
  File "/home/User/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/home/User/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 384, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/home/User/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.6/http/client.py", line 1331, in getresponse
    response.begin()
  File "/usr/lib/python3.6/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.6/http/client.py", line 266, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/User/Desktop/git/media-scraper/mediascraper/twitter.py", line 18, in <module>
    tasks = scraper.scrape(username)
  File "/home/User/Desktop/git/media-scraper/mediascrapers.py", line 379, in scrape
    self._connect('{}/{}/media'.format(self.base_url, username))
  File "/home/User/Desktop/git/media-scraper/mediascrapers.py", line 51, in _connect
    self._driver.get(url)
  File "/home/User/.local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 333, in get
    self.execute(Command.GET, {'url': url})
  File "/home/User/.local/lib/python3.6/site-packages/selenium/webdriver/remote/webdriver.py", line 319, in execute
    response = self.command_executor.execute(driver_command, params)
  File "/home/User/.local/lib/python3.6/site-packages/selenium/webdriver/remote/remote_connection.py", line 374, in execute
    return self._request(command_info[0], url, body=data)
  File "/home/User/.local/lib/python3.6/site-packages/selenium/webdriver/remote/remote_connection.py", line 402, in _request
    resp = http.request(method, url, body=body, headers=headers)
  File "/home/User/.local/lib/python3.6/site-packages/urllib3/request.py", line 72, in request
    **urlopen_kw)
  File "/home/User/.local/lib/python3.6/site-packages/urllib3/request.py", line 150, in request_encode_body
    return self.urlopen(method, url, **extra_kw)
  File "/home/User/.local/lib/python3.6/site-packages/urllib3/poolmanager.py", line 324, in urlopen
    response = conn.urlopen(method, u.request_uri, **kw)
  File "/home/User/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 638, in urlopen
    _stacktrace=sys.exc_info()[2])
  File "/home/User/.local/lib/python3.6/site-packages/urllib3/util/retry.py", line 368, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/User/.local/lib/python3.6/site-packages/urllib3/packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/home/User/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 600, in urlopen
    chunked=chunked)
  File "/home/User/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 384, in _make_request
    six.raise_from(e, None)
  File "<string>", line 2, in raise_from
  File "/home/User/.local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 380, in _make_request
    httplib_response = conn.getresponse()
  File "/usr/lib/python3.6/http/client.py", line 1331, in getresponse
    response.begin()
  File "/usr/lib/python3.6/http/client.py", line 297, in begin
    version, status, reason = self._read_status()
  File "/usr/lib/python3.6/http/client.py", line 266, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

elvisyjlin commented 5 years ago

Hi, I cannot reproduce your error. I ran python3 -m mediascraper.twitter TwitterUser and got the following output:

Starting PhantomJS web driver...
./webdriver/phantomjsdriver_2.1.1_linux64/phantomjs
/home/elvis/.local/lib/python3.5/site-packages/selenium/webdriver/phantomjs/webdriver.py:49: UserWarning: Selenium support for PhantomJS has been deprecated, please use headless versions of Chrome or Firefox instead
  warnings.warn('Selenium support for PhantomJS has been deprecated, please use headless '
Either username or password is empty. Abort login.
Crawling...
10 media are found.
Downloading...
 50%|███████████████████████████                           | 5/10 [00:04<00:04,  1.17it/s]The file download/twitter/TwitterUser/BYRkNbhCQAAaPzj.jpg exists. Skip it.
The file download/twitter/TwitterUser/BYRjjYyCIAE15FC.jpg exists. Skip it.
The file download/twitter/TwitterUser/BYRjCCECEAAuOhr.jpg exists. Skip it.
The file download/twitter/TwitterUser/BYRik0ICIAAHb57.jpg exists. Skip it.
The file download/twitter/TwitterUser/BYFE7p9CQAAqVaA.jpg exists. Skip it.
100%|█████████████████████████████████████████████████████| 10/10 [00:04<00:00,  2.36it/s]

Running ls download/twitter/TwitterUser/ then shows 5 pictures: BYFE7p9CQAAqVaA.jpg, BYRjCCECEAAuOhr.jpg, BYRkNbhCQAAaPzj.jpg, BYRik0ICIAAHb57.jpg, BYRjjYyCIAE15FC.jpg.

If you can provide more information, I can help you further.

a-ion314 commented 5 years ago

Sorry, I should have included the Twitter user. I got this specific error when running the scraper against NerdCity's Twitter account. The command I ran was: python3 -m mediascraper.twitter nerdcity
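
As background on the exception reported in this issue: http.client raises RemoteDisconnected when the peer accepts a connection and then closes it before sending any status line. In the traceback the failing request goes through Selenium's remote_connection.py, so the "remote end" there is the local PhantomJS driver process that Selenium talks to over HTTP, not Twitter itself; why that process dropped the connection is not established in this thread. The sketch below is not part of media-scraper. It only reproduces the same exception with the standard library, using a throwaway local server, and the function names serve_once_and_hang_up and fetch_from_silent_server are hypothetical.

```python
import http.client
import socket
import threading


def serve_once_and_hang_up(server_sock):
    """Accept one connection, read the request, then close without replying."""
    conn, _ = server_sock.accept()
    conn.recv(65536)  # consume the request so the close is a clean shutdown
    conn.close()      # no status line, no headers, no body


def fetch_from_silent_server():
    """Return the exception raised when the peer closes without a response."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server_sock.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    server_sock.listen(1)
    port = server_sock.getsockname()[1]

    worker = threading.Thread(target=serve_once_and_hang_up, args=(server_sock,))
    worker.start()

    client = http.client.HTTPConnection("127.0.0.1", port, timeout=5)
    try:
        client.request("GET", "/")
        client.getresponse()  # reads an empty status line and raises
    except http.client.RemoteDisconnected as exc:
        return exc
    finally:
        client.close()
        worker.join()
        server_sock.close()
    return None


if __name__ == "__main__":
    print(repr(fetch_from_silent_server()))
```

urllib3 wraps this exception in ProtocolError('Connection aborted.', ...), which is the second error shown in the report.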