QianyanTech / Image-Downloader

Download images from Google, Bing, Baidu.
MIT License
2.19k stars 572 forks

Problem downloading Google images through a proxy server on Linux #8

Closed yfzmk2013 closed 6 years ago

yfzmk2013 commented 7 years ago

Currently, on Linux, downloading Google images through a proxy server does not run correctly from the command line.

sczhengyabin commented 7 years ago

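@yfzmk2013 Please be more specific: for example, your system environment, how you ran the code, whether you modified the code, what the failure looks like, and what error was reported.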

yfzmk2013 commented 7 years ago

Traceback (most recent call last):
  File "image_downloader_google.py", line 100, in <module>
    browser="phantomjs")
  File "/home/yanhao/project/DengHong_Git/Image-Downloader/crawler.py", line 254, in crawl_image_urls
    service_args=phantomjs_args, desired_capabilities=dcap)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/phantomjs/webdriver.py", line 58, in __init__
    desired_capabilities=desired_capabilities)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webdriver.py", line 92, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webdriver.py", line 179, in start_session
    response = self.execute(Command.NEW_SESSION, capabilities)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/errorhandler.py", line 163, in check_response
    raise exception_class(value)
selenium.common.exceptions.WebDriverException: Message: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

502 - No server or forwarder data received (Privoxy@localhost)
502

This is Privoxy 3.0.24 on localhost (127.0.0.1), port 8118, enabled

No server or forwarder data received

Your request for http://127.0.0.1:42923/wd/hub/session could not be fulfilled, because the connection to 127.0.0.1 (127.0.0.1) has been closed before Privoxy received any data for this request.

This is often a temporary failure, so you might just try again.

If you get this message very often, consider disabling connection-sharing (which should be off by default). If that doesn't help, you may have to additionally disable support for connection keep-alive by setting keep-alive-timeout to 0.

yfzmk2013 commented 7 years ago

The system is Ubuntu 16.04.

sczhengyabin commented 7 years ago

[image] I just tested it and it works. Could you try calling it the way I do here? @yfzmk2013

yfzmk2013 commented 7 years ago

Then can you ping Google from your command line?

(Sent by email in reply to Yabin Zheng's comment of 2017-08-28 17:34.)

yfzmk2013 commented 7 years ago

Command: curl ip.gs
Current IP: 45.62.105.15
ISP: it7.net
City: Los Angeles California
Country: United States
IP.GS is now IP.SB, please visit https://ip.sb/ for more IP information, ip.gs will only use for curl purpose.
Please join Telegram group https://t.me/sbfans if you have any issues.

But I cannot ping www.google.com, even though I can reach Google in my browser. I wonder whether your setup already lets the command line reach Google.
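A failed ping does not by itself show that the proxy is unusable: ping sends ICMP packets, which a SOCKS5 or HTTP proxy does not carry, so a browser configured for the proxy can reach Google while ping still fails. A rough way to test the proxy itself is to ask it for a TCP connection. The sketch below is only an illustration using the standard library; it assumes a SOCKS5 proxy that accepts connections without authentication, and the function names are hypothetical, not part of Image-Downloader.

```python
import socket
import struct


def build_socks5_connect_request(host, port):
    """Encode a SOCKS5 CONNECT request for a domain name (RFC 1928)."""
    name = host.encode("idna")
    if len(name) > 255:
        raise ValueError("host name too long for SOCKS5")
    # VER=5, CMD=1 (CONNECT), RSV=0, ATYP=3 (domain name),
    # then the length-prefixed name and the port in network byte order.
    return b"\x05\x01\x00\x03" + bytes([len(name)]) + name + struct.pack("!H", port)


def can_connect_via_socks5(proxy_host, proxy_port, host, port, timeout=10.0):
    """Return True if the proxy reports a successful CONNECT to host:port.

    Sketch only: assumes no authentication and that each reply
    arrives in a single read.
    """
    try:
        with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
            sock.sendall(b"\x05\x01\x00")  # greeting: offer "no authentication"
            if sock.recv(2) != b"\x05\x00":
                return False
            sock.sendall(build_socks5_connect_request(host, port))
            reply = sock.recv(4)
            # The second byte of the reply is the status; 0 means success.
            return len(reply) >= 2 and reply[1] == 0
    except OSError:
        return False
```

For example, `can_connect_via_socks5("127.0.0.1", 1080, "www.google.com", 443)` would check a local proxy; the address and port here are assumptions and should match the proxy actually in use.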

sczhengyabin commented 7 years ago

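@yfzmk2013 I do bypass the firewall at the router level, but that should not matter. If the socks5 proxy passed as a parameter is wrong, the program will not run properly either. [image]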

Also, when I originally developed this program, I tested it under the same conditions as yours, and it was not affected.

Tell me your exact Python version, the versions of the libraries you use, and the PhantomJS version, and I will see whether I can reproduce it.

yfzmk2013 commented 7 years ago

The program you just posted did not crash, but it returned 0 URLs. That should not happen.

(Sent by email in reply to Yabin Zheng's comment of 2017-08-28 17:55.)

yfzmk2013 commented 7 years ago

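The keyword yaoming cannot possibly return 0 results. That suggests your side is not actually getting past the firewall, which produces exactly this case: no crash, but no URLs either.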

(Sent by email in reply to yfzmk2013's own comment of 2017-08-28 17:59.)

yfzmk2013 commented 7 years ago

Sorry, I misread it. I had left work and was reading on my phone.

(Sent by email in reply to yfzmk2013's own comment of 2017-08-28 18:02.)

yfzmk2013 commented 7 years ago

A SOCKS5 proxy typically uses 127.0.0.1:1080.

(Sent by email in reply to yfzmk2013's own comment of 2017-08-28 18:03.)

yfzmk2013 commented 7 years ago

@sczhengyabin
Hello. This is how I call the function:

name_st = '里皮'
crawled_urls = crawler.crawl_image_urls(keywords=name_st, engine='Google', max_number=10000,
                                        face_only=False, safe_mode=True,
                                        proxy_type="socks5", proxy="127.0.0.1:1080",
                                        browser="phantomjs")

Python version: Python 3.5.2. PhantomJS version: 2.1.1.
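The call passes a proxy type and a "host:port" string, and the traceback in this thread shows crawler.py handing `service_args=phantomjs_args` to the PhantomJS driver. How crawler.py builds that list is not shown here, so the following is only a sketch of one way it could be done, using PhantomJS's `--proxy` and `--proxy-type` command-line options. The function name `build_phantomjs_proxy_args` is hypothetical.

```python
def build_phantomjs_proxy_args(proxy, proxy_type):
    """Build PhantomJS command-line proxy flags from 'host:port' and a proxy type.

    Hypothetical helper for illustration; not taken from crawler.py.
    """
    host, sep, port = proxy.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError("proxy must look like 'host:port', got %r" % proxy)
    if proxy_type not in ("http", "socks5"):
        raise ValueError("unsupported proxy type: %r" % proxy_type)
    return ["--proxy=%s:%s" % (host, port), "--proxy-type=%s" % proxy_type]
```

Validating the string up front turns a mistyped proxy address into an immediate `ValueError` instead of a later, harder-to-read browser failure.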

Error:

keywords: 里皮
Number: 10000
Face Only: False
Safe Mode: True
Query URL: https://www.google.com/search?tbm=isch&hl=en&q=%E9%87%8C%E7%9A%AE&safe=on
Traceback (most recent call last):
  File "image_downloader_google.py", line 100, in <module>
    browser="phantomjs")
  File "/home/yanhao/project/DengHong_Git/Image-Downloader/crawler.py", line 254, in crawl_image_urls
    service_args=phantomjs_args, desired_capabilities=dcap)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/phantomjs/webdriver.py", line 58, in __init__
    desired_capabilities=desired_capabilities)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webdriver.py", line 92, in __init__
    self.start_session(desired_capabilities, browser_profile)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webdriver.py", line 179, in start_session
    response = self.execute(Command.NEW_SESSION, capabilities)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/webdriver.py", line 236, in execute
    self.error_handler.check_response(response)
  File "/usr/local/lib/python3.5/dist-packages/selenium/webdriver/remote/errorhandler.py", line 163, in check_response
    raise exception_class(value)
selenium.common.exceptions.WebDriverException: Message: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd">

502 - No server or forwarder data received (Privoxy@localhost)
502

This is Privoxy 3.0.24 on localhost (127.0.0.1), port 8118, enabled

No server or forwarder data received

Your request for http://127.0.0.1:48599/wd/hub/session could not be fulfilled, because the connection to 127.0.0.1 (127.0.0.1) has been closed before Privoxy received any data for this request.

This is often a temporary failure, so you might just try again.

If you get this message very often, consider disabling connection-sharing (which should be off by default). If that doesn't help, you may have to additionally disable support for connection keep-alive by setting keep-alive-timeout to 0.
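The Privoxy page says the failed request was for http://127.0.0.1:48599/wd/hub/session, which is the local WebDriver endpoint that Selenium uses to talk to PhantomJS, not a Google URL. One possible reading is that an HTTP proxy setting (for example an `http_proxy` environment variable pointing at Privoxy on port 8118) is also being applied to loopback traffic. The sketch below, with hypothetical helper names, lists proxy variables in an environment and builds a copy in which the loopback hosts are excluded through `no_proxy`. Whether the installed Selenium version honors `no_proxy` is an assumption; unsetting the proxy variables before running the script is the blunter alternative.

```python
import os

PROXY_VARS = ("http_proxy", "https_proxy", "all_proxy")
LOCAL_HOSTS = ("127.0.0.1", "localhost")


def find_proxy_vars(env):
    """Return the non-empty proxy variables in env, matching names case-insensitively."""
    return {k: v for k, v in env.items() if k.lower() in PROXY_VARS and v}


def with_local_bypass(env):
    """Return a copy of env whose no_proxy/NO_PROXY also lists the loopback hosts."""
    new_env = dict(env)
    current = env.get("no_proxy") or env.get("NO_PROXY") or ""
    hosts = [h.strip() for h in current.split(",") if h.strip()]
    for host in LOCAL_HOSTS:
        if host not in hosts:
            hosts.append(host)
    value = ",".join(hosts)
    # Set both spellings, since tools differ in which one they read.
    new_env["no_proxy"] = value
    new_env["NO_PROXY"] = value
    return new_env


if __name__ == "__main__":
    print(find_proxy_vars(os.environ))
```

Calling `os.environ.update(with_local_bypass(os.environ))` before the crawler starts would apply the change to the current process.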

sczhengyabin commented 7 years ago

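@yfzmk2013 Sorry, I created a fresh virtualenv to test, and it still works without problems. I searched for the error, and it may be a proxy issue. I do not know which SS software you are using. I tried a local sslocal instance with no problem; the one on my router also works; and an SS proxy running inside a Windows virtual machine works too.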

aojue1109 commented 5 years ago

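I ran into the same problem on Windows 10. After connecting to the VPN, running the code kept raising selenium.common.exceptions.WebDriverException: Message: . I spent a long time on it and at first assumed the web proxy was at fault, because I could not ping google. Eventually I found that the proxy has several modes. I looked up the differences, switched from global proxy mode to PAC mode, and the program then ran normally.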