Luobeia opened this issue 2 years ago (status: Open)
@Luobeia Which step are you stuck on, and what error do you get?
$ git clone https://github.com/baabaaox/ScrapyDouban.git
# Build and run the containers
$ cd ./ScrapyDouban/docker
$ sudo docker-compose up --build -d
# Enter the douban_scrapyd container
$ sudo docker exec -it douban_scrapyd bash
# Enter the scrapy directory
$ cd /srv/ScrapyDouban/scrapy
$ scrapy list
The sudo docker-compose up --build -d step is already where things differ from the demo video. At first neither docker nor docker-compose was recognized as a command, which I fixed by installing both. I'm running on CentOS, and now I get the error below. This is my first Scrapy-related project and I'm a beginner, so please bear with me.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 710, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib64/python3.6/http/client.py", line 1254, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1300, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1249, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1036, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.6/http/client.py", line 974, in send
    self.connect()
  File "/usr/local/lib/python3.6/site-packages/docker/transport/unixconn.py", line 30, in connect
    sock.connect(self.unix_socket)
FileNotFoundError: [Errno 2] No such file or directory
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 450, in send
    timeout=timeout
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 786, in urlopen
    method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
  File "/usr/local/lib/python3.6/site-packages/urllib3/util/retry.py", line 550, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/usr/local/lib/python3.6/site-packages/urllib3/packages/six.py", line 769, in reraise
    raise value.with_traceback(tb)
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 710, in urlopen
    chunked=chunked,
  File "/usr/local/lib/python3.6/site-packages/urllib3/connectionpool.py", line 398, in _make_request
    conn.request(method, url, **httplib_request_kw)
  File "/usr/lib64/python3.6/http/client.py", line 1254, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1300, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1249, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.6/http/client.py", line 1036, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.6/http/client.py", line 974, in send
    self.connect()
  File "/usr/local/lib/python3.6/site-packages/docker/transport/unixconn.py", line 30, in connect
    sock.connect(self.unix_socket)
urllib3.exceptions.ProtocolError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/docker/api/client.py", line 214, in _retrieve_server_version
    return self.version(api_version=False)["ApiVersion"]
  File "/usr/local/lib/python3.6/site-packages/docker/api/daemon.py", line 181, in version
    return self._result(self._get(url), json=True)
  File "/usr/local/lib/python3.6/site-packages/docker/utils/decorators.py", line 46, in inner
    return f(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/docker/api/client.py", line 237, in _get
    return self.get(url, **self._set_request_timeout(kwargs))
  File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 542, in get
    return self.request('GET', url, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 529, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.6/site-packages/requests/sessions.py", line 645, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/requests/adapters.py", line 501, in send
    raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', FileNotFoundError(2, 'No such file or directory'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/bin/docker-compose", line 8, in
@Luobeia I looked up this error: https://github.com/docker/compose/issues/7896 It looks like the docker service on your CentOS machine did not start.
1. Check the docker service status:
sudo systemctl status docker
2. If it is not running, start the docker service:
sudo systemctl start docker
3. Then run the commands above again:
$ cd ./ScrapyDouban/docker
$ sudo docker-compose up --build -d
# Enter the douban_scrapyd container
$ sudo docker exec -it douban_scrapyd bash
# Enter the scrapy directory
$ cd /srv/ScrapyDouban/scrapy
$ scrapy list
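The FileNotFoundError in the traceback is raised while connecting to Docker's Unix socket, which usually means the daemon is not running. A minimal Python sketch of the same check follows. It assumes the default socket path /var/run/docker.sock (your installation may use a different one), and the helper name docker_socket_available is made up for illustration. A present socket only suggests the daemon is up, so systemctl status docker remains the authoritative check.

```python
import os
import stat


def docker_socket_available(path="/var/run/docker.sock"):
    """Return True if `path` exists and is a Unix socket.

    The default path is an assumption (Docker's usual location on Linux).
    """
    try:
        mode = os.stat(path).st_mode
    except OSError:
        # Missing path, or no permission to inspect it.
        return False
    return stat.S_ISSOCK(mode)


if __name__ == "__main__":
    if docker_socket_available():
        print("Docker socket found; the daemon is probably running.")
    else:
        print("Docker socket missing; try: sudo systemctl start docker")
```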
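OK, thanks a lot. I'll give it a try and come back if I run into more problems.
Sorry, I've hit another problem. ScrapyDouban/docker/Dockerfile contains apt-get commands, but my CentOS uses yum. If I change them all to yum the build fails, and if I leave them unchanged, apt-get is not recognized.
I've solved the problem above, sorry for the noise. I still can't crawl any data, though. Maybe it's a proxy issue? I'll look into proxies later. Thanks!!
@Luobeia If you are getting a large number of 403 responses, you need to use proxy IPs to get around it.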
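As a rough illustration of what using proxy IPs can look like in Scrapy: the built-in HttpProxyMiddleware honors request.meta["proxy"], so a small downloader middleware can set that key on each request. This is only a sketch. The class name RandomProxyMiddleware, the PROXY_LIST setting, and the example proxy URLs are all hypothetical, and the project may already include its own proxy handling, so check its middlewares first.

```python
import random


class RandomProxyMiddleware:
    """Downloader middleware sketch: pick a random proxy for each request."""

    def __init__(self, proxies):
        self.proxies = list(proxies)

    @classmethod
    def from_crawler(cls, crawler):
        # PROXY_LIST is a hypothetical custom setting, for example:
        # PROXY_LIST = ["http://127.0.0.1:8001", "http://127.0.0.1:8002"]
        return cls(crawler.settings.getlist("PROXY_LIST"))

    def process_request(self, request, spider):
        if self.proxies:
            # Scrapy's built-in HttpProxyMiddleware reads this meta key.
            request.meta["proxy"] = random.choice(self.proxies)
        return None  # let Scrapy continue processing the request
```

To try something like this, the class would be registered under DOWNLOADER_MIDDLEWARES in settings.py with a priority below 750, so that it runs before the built-in HttpProxyMiddleware.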
How do I check for 403s? I see an error in the output:
ERROR: Gave up retrying <GET https://m.douban.com/movie/subject/1292052/> (failed 3 times): DNS lookup failed: no results for hostname lookup: m.douban.com.
twisted.internet.error.DNSLookupError: DNS lookup failed: no results for hostname lookup: m.douban.com.
@Luobeia DNS resolution failed. Is there something wrong with your virtual machine's network? Try pinging it yourself:
ping m.douban.com
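On the earlier question of how to spot 403s: in Scrapy's default log format, each fetched page appears as a line containing Crawled (<status>), whereas the error above is a DNS failure, meaning the request never reached the server. A minimal sketch that tallies both from log lines, assuming the default log format. The function name summarize_log and the sample lines are illustrative only.

```python
import re
from collections import Counter

# Matches the status code in lines such as "DEBUG: Crawled (403) <GET ...>".
CRAWLED_RE = re.compile(r"Crawled \((\d{3})\)")


def summarize_log(lines):
    """Count HTTP status codes and requests abandoned after DNS failures."""
    statuses = Counter()
    dns_giveups = 0
    for line in lines:
        match = CRAWLED_RE.search(line)
        if match:
            statuses[int(match.group(1))] += 1
        elif "Gave up retrying" in line and "DNS lookup failed" in line:
            dns_giveups += 1
    return statuses, dns_giveups


if __name__ == "__main__":
    # Made-up sample lines in the shape of Scrapy's default log output.
    sample = [
        "[scrapy.core.engine] DEBUG: Crawled (200) <GET https://m.douban.com/movie/subject/1292052/> (referer: None)",
        "[scrapy.core.engine] DEBUG: Crawled (403) <GET https://m.douban.com/movie/subject/1292052/> (referer: None)",
        "[scrapy.downloadermiddlewares.retry] ERROR: Gave up retrying <GET https://m.douban.com/movie/subject/1292052/> (failed 3 times): DNS lookup failed: no results for hostname lookup: m.douban.com.",
    ]
    statuses, dns_giveups = summarize_log(sample)
    print(dict(statuses), dns_giveups)
```

Many 403 entries point to blocking by the site, while DNS give-ups point to a network problem on the crawling machine. A log file to feed into this can be produced with Scrapy's --logfile option.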
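OK, I'll check whether the ping fails on my network. Thanks!
Quick question: do I have to crawl the movie IDs first before I can crawl the movie data? In other words, does it crawl exactly as many movies as there are IDs?
This afternoon I crawled more than 1,000 records, but after that my IP seems to have been banned and now I can't crawl a single one. Also, why can't I find the data anywhere on my CentOS system?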
@Luobeia
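I use phpMyAdmin. Why can't I reach it when I enter 192.168.122.1:8080/phmyadmin in the browser?
I took a look at the demo video and it uses adminer. I hadn't noticed that; I'll go try it myself.
I'd like to crawl the trailer information on Douban. I located it with XPath, and a browser extension also confirmed the URL, but when I modify the original movie_meta.py so that the official_site field scrapes the information I want, it doesn't work. Why is that?
Hello, how is this supposed to be run? I followed the usage instructions on Win10 and on CentOS 7 in a VM, and after two days of setting up the environment it still won't run. Besides the packages in requirement.txt, is there anything else I need to install? Many thanks.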