-
I am using macOS with Python 3.6.
Here is my code:
```
try:
    import Image
except ImportError:
    from PIL import Image
import pytesseract
pytesseract.pytesseract.tesseract_cmd = '/usr/local/Cellar/…
-
## Describe the bug
> A clear and concise description of what the bug is.
The `scrapyd` and `crawler` containers started by docker-compose exit immediately. `lianjia` also exits after a while, presumably because it has finished crawling.
## To Reprod…
-
I just ran into a problem, and I have not changed any settings. Not sure why? 🤔️
2022-01-23 01:00:54 [scrapy.core.engine] INFO: Spider opened
2022-01-23 01:00:54 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at …
-
> 2017-01-19 14:41:21 [twisted] CRITICAL: Unhandled Error
> Traceback (most recent call last):
> Failure: twisted.internet.error.ConnectionDone: Connection was closed cleanly.
This is the error, I ch…
-
I modified search_parser.py as described, then ran `python search_parser.py` and got:
Traceback (most recent call last):
  File "search_parser.py", line 1, in <module>
    import scrapy
ModuleNotFoundError: No module named 'scrapy…
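
A `ModuleNotFoundError` like this usually means the interpreter that ran the script cannot see the package, for example because it was installed into a different environment. A minimal sketch for checking that, using only the standard library; the helper name `module_available` is hypothetical and not part of the project:

```python
import importlib.util
import sys


def module_available(name: str) -> bool:
    """Return True if this interpreter can locate the top-level module `name`."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Packages are installed per interpreter, so print which one is running.
    print("interpreter:", sys.executable)
    print("scrapy importable:", module_available("scrapy"))
```

If this prints `False`, installing the package with the same interpreter (`python -m pip install scrapy`) is the usual next step.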
-
All the other steps worked fine, but running crawler_main produced an error. The output is as follows:
2021-04-08 01:29:39 INFO start crawling group: 106955
2021-04-08 01:29:39 INFO Getting group: 106955 successful
Traceback (most recent call las…
-
Hello, I want to get the raw HTML, so I wrote another field named HtmlField.
```python3
import ruia.field
class HtmlField(ruia.field._LxmlElementField):
    def _parse_element(self, elem…
-
https://telegra.ph/%E6%89%93%E5%B7%A5%E4%BA%BA%E9%80%9F%E9%80%9F%E9%9B%86%E7%BB%93%E4%B8%80%E8%B5%B7%E6%8A%95%E5%87%BA2020%E5%B9%B4%E5%BA%A6%E5%8D%81%E5%A4%A7%E9%BB%91%E5%BF%83%E4%BC%81%E4%B8%9A-12-22
-
Reason: while the crawler is running, it is easy to get the IP address banned, after which the site can no longer be accessed.
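
One common way to lower the risk of such bans is to slow requests down and wait longer after each failure. A minimal sketch of exponential backoff with jitter, assuming nothing about this project's code; `backoff_delay` and its parameters are hypothetical names:

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Return a random wait in [0, min(cap, base * 2**attempt)] seconds."""
    if attempt < 0:
        raise ValueError("attempt must be >= 0")
    # Clamp the exponent so very large attempt counts cannot overflow.
    exponent = min(attempt, 32)
    upper = min(cap, base * (2 ** exponent))
    # Full jitter: pick uniformly below the bound so retries do not synchronize.
    return random.uniform(0, upper)
```

A retry loop would call `time.sleep(backoff_delay(n))` before the n-th retry. If the crawler is built on Scrapy, its `DOWNLOAD_DELAY` and `AUTOTHROTTLE_ENABLED` settings serve a similar purpose.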
-
https://matters.news/@timianlaodon/%E8%A1%8C%E4%B8%9A%E5%89%AA%E6%8A%A505-%E7%BD%91%E4%BC%A0%E5%8C%97%E4%BA%AC%E5%A4%96%E5%8D%96-%E9%AA%91%E5%A3%AB%E8%81%94%E7%9B%9F%E7%9B%9F%E4%B8%BB-%E8%A2%AB%E6%8A%…