-
https://mp.weixin.qq.com/s/W1yP-1QkFHNIQFD0H_2-cQ
-
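Hi, I came across this framework and, after studying it closely, found it really pleasant to use. I also read through parts of the source code and learned a lot from it; it is a great framework. However, I ran into the following problem while using it:
If two or more Spider instances run at the same time in one loop (here they are started with the `Spider.async_start()` method), for example like this: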
```python
class SpiderA(Spider):
...
…
-
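When `RegexField.extract()` receives an `etree._Element` object, it converts it to a string. The current conversion does not handle Chinese text correctly and turns it into garbled characters.
The following code seems to work: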
```python3
if isinstance(html, etree._Element):
html = etree.tostring(html, encoding='utf-8', pr…
-
![3VM`WO(0~K8~U9U 2L8_5}I](https://user-images.githubusercontent.com/46965669/74097141-ad439180-4b43-11ea-9690-6e1200ce6c83.png)
This was copied from the example.
![TLM{JIS}BAF{RM18@_}UX3H](https://user-images.githubuse…
-
https://www.douban.com/note/767330495
-
I assumed the `DELAY` attr would set the delay for retries but instead it applies to *all* requests. I would appreciate it if there was a `DELAY` attr specifically for retries (`RETRY_DELAY`). I'd be …
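The distinction being requested can be sketched roughly as follows. This is only an illustration in plain `asyncio`, not the framework's actual implementation: `fetch_with_retry`, its `delay`, `retry_delay`, `retries`, and `sleep` parameters are all hypothetical names chosen for this example.

```python
import asyncio


async def fetch_with_retry(fetch, *, delay=0.0, retry_delay=0.0, retries=3,
                           sleep=asyncio.sleep):
    """Await `fetch()` with one pre-request delay and a separate retry delay.

    All names here are hypothetical; `sleep` is injectable so the timing
    can be checked without actually waiting.
    """
    # Analogue of a general DELAY: applied once, before the first attempt.
    if delay > 0:
        await sleep(delay)

    last_error = None
    for attempt in range(retries + 1):
        try:
            return await fetch()
        except Exception as exc:
            last_error = exc
            # Analogue of the proposed RETRY_DELAY: wait only when
            # another attempt will actually follow.
            if attempt < retries and retry_delay > 0:
                await sleep(retry_delay)

    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

With this shape, a successful first request only ever pays `delay`, while `retry_delay` is paid once per failed attempt that is followed by another try.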
-
Hello,
To avoid crawling duplicate URLs, I store all crawled URLs in a database and then check whether each requested URL is already in the DB.
Here is my filter class:
``` py
class CustomFilter(RFPDu…
-
`sudo apt-get install docker.io`
...
` sudo docker run -it --name=proxy proxy`
```
Traceback (most recent call last):
File "ipproxytool.py", line 7, in <module>
import run_validator
File "/h…
-
2017-02-14 11:29:18 [10], msg:sql helper execute command:CREATE TABLE IF NOT EXISTS free_ipproxy (`id` INT(8) NOT NULL AUTO_INCREMENT,`ip` CHAR(25) NOT NULL UNIQUE,`port` INT(4) NOT NULL,`country`…
-
Hi, binux. When I use `Handler.crawl_config` to set the headers for every request, sometimes the headers are not set for the task. But when I run the project manually step by step, it works well. By th…
xrlin updated
7 years ago