-
You need to comment out this line, otherwise the crawl never stops :)
`#Rule(LinkExtractor(restrict_xpaths='//div[@id="pageStyle"]//a[contains(., "下一页")]'))`
I was going to open a pull request, but found there is no branch to target.
-
I added this to my `settings.py`, but it doesn't work:
```python
SPIDER_SETTINGS = [
    {
        'endpoint': 'dmoz',
        'location': 'spiders.dmoz',
        'spider': 'DmozSpider',
…
-
### Description
I'm rather new to Scrapy; I've created some 100 spiders and have now moved the data into a MySQL database.
To check whether a page has already been scraped, I decided to use a process_value function, in …
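A minimal sketch of that idea (the URL set and function name are assumptions): `LinkExtractor` calls the `process_value` callable once per extracted value, and a `None` return tells it to ignore that link entirely.

```python
# URLs already stored in MySQL, loaded once at spider start (assumption).
seen_urls = {"https://example.com/page/1"}

def skip_seen(url):
    """process_value hook: return None to drop a link, or the URL to keep it."""
    return None if url in seen_urls else url

# Wired in as: LinkExtractor(process_value=skip_seen)
```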
-
When there is a CrawlSpider with multiple rules and one of them has no method attached as a callback, it fails with "`TypeError: join() argument must be str, bytes, or os.PathLike object, not 'NoneTyp…
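Without the full traceback it is hard to say where, but that exact message comes from `os.path.join` receiving a `None` (here presumably a value derived from the missing callback). A minimal stdlib reproduction:

```python
import os

# os.path.join rejects None with the message quoted in the issue.
try:
    os.path.join("output", None)
except TypeError as exc:
    print(exc)  # → join() argument must be str, bytes, or os.PathLike object, not 'NoneType'
```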
-
# Expected Behavior
An option that could be useful in the crawler configuration: restricting the crawler's search for links in the page source, using regular expressions or the s…
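In Scrapy this maps onto `LinkExtractor`'s `allow`/`deny` parameters, which already accept regular expressions; a stdlib sketch of the filtering idea (the pattern is an assumption):

```python
import re

# Hypothetical pattern: only follow links under /noticias/.
ALLOW = re.compile(r"^https?://example\.com/noticias/")

def wanted(url):
    """Keep only URLs matching the allowed pattern,
    the same filtering LinkExtractor(allow=r'...') performs."""
    return bool(ALLOW.match(url))
```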
-
Add the ability to select specific divs and tags to be fed into the spider's LinkExtractor restrict_css tool, similar to an annotation UI. This feature would add great usability for users who don't know regex to ad…
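For reference, restricting extraction to chosen divs is what `LinkExtractor(restrict_css='div.target')` already does under the hood; the UI would only need to emit the selector. A stdlib sketch of the behavior (class names are assumptions, and it matches the `class` attribute exactly rather than as a token list):

```python
from html.parser import HTMLParser

class DivLinkExtractor(HTMLParser):
    """Collect <a href> links only inside <div class="..."> blocks,
    mimicking LinkExtractor's restrict_css for a single div selector."""

    def __init__(self, div_class):
        super().__init__()
        self.div_class = div_class
        self.depth = 0          # >0 while inside a matching div
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Enter on a matching div; count nested divs while inside.
            if self.depth or attrs.get("class") == self.div_class:
                self.depth += 1
        elif tag == "a" and self.depth and "href" in attrs:
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
```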
-
Hello. This project is very meaningful, and I have the same need.
I'm a Python beginner and have a few questions about the code.
I installed Python 3.5. I haven't yet figured out how to start MongoDB, so for now I'm saving the results to a CSV file:
scrapy crawl google -o test.csv JOBDIR=app/jobs
But I got the following error message:
ImportError: No module named…
-
@clemfromspace I just decided to use your package in my Scrapy project, but it is yielding a normal scrapy.Request instead of a SeleniumRequest.
```
from shutil import which
from scrapy_seleniu…
```
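For comparison, a typical scrapy-selenium `settings.py` fragment looks like the sketch below (the driver name and path are assumptions; use whatever matches your browser). `SeleniumRequest` only takes effect when the middleware is enabled:

```python
# settings.py fragment for scrapy-selenium (driver details are assumptions).
from shutil import which

SELENIUM_DRIVER_NAME = "firefox"
SELENIUM_DRIVER_EXECUTABLE_PATH = which("geckodriver")
SELENIUM_DRIVER_ARGUMENTS = ["-headless"]

# Without this entry, a SeleniumRequest is downloaded like a plain Request.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_selenium.SeleniumMiddleware": 800,
}
```

Note also that requests built automatically from `start_urls` are plain `scrapy.Request` objects; to route pages through Selenium, the spider has to override `start_requests` and yield `SeleniumRequest(url=..., callback=...)` explicitly.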
-
### Brand name
LEGOLAND Discovery Centre
### Wikidata ID
Q303439
### Store finder url(s)
https://www.legolanddiscoverycentre.com
-
File "/Users/v/Desktop/ScrapyProject/JanDan/JanDan/spiders/jiandan_ooxx.py", line 18
rules = (
^
IndentationError: unexpected indent
rules = (
Rule(LinkExtractor(allow=('h…
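The fix is to align `rules = (` with the other class-level attributes (e.g. `name`). Python rejects the extra indentation at compile time, before the spider ever runs, which a stdlib `compile()` call can demonstrate (the class body below is a hypothetical reconstruction):

```python
# Line 3 is indented deeper than the surrounding class body,
# reproducing the reported IndentationError.
bad_source = (
    "class JandanSpider:\n"
    "    name = 'jiandan'\n"
    "        rules = ()\n"
)

try:
    compile(bad_source, "jiandan_ooxx.py", "exec")
except IndentationError as e:
    print(e.msg)  # → unexpected indent
```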