-
You need to comment out this line, otherwise the crawl never stops :)
`#Rule(LinkExtractor(restrict_xpaths='//div[@id="pageStyle"]//a[contains(., "下一页")]'))`
I was going to open a pull request, but found there is no branch to target.
-
I added this to my `settings.py`, but it doesn't work:
```python
SPIDER_SETTINGS = [
    {
        'endpoint': 'dmoz',
        'location': 'spiders.dmoz',
        'spider': 'DmozSpider',
…
-
### Description
I'm rather new to Scrapy; I've created some 100 spiders and have now moved the data into a MySQL database.
To check whether a page has already been scraped, I decided to use a process_value function, in …
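A minimal sketch of that idea (the URL set and function name are assumptions): `LinkExtractor` calls the `process_value` callable once per extracted value, and a `None` return tells it to ignore that link entirely.

```python
# URLs already stored in MySQL, loaded once at spider start (assumption).
seen_urls = {"https://example.com/page/1"}

def skip_seen(url):
    """process_value hook: return None to drop a link, or the URL to keep it."""
    return None if url in seen_urls else url

# Wired in as: LinkExtractor(process_value=skip_seen)
```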
-
When there is a CrawlSpider with multiple rules and one of them has no method attached as a callback, it fails with "`TypeError: join() argument must be str, bytes, or os.PathLike object, not 'NoneTyp…
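Without the full traceback it is hard to say where, but that exact message comes from `os.path.join` receiving a `None` (here presumably a value derived from the missing callback). A minimal stdlib reproduction:

```python
import os

# os.path.join rejects None with the message quoted in the issue.
try:
    os.path.join("output", None)
except TypeError as exc:
    print(exc)  # → join() argument must be str, bytes, or os.PathLike object, not 'NoneType'
```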
-
# Expected Behavior
An option that could be useful in the crawler configuration: restricting the crawler's search for links in the page source, using regular expressions or the s…
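In Scrapy this maps onto `LinkExtractor`'s `allow`/`deny` parameters, which already accept regular expressions; a stdlib sketch of the filtering idea (the pattern is an assumption):

```python
import re

# Hypothetical pattern: only follow links under /noticias/.
ALLOW = re.compile(r"^https?://example\.com/noticias/")

def wanted(url):
    """Keep only URLs matching the allowed pattern,
    the same filtering LinkExtractor(allow=r'...') performs."""
    return bool(ALLOW.match(url))
```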
-
Add the ability to select specific divs and tags to be fed into the spider's LinkExtractor restrict_css tool, similar to an annotation UI. This feature would add great usability for users who don't know regex to ad…
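For reference, restricting extraction to chosen divs is what `LinkExtractor(restrict_css='div.target')` already does under the hood; the UI would only need to emit the selector. A stdlib sketch of the behavior (class names are assumptions, and it matches the `class` attribute exactly rather than as a token list):

```python
from html.parser import HTMLParser

class DivLinkExtractor(HTMLParser):
    """Collect <a href> links only inside <div class="..."> blocks,
    mimicking LinkExtractor's restrict_css for a single div selector."""

    def __init__(self, div_class):
        super().__init__()
        self.div_class = div_class
        self.depth = 0          # >0 while inside a matching div
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Enter on a matching div; count nested divs while inside.
            if self.depth or attrs.get("class") == self.div_class:
                self.depth += 1
        elif tag == "a" and self.depth and "href" in attrs:
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1
```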
-
Hello. This project is very meaningful, and I have the same need.
I'm a Python beginner and have a few questions about the code.
I installed Python 3.5. I haven't yet figured out how to start MongoDB, so for now I'm saving the results to a CSV file:
scrapy crawl google -o test.csv JOBDIR=app/jobs
But I got the following error message:
ImportError: No module named…
-
@clemfromspace I just decided to use your package in my Scrapy project, but it is yielding a normal scrapy.Request instead of a SeleniumRequest.
```
from shutil import which
from scrapy_seleniu…
```
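For comparison, a typical scrapy-selenium `settings.py` fragment looks like the sketch below (the driver name and path are assumptions; use whatever matches your browser). `SeleniumRequest` only takes effect when the middleware is enabled:

```python
# settings.py fragment for scrapy-selenium (driver details are assumptions).
from shutil import which

SELENIUM_DRIVER_NAME = "firefox"
SELENIUM_DRIVER_EXECUTABLE_PATH = which("geckodriver")
SELENIUM_DRIVER_ARGUMENTS = ["-headless"]

# Without this entry, a SeleniumRequest is downloaded like a plain Request.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_selenium.SeleniumMiddleware": 800,
}
```

Note also that requests built automatically from `start_urls` are plain `scrapy.Request` objects; to route pages through Selenium, the spider has to override `start_requests` and yield `SeleniumRequest(url=..., callback=...)` explicitly.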
-
### Brand name
LEGOLAND Discovery Centre
### Wikidata ID
Q303439
### Store finder url(s)
https://www.legolanddiscoverycentre.com
-
File "/Users/v/Desktop/ScrapyProject/JanDan/JanDan/spiders/jiandan_ooxx.py", line 18
rules = (
^
IndentationError: unexpected indent
rules = (
Rule(LinkExtractor(allow=('h…
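The fix is to align `rules = (` with the other class-level attributes (e.g. `name`). Python rejects the extra indentation at compile time, before the spider ever runs, which a stdlib `compile()` call can demonstrate (the class body below is a hypothetical reconstruction):

```python
# Line 3 is indented deeper than the surrounding class body,
# reproducing the reported IndentationError.
bad_source = (
    "class JandanSpider:\n"
    "    name = 'jiandan'\n"
    "        rules = ()\n"
)

try:
    compile(bad_source, "jiandan_ooxx.py", "exec")
except IndentationError as e:
    print(e.msg)  # → unexpected indent
```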