linkextractor Search Results

353 results
for linkextractor


atiger77/ScrapyProject #1

jandan.. it's throwing an error..

File "/Users/v/Desktop/ScrapyProject/JanDan/JanDan/spiders/jiandan_ooxx.py", line 18 rules = ( ^ IndentationError: unexpected indent rules = ( Rule(LinkExtractor(allow=('h…

Homeless-Xu updated 7 years ago
2
scrapy/scrapy #1202

LxmlLinkExtractor doesn't extract links with fragment

LxmlLinkExtractor calls `canonicalize_url` with the url only, removing any fragment present in the URL. As AJAX URLs rely on fragments, it would be nice if we could initialize the link extractor with …

Djayb6 updated 7 years ago
5
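
To illustrate the behavior this issue describes, here is a minimal sketch of how dropping the fragment makes distinct AJAX-style URLs collapse into one. It uses the standard library's `urldefrag` as a stand-in for fragment removal and is not Scrapy's own code; the `strip_fragment` helper and the `example.com` URLs are assumptions made up for illustration.

```python
from urllib.parse import urldefrag


def strip_fragment(url: str) -> str:
    """Return the URL with its fragment removed (hypothetical helper)."""
    return urldefrag(url).url


# Two AJAX-style URLs that differ only in their fragment (made-up examples).
ajax_urls = [
    "http://example.com/app#!/inbox",
    "http://example.com/app#!/sent",
]

# Once fragments are dropped, both URLs become identical, so a crawler
# that deduplicates on the stripped form would visit only one of them.
unique_after_stripping = {strip_fragment(u) for u in ajax_urls}
```
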
clemfromspace/scrapy-selenium #109

Does the `wait_time` argument need a `wait_until` to work co…

In the following parser I want the spider to SeleniumRequest all links on a page according to the rules I have specified in the Scrapy LinkExtractor 'le'. It seems to me that no matter what wait_time I…

tungland updated 2 years ago
1
zfcampus/zf-hal #174

Hal Link not supporting templated links

I noticed that templated links are still not supported in ZF-Hal. We are already using templated links for a long time inside our customized version of this library, what is the reason that templated …

Wilt updated 4 years ago
1
scrapy/scrapy #1381

link extractor joining base href to 'tel:' directive

The end result I'm getting on the process_links hook is something like: http://www.domain.com/somepage.htmltel:123456 http://www.domain.com/blog/posttel:123456 When there's an our phone: 123456 Tag …

itamargero updated 4 years ago
13
scrapy/scrapy #2303

RedirectMiddleware does not respect spider's crawling rules

As described in #15 (and #1042), some links to offsite domains may be crawled via redirects. For example: ``` python # in spider: allowed_domains = ['xxx.com'] ``` ``` bash # in log, offsite domain …

djunzu updated 9 months ago
11
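
As a rough sketch of the kind of check this issue asks for, the hypothetical helper below tests whether a URL (for example, a redirect target) falls outside a list of allowed domains. The name `is_offsite` and the sample URLs are assumptions for illustration; this is not Scrapy's middleware code.

```python
from urllib.parse import urlparse


def is_offsite(url, allowed_domains):
    """Return True if the URL's host is not an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return not any(
        host == domain or host.endswith("." + domain)
        for domain in allowed_domains
    )


allowed_domains = ["xxx.com"]  # same value as in the issue's snippet

# A redirect target would be checked before it is followed.
onsite = is_offsite("http://www.xxx.com/page", allowed_domains)
offsite = is_offsite("http://other.example/page", allowed_domains)
```

Matching on a leading dot avoids treating a host such as `notxxx.com` as part of `xxx.com`.
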
scrapy/scrapy #1306

Speedup & fix URL parsing

I profiled a simple Scrapy spider which just downloads pages and follows links extracted using LinkExtractor; it turns out one of the main bottlenecks is urlparse module and our related functions like…

kmike updated 7 months ago
39
xiaoxiaosuaxuan/newscrapy #22

Each issue's section pages and individual news articles use the same URL format

https://xjrb.ts.cn/xjrb/20220912/2.html

Apandada updated 1 year ago
1
xiaoxiaosuaxuan/newscrapy #20

The link does not change; response.url shows the link unchanged

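While working on the 杨子晚报 newspaper site, I found its hidden URL information, as shown in the screenshots, but the crawl failed. When I debugged in the terminal, response.url showed that the hidden URL had not been crawled. How should I fix this? ![image](https://user-images.githubusercontent.com/119149508/221501250-60220fa2-283e-45c2-a730-7f9e16076956.png) ![ima…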

victorzhuang21 updated 1 year ago
1
rohanbk/Mountain-Project-Scraper #10

URLs and selectors are outdated

```py domain = 'https://www.mountainproject.com' # URL should be preceded by a / # e.g. /destinations or /v/STATENAME/ID relativeURL = '/v/hawaii/106316122' start_urls = […

endolith updated 1 year ago
5
