-
## Description
We are using the Algolia Crawler UI to parse our mixed static HTML & SPA website (which uses a hash router). All URLs are provided in the `sitemaps` Crawler config.
```js
new Crawler({
st…
-
One neat feature inside Scrapy is its [LinkExtractors](https://github.com/scrapy/scrapy/blob/64905e3397a5b837312169a0b418857ef1cf40c7/scrapy/linkextractors/lxmlhtml.py) functionality. We usually try …
-
## Summary
Add an option to the LinkExtractor class to consider all tags and attributes (e.g. if you pass `None`, all tags/attributes are considered), along with `deny_tags` and `deny_attrs` arguments …
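A sketch of the proposed filtering semantics in plain Python (hypothetical helper, not Scrapy's actual implementation): `None` as the allow list means "accept everything", and a deny list is applied afterwards.

```python
def keep(name, allowed=None, denied=()):
    """Accept a tag/attribute name: an allow list of None means allow all;
    anything on the deny list is dropped afterwards."""
    if allowed is not None and name not in allowed:
        return False
    return name not in denied

# With tags=None and deny_tags=("script",), only <script> is filtered out
kept = [t for t in ("a", "area", "script", "img")
        if keep(t, allowed=None, denied=("script",))]
```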
-
There are lots of link extractors with different flavors, but we don't need the link extractors themselves; we just need the filters (or processors) and a good way to handle them.
What is the difference between using e…
-
Lab https://training.play-with-docker.com/microservice-orchestration/
`python linkextractor.py` does not work because the container ships only `python3`
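A workaround inside the lab container (assuming the script itself is Python 3 compatible) is to call the interpreter explicitly, or to pick whichever binary exists:

```shell
# Fall back to python3 when the plain `python` binary is absent
PY=$(command -v python || command -v python3)
"$PY" linkextractor.py
```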
-
LiDO has an API that can be used to systematically find the references its LinkExtractor has found in a specific document; see https://linkeddata.overheid.nl/front/portal/services. This can be used…
-
Regression?
I have an HTML file that contains a link like:
`Words`
I'm extracting with code that looks like this:
```python
link_extractor = LinkExtractor(restrict_xpaths=xpath)
tmp_links =…
-
Is this the intended behaviour of `LinkExtractor`? I seem unable to extract relative URLs when using it. Alternatively, if I use a selector for `a` elements, I can capture everything.
For r…
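For reference, resolving relative hrefs against the page URL can be reproduced with the standard library; this is a simplified standalone sketch of what a link extractor has to do, not Scrapy's actual code:

```python
from urllib.parse import urljoin

def resolve_links(base_url, hrefs):
    """Turn raw (possibly relative) hrefs into absolute URLs."""
    return [urljoin(base_url, h) for h in hrefs]

links = resolve_links(
    "https://example.com/docs/index.html",
    ["page2.html", "/about", "https://other.org/x"],
)
```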
-
Hello,
I think it would be useful to add a priority to Rule, so developers can use CrawlSpider with a priority property that is automatically passed on to the Spider object.
The expected Rule would be som…
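A minimal sketch of the idea in plain Python (hypothetical `Rule`/`Request` stand-ins, not Scrapy's real classes): the rule carries a priority that the spider copies onto every request it generates from that rule.

```python
from dataclasses import dataclass

@dataclass
class Request:
    url: str
    priority: int = 0

@dataclass
class Rule:
    priority: int = 0  # hypothetical new argument

    def build_request(self, url):
        # The CrawlSpider would forward the rule's priority to each request
        return Request(url, priority=self.priority)

req = Rule(priority=10).build_request("https://example.com/")
```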
-
### Description
I needed to automatically generate URLs from `href="javascript:xxx"` links, and tried using `LinkExtractor` and `process_value()` as mentioned in the [scrapy docs](https://docs.scra…
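The docs' approach boils down to a `process_value` callable that pulls the real URL out of the `javascript:` href and returns `None` for links that should be dropped. A standalone sketch, assuming links of the form `javascript:goToPage('...')`:

```python
import re

def process_value(value):
    """Extract the URL embedded in a javascript: link; None drops the link."""
    m = re.search(r"javascript:goToPage\('(.*?)'\)", value)
    return m.group(1) if m else None

url = process_value("javascript:goToPage('../other/page.html'); return false")
```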