-
## Description
We are using the Algolia Crawler UI to parse our mixed static-HTML & SPA website (which uses a hash router). All URLs are provided in the `sitemaps` Crawler config.
```js
new Crawler({
st…
```
-
One neat feature inside Scrapy is its [LinkExtractors](https://github.com/scrapy/scrapy/blob/64905e3397a5b837312169a0b418857ef1cf40c7/scrapy/linkextractors/lxmlhtml.py) functionality. We usually try …
-
## Summary
Add the option to the LinkExtractor class to consider all tags and attributes (e.g. if you pass `None` then consider all tags/attributes), and `deny_tags` and `deny_attrs` arguments …
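A minimal pure-Python sketch of the proposed semantics, not Scrapy's implementation (`deny_tags`/`deny_attrs` are the names proposed in this issue and do not exist in Scrapy's `LinkExtractor`), assuming `tags=None`/`attrs=None` mean "consider all" and the deny sets subtract from that:

```python
from html.parser import HTMLParser

class AnyTagLinkExtractor(HTMLParser):
    """Sketch of the proposal: tags=None / attrs=None mean "consider all",
    and deny_tags / deny_attrs subtract from that set."""
    def __init__(self, tags=None, attrs=None, deny_tags=(), deny_attrs=()):
        super().__init__()
        self.tags = tags                  # None => accept every tag
        self.attrs = attrs                # None => accept every attribute
        self.deny_tags = set(deny_tags)
        self.deny_attrs = set(deny_attrs)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag in self.deny_tags:
            return
        if self.tags is not None and tag not in self.tags:
            return
        for name, value in attrs:
            if name in self.deny_attrs:
                continue
            if self.attrs is not None and name not in self.attrs:
                continue
            if value:
                self.links.append(value)

html = '<a href="/a">x</a> <img src="/b.png"> <script src="/c.js"></script>'
ex = AnyTagLinkExtractor(deny_tags={"script"})
ex.feed(html)
print(ex.links)  # ['/a', '/b.png'] — every attribute of every non-denied tag
```

Note that with `attrs=None` this collects *every* attribute value, so in practice one would still pass an allow-list of attributes (`href`, `src`, …) or a `deny_attrs` set.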
-
Lab https://training.play-with-docker.com/microservice-orchestration/
`python linkextractor.py` does not work because the container has Python 3 only.
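A quick workaround inside the lab container (assuming a typical Linux image where `python3` is on the PATH):

```shell
# The lab container ships Python 3 only, so the bare `python` command fails.
# Call the interpreter explicitly:
if [ -f linkextractor.py ]; then
    python3 linkextractor.py
fi
# Or make `python` resolve to python3 for the rest of the session:
alias python=python3
```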
-
The most recent run of the petsathome_gb spider from 2023-05-15 has returned 50 fewer stores than the previous run from 2023-04-15. I've checked a few of the missing stores, and they all appear to sti…
rjw62 updated 3 weeks ago
-
There are lots of link extractors with different flavors, but we don't need link extractors; we just need the filters (or processors) and a good way to handle them.
What is the difference between using e…
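One way to read that first sentence: keep extraction trivial and put all the policy into a pipeline of small filter functions. A hedged sketch of that design (the names and the filter-chain shape are illustrative, not an existing API):

```python
from urllib.parse import urlparse

# Each filter takes a URL and returns it (possibly rewritten) or None to drop it.
def deny_extensions(exts):
    def f(url):
        return None if urlparse(url).path.endswith(tuple(exts)) else url
    return f

def same_domain(domain):
    def f(url):
        return url if urlparse(url).netloc in ("", domain) else None
    return f

def strip_fragment(url):
    return url.split("#", 1)[0]

def run_filters(urls, filters):
    """Pass each URL through the chain; a None from any filter drops it."""
    for url in urls:
        for f in filters:
            url = f(url)
            if url is None:
                break
        else:
            yield url

urls = ["https://example.com/a#top",
        "https://other.com/b",
        "https://example.com/x.pdf"]
filters = [strip_fragment, same_domain("example.com"), deny_extensions([".pdf"])]
print(list(run_filters(urls, filters)))  # ['https://example.com/a']
```

The appeal of this shape is that any extractor flavor can feed the same chain, and adding a new rule means adding one small function rather than a new extractor class.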
-
Regression?
I have a HTML file that contains a link like:
`Words`
I'm extracting with code that looks like this:
```
link_extractor = LinkExtractor(
    restrict_xpaths=xpath)
tmp_links =…
```
-
LiDO has an API which can be used to systematically find the references their LinkeXtractor has found in a specific document, see https://linkeddata.overheid.nl/front/portal/services. This can be used…
-
Is this the intended behaviour of `LinkExtractor`? I don't seem to be able to extract relative URLs when using it. If I use a selector for `a` elements instead, I can capture everything.
For r…
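For reference, the usual handling is to resolve each candidate against the page URL before filtering; the stdlib building block for that step is `urllib.parse.urljoin`. A minimal sketch of the selector-style approach plus resolution (the `HrefCollector` class is illustrative, not part of any library):

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

class HrefCollector(HTMLParser):
    """Collect raw href values from <a> tags, like selecting 'a::attr(href)'."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.hrefs.extend(v for k, v in attrs if k == "href" and v)

page_url = "https://example.com/docs/index.html"
html = '<a href="page2.html">next</a> <a href="/about">about</a>'
c = HrefCollector()
c.feed(html)
absolute = [urljoin(page_url, h) for h in c.hrefs]
print(absolute)
# ['https://example.com/docs/page2.html', 'https://example.com/about']
```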
-
Hello,
I think it would be useful to add a priority to Rule, so developers can use CrawlSpider with a priority property, and the property would automatically be passed to the Spider object.
The expected Rule would be som…
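A sketch of what the proposal could look like; `priority` does not exist on Scrapy's `Rule` today, so the classes below are minimal stand-ins for `scrapy.spiders.Rule` and `scrapy.Request`, not the real API:

```python
from dataclasses import dataclass

@dataclass
class Rule:
    # Stand-in for scrapy.spiders.Rule with the proposed extra field.
    link_pattern: str
    priority: int = 0          # proposed: forwarded to Request(priority=...)

@dataclass
class Request:
    url: str
    priority: int = 0

def requests_from_rules(urls, rules):
    """Build a request per URL, tagging it with the priority of the first
    rule it matches (substring match here, purely for illustration)."""
    out = []
    for url in urls:
        for rule in rules:
            if rule.link_pattern in url:
                out.append(Request(url, priority=rule.priority))
                break
    # Higher priority scheduled first, mirroring Scrapy's scheduler behaviour.
    out.sort(key=lambda r: -r.priority)
    return out

rules = [Rule("/product/", priority=10), Rule("/category/", priority=0)]
urls = ["https://shop.test/category/1", "https://shop.test/product/2"]
reqs = requests_from_rules(urls, rules)
print([(r.url, r.priority) for r in reqs])
# [('https://shop.test/product/2', 10), ('https://shop.test/category/1', 0)]
```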