-
It would be great if a plugin like https://github.com/scrapy-plugins/scrapy-playwright did not have to force you to drive all requests through its download handlers, and instead you could drive certain…
-
```
Provide RSS integration feature to the crawler.
RSS Integration will allow for,
1. As a trigger to start/restart website crawling/indexing based on RSS
feed updates.
2. To implement an RSS…
-
simplecrawler has a lot of options. It might be useful to allow passing options directly to the crawler for really custom setups, such as limiting the crawl depth or the number of pages to crawl.
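
A rough sketch of what such a pass-through could look like. This is only an illustration of the idea, not simplecrawler's actual API: `StubCrawler`, its attribute names (`max_depth`, `max_pages`, `interval_ms`) and `apply_crawler_options` are all hypothetical stand-ins, and it is written in Python purely to keep it self-contained.

```python
from dataclasses import dataclass
from typing import Any, Dict


@dataclass
class StubCrawler:
    """Hypothetical stand-in for the wrapped crawler (not simplecrawler itself)."""
    max_depth: int = 0      # assumption: 0 means "no depth limit" in this sketch
    max_pages: int = 0      # assumption: 0 means "no page limit" in this sketch
    interval_ms: int = 250  # assumption: delay between requests


def apply_crawler_options(crawler: Any, options: Dict[str, Any]) -> Any:
    """Copy user-supplied options onto the crawler, rejecting unknown names."""
    for name, value in options.items():
        if not hasattr(crawler, name):
            # Fail loudly so a typo in an option name is not silently ignored.
            raise ValueError(f"unknown crawler option: {name}")
        setattr(crawler, name, value)
    return crawler


# Usage: the wrapper forwards whatever the user passed, untouched defaults stay.
crawler = apply_crawler_options(StubCrawler(), {"max_depth": 3, "max_pages": 100})
```

Rejecting unknown names is a design choice in this sketch; a wrapper could equally forward everything blindly and let the crawler decide.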
-
The pipeline doesn't work anymore:
```sh
/tmp/ratholeradio-archive (git)-[master] % datalad crawl
[INFO ] Loading pipeline definition from ./.datalad/crawl/pipelines/pipeline.py
[ERROR ] Fai…
-
> [!WARNING]
> This issue is a work in progress.
This will act as a hub to centralize this information.
## Maintainer Requests
The following requests are coming straight from the Skeleton t…
-
This is a special request for the Zimit 2.0 project. Devs will handle this first to test the new scraper, and only once it's working will it be transferred to the content team.
- Website URL: https://www.bb…
-
```
What steps will reproduce the problem?
1. Run the Basic Crawler with RobotServer enabled
2. Have "addeasy.netfirms.com" as the seed
What is the expected output? What do you see instead?
Expectati…
-
```
What steps will reproduce the problem?
1.
SLES 11.3 with slightly patched 3.16 kernel
Linux memcached9 3.16.3-4.1.100-default #1 SMP Thu Sep 18 06:32:16 UTC 2014
(d2bbe7f) x86_64 x86_64 x86_64 GN…