-
Scrapy has a [`DOWNLOAD_HANDLERS`](https://docs.scrapy.org/en/2.4/topics/settings.html#std-setting-DOWNLOAD_HANDLERS) setting that allows customizing the handlers for each scheme; it would be good to…
-
Hello,
I'm working on a middleware that loads some resources in its `spider_opened` handler method. If those resources can't be loaded, I need the spider to be closed.
I tried to do that by raising …
-
Hello
I am trying to create a Glue Crawler using the JDBC driver jt400-20.0.7.jar (obtained from https://mvnrepository.com/artifact/net.sf.jt400/jt400), and I am getting the error "Unsupported jdbc driver class…
-
The data structure of the crawler should be reconsidered and possibly adjusted. Currently, it breaks compatibility with existing crawlers from other implementations from the Inner Source Commons group…
-
Create a chapter introducing custom crawls on Data Together
Sections:
1. What is custom crawling?
- [ ] Why do some websites need custom crawls?
- [ ] What should your custom crawler extract fr…
-
# Improving the freva-rest API
In the existing freva system, users are able to add custom metadata to Apache Solr, which has been quite popular. To enhance this functionality in the new freva-rest AP…
-
```
Provide RSS integration feature to the crawler.
RSS Integration will allow for,
1. As a trigger to start/restart website crawling/indexing based on RSS
feed updates.
2. To implement an RSS…
-
```
Provide RSS integration feature to the crawler.
RSS Integration will allow for,
1. As a trigger to start/restart website crawling/indexing based on RSS
feed updates.
2. To implement an RSS…
-
**Is your feature request related to a problem? Please describe.**
When running fscrawler on an existing index, it overwrites any existing document with the same ID.
For example, if I add custom m…
-
## Bug Report
**Current Behavior**
When running "vendor/bin/typo3 crawler:processQueue" I get this error:
PHP Warning: include(/var/www/html/vendor/tomasnorre/crawler/cli/bootstrap.php/index.php…