custom-crawler Search Results

1000+ results for custom-crawler

scrapy/scrapy #4944

Document download handler interface

Scrapy has a [`DOWNLOAD_HANDLERS`](https://docs.scrapy.org/en/2.4/topics/settings.html#std-setting-DOWNLOAD_HANDLERS) setting that lets you customize the handlers for each scheme, it would be good to…

elacuesta updated 1 year ago
2
scrapy/scrapy #3435

Is it possible to close the spider at spider_opened signal?

Hello, I'm working on a middleware that loads some resources at `spider_opened` handler method. If those resources can't be loaded, I need the spider to be closed. I tried to do that by raising …

pauloromeira updated 4 months ago
15
aws-samples/aws-glue-samples #155

Unsupported jdbc driver classname with com.ibm.as400.access.…

Hello, I am trying to create a Glue Crawler using the JDBC driver jt400-20.0.7.jar (obtained from https://mvnrepository.com/artifact/net.sf.jt400/jt400), and I am getting the error "Unsupported jdbc driver class…

javicamarababel updated 4 months ago
1
philips-labs/github-portal #16

Data structure of Crawler

The data structure of the crawler should be reconsidered and possibly adjusted. Currently, it breaks compatibility with existing crawlers from other implementations from the Inner Source Commons group…

Brend-Smits updated 3 years ago
1
datatogether/learning #19

Custom Crawls Chapter

Create a chapter introducing custom crawls on Data Together. Sections: 1. What is custom crawling? - [ ] Why do some websites need custom crawls? - [ ] What should your custom crawler extract fr…

jeffreyliu updated 7 years ago
1
FREVA-CLINT/freva-nextgen #32

Create PUT methods for the databrowser API

# Improving the freva-rest API In the existing freva system, users are able to add custom metadata to Apache Solr, which has been quite popular. To enhance this functionality in the new freva-rest AP…

antarcticrainforest updated 1 month ago
1
abhishekbhalani/harvestman-crawler #10

RSS Integration

Provide RSS integration feature to the crawler. RSS Integration will allow for, 1. As a trigger to start/restart website crawling/indexing based on RSS feed updates. 2. To implement an RSS…

GoogleCodeExporter updated 9 years ago
4
Averroes/harvestman-crawler #10

RSS Integration

Provide RSS integration feature to the crawler. RSS Integration will allow for, 1. As a trigger to start/restart website crawling/indexing based on RSS feed updates. 2. To implement an RSS…

GoogleCodeExporter updated 9 years ago
4
dadoonet/fscrawler #1867

Fscrawler always overwrites existing documents, with no opti…

**Is your feature request related to a problem? Please describe.** When running fscrawler on an existing index, it overwrites any existing document with the same ID. For example, if I add custom m…

acastin updated 4 months ago
5
tomasnorre/crawler #1023

Problem with hardcoded typo3conf/ext path inside bootstrap.p…

## Bug Report **Current Behavior** When running "vendor/bin/typo3 crawler:processQueue" I get this error: PHP Warning: include(/var/www/html/vendor/tomasnorre/crawler/cli/bootstrap.php/index.php…

ktallafus updated 3 months ago
5
