stormcrawler Search Results

178 results
for stormcrawler

apache/incubator-stormcrawler #405

org.apache.http.NoHttpResponseException: The target server f…

I have a lot of FETCH_ERROR (about ten percent on one million French URLs). In debug I can see this error: org.apache.http.NoHttpResponseException: The target server failed to respond. Sometimes, it'…

Laurent-Hervaud updated 4 years ago
12
apache/incubator-stormcrawler #769

java.lang.ClassCastException: clojure.lang.PersistentVector …

I followed the tutorial at http://stormcrawler.net/getting-started/ by watching YouTube. After I typed the following line on my command line, I got an error on my spout: storm jar target/tutorial-1.0-SNA…

billiamchoi updated 4 years ago
5
apache/incubator-stormcrawler #756

okhttp protocol: trimmed content because of content limit no…

(see [NUTCH-2729](https://issues.apache.org/jira/browse/NUTCH-2729) and commoncrawl/nutch#10 for the same issue in Nutch) The marking of trimmed content (by content limit) is not reliable and repro…

sebastian-nagel updated 5 years ago
1
31z4/storm-docker #1

java.net.UnknownHostException using docker-compose

Hi there! Thank you for creating this project! It was just what I was looking for to test upgrading to storm 1.0.1. I've copied your docker-compose configuration and it seems to be running, but I a…

mzbyszynski updated 5 years ago
6
apache/incubator-stormcrawler #730

ESSeedInjector topology does not index seeds into Elasticsea…

**The environment used:** - Default ES cluster with Kibana deployed on docker swarm, both at version 7.0.1 - Step by step creation of the topology based on your [guide](https://github.com/DigitalPeb…

pgg-are-my-initials updated 5 years ago
3
commoncrawl/news-crawl #28

Endless refetch of URLs due to changing domain names

The news crawler uses the domain name to manage fetch queues. The domain name is also used to route URLs to Elasticsearch shards. When a URL is re-fetched the existing routing key isn't reused, instea…

sebastian-nagel updated 5 years ago
2
apache/incubator-stormcrawler #720

NPE in WARCHdfsBolt on cleanup()

When a (local) topology is killed and no tuples have been passed to the WARCHdfsBolt, the cleanup() will raise an NPE: ``` 68227 [Thread-91-warc-executor[36 36]] INFO o.a.s.util - Async loop interru…

sebastian-nagel updated 5 years ago
5
commoncrawl/news-crawl #27

Error: Could not find or load main class com.digitalpebble.s…

Hi, I just followed the README. I created the uber jar using mvn clean package, but I am getting this error. Error: Could not find or load main class com.digitalpebble.stormcrawler.elasticsear…

rishrockstar updated 5 years ago
2
crawler-commons/crawler-commons #237

Usage of SiteMapParser

Hi, this might be a silly question, but still. I noticed that `SiteMapParser.parseSiteMap()` returns `AbstractSiteMap`, can you give me some examples of how this is intended to be used? Thank…

pr3mar updated 5 years ago
2
apache/incubator-stormcrawler #710

Fix the logic around sitemap = false

#645 was a good idea in theory but needs fixing. The idea was to prevent pages from having their outlinks followed unless they had been flagged as being a sitemap (or not), basically, we have sitemaps…

jnioche updated 5 years ago
1
