apache-nutch Search Results

353 results for apache-nutch

nasa-jpl-memex/memex-explorer #531

nutch conda package is broken

The nutch conda package invokes a shell script that does some very funny Java hardcode things. This is fragile and we should get rid of it. Who is the maintainer of the nutch conda package?

ahmadia updated 9 years ago
12
apache/kibble-scanners #7

Not picking up default branch on repos that have a non stand…

I am running through the list of sources in the Kibble Demo that have not been processed and are showing the error message: Could not sync with source Exception: No default branch was found …

sharanf updated 2 years ago
2
meilisearch/scrapix #99

Provide option to slow or rate limit requests

I've been testing out scrapix and first off, awesome work! With a little bit of tinkering around I got it working with meilisearch cloud FAST! That said, it could be useful to add an option to rat…

klvs updated 4 months ago
1
apache/incubator-stormcrawler #1106

Fetcher: optionally slow down fetching from hosts with repea…

See [NUTCH-2946](https://issues.apache.org/jira/browse/NUTCH-2946) > The fetcher holds for every fetch queue a counter which counts the number of observed "exceptions" seen when fetching from the h…

jnioche updated 9 months ago
2
AgenteFarron/crawler-commons #43

[Sitemaps] Fix the Tester Util's Logic

The scenario is running the SitemapTester on a sitemapIndex in a GZ file. This is the current method stamp of the sitemapParser: parse(URL url, String mt, boolean recursive) As you see, the sec…

GoogleCodeExporter updated 8 years ago
11
nasa-jpl-memex/memex-explorer #707

Error crawling URLs

I get an error in the Crawl log after starting the crawl as follows: ~/miniconda3/envs/memex/lib/nutch ~/memex-explorer/source Injecting seed URLs /home/salonee/miniconda3/envs/memex/lib/nutch/bin/nut…

saloneerege updated 8 years ago
4
Intel-bigdata/HiBench #77

Encountered problems with Hibench and question about concurr…

Hello, I've been using hadoop and Hibench for 2,5 months and I have experienced some problems as I was working with this. Now, it looks that everything is ok and all the benchmarks run BUT I still h…

jforjohn updated 9 years ago
4
buzzbangorg/bsbang-crawler #4

Process crawled JSON-LD to multiple levels, possibly using a…

At the moment, bsbang-crawl does a very hokey top-level crawl of the JSON-LD captured. This only captures a very small amount of information, mainly because this was for proof of concept and even cra…

justinccdev updated 6 years ago
3
amir-jakoby/crawler-commons #43

[Sitemaps] Fix the Tester Util's Logic

The scenario is running the SitemapTester on a sitemapIndex in a GZ file. This is the current method stamp of the sitemapParser: parse(URL url, String mt, boolean recursive) As you see, the sec…

GoogleCodeExporter updated 8 years ago
11
ferhatsb/crawler-commons #43

[Sitemaps] Fix the Tester Util's Logic

The scenario is running the SitemapTester on a sitemapIndex in a GZ file. This is the current method stamp of the sitemapParser: parse(URL url, String mt, boolean recursive) As you see, the sec…

GoogleCodeExporter updated 9 years ago
11
