-
The nutch conda package invokes a shell script that does some very odd hardcoded Java things. This is fragile, and we should get rid of it. Who maintains the nutch conda package?
-
I am running through the list of sources in the Kibble Demo that have not been processed and are showing the following error message:
```
Could not sync with source
Exception: No default branch was found …
```
-
I've been testing out scrapix and, first off, awesome work! With a little tinkering, I got it working with Meilisearch Cloud FAST!
That said, it could be useful to add an option to rat…
-
See [NUTCH-2946](https://issues.apache.org/jira/browse/NUTCH-2946)
> The fetcher holds for every fetch queue a counter which counts the number of observed "exceptions" seen when fetching from the h…
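The quoted description boils down to one exception counter per fetch queue. As a rough illustration only, here is a minimal Java sketch of that idea. `QueueExceptionCounter`, `recordException`, and the `maxExceptionsPerQueue` threshold are hypothetical names, not Nutch's actual Fetcher code, and treating a non-positive threshold as "disabled" is an assumption of this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch, not Nutch's Fetcher: one exception counter per fetch
// queue (for example, one queue per host).
class QueueExceptionCounter {
    // Assumption: a non-positive threshold disables dropping queues.
    private final int maxExceptionsPerQueue;
    private final Map<String, Integer> exceptions = new HashMap<>();

    QueueExceptionCounter(int maxExceptionsPerQueue) {
        this.maxExceptionsPerQueue = maxExceptionsPerQueue;
    }

    // Records one exception for the queue and returns true once the queue has
    // reached the threshold, i.e. its remaining items could be skipped.
    boolean recordException(String queueId) {
        int count = exceptions.merge(queueId, 1, Integer::sum);
        return maxExceptionsPerQueue > 0 && count >= maxExceptionsPerQueue;
    }

    // Number of exceptions observed so far for the queue (0 if none).
    int count(String queueId) {
        return exceptions.getOrDefault(queueId, 0);
    }
}
```

Keeping the counter keyed by queue rather than global means one failing host cannot cause healthy hosts to be skipped.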
-
The scenario is running the SitemapTester on a sitemapIndex in a GZ file.
This is the current method signature of the sitemapParser:
```
parse(URL url, String mt, boolean recursive)
```
As you see, the sec…
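The snippet is cut off before the actual problem is stated, but the scenario itself (a sitemap index stored as a gzip file) can be illustrated. The sketch below is not the sitemapParser's real code; `GzSitemapIndexReader`, `isGzip`, and `childSitemaps` are hypothetical names. It assumes the whole file is already in memory, detects gzip by its magic bytes (0x1f 0x8b), and lists the `<loc>` of each child sitemap, which is what a recursive parse would then follow.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of Nutch or crawler-commons.
class GzSitemapIndexReader {

    // True if the content starts with the gzip magic number 0x1f 0x8b.
    static boolean isGzip(byte[] content) {
        return content.length >= 2
                && (content[0] & 0xff) == 0x1f
                && (content[1] & 0xff) == 0x8b;
    }

    // Decompresses the content if it is gzipped, parses it as XML and returns
    // the <loc> of every <sitemap> entry. Returns an empty list when the root
    // element is not <sitemapindex> (for example, a plain <urlset>).
    static List<String> childSitemaps(byte[] content) throws Exception {
        InputStream in = new ByteArrayInputStream(content);
        if (isGzip(content)) {
            in = new GZIPInputStream(in);
        }
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        // Reject DOCTYPE declarations, since sitemaps are untrusted input.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        Document doc;
        try (InputStream stream = in) {
            doc = factory.newDocumentBuilder().parse(stream);
        }
        List<String> locs = new ArrayList<>();
        if (!"sitemapindex".equals(doc.getDocumentElement().getLocalName())) {
            return locs;
        }
        NodeList sitemaps = doc.getElementsByTagNameNS("*", "sitemap");
        for (int i = 0; i < sitemaps.getLength(); i++) {
            Element sitemap = (Element) sitemaps.item(i);
            NodeList loc = sitemap.getElementsByTagNameNS("*", "loc");
            if (loc.getLength() > 0) {
                locs.add(loc.item(0).getTextContent().trim());
            }
        }
        return locs;
    }
}
```

Sniffing the magic bytes instead of trusting the `.gz` suffix or a declared media type (compare the `mt` argument above) is a choice made for this sketch, since the declared type may not match the actual encoding.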
-
I get the following error in the Crawl log after starting the crawl:
```
~/miniconda3/envs/memex/lib/nutch ~/memex-explorer/source
Injecting seed URLs
/home/salonee/miniconda3/envs/memex/lib/nutch/bin/nut…
```
-
Hello,
I've been using Hadoop and HiBench for 2.5 months and have run into some problems along the way. Now it looks like everything is OK and all the benchmarks run, BUT I still h…
-
At the moment, bsbang-crawl does a very hokey top-level crawl of the captured JSON-LD. This captures only a very small amount of information, mainly because this was a proof of concept and even cra…