-
Wikipedia has many articles on F/OSS projects with structured, current information, including versions. There should be a way to get a dump of this.
https://en.m.wikipedia.org/w/index.php?title=Cat…
-
There should be support for web page crawling, for instance to integrate the corporate intranet into the search.
-
Hi @vkuzel, I really appreciate your effort in open-sourcing this repo. I know it's been 4 years since you created this, but this question should be easy for you to answer.
The problem: I clo…
-
I have added the list to my website's .htaccess file, for Apache 2.4:
https://github.com/mitchellkrogza/apache-ultimate-bad-bot-blocker/blob/master/_htaccess_versions/htaccess-mod_rewrite.txt
To my se…
-
Originally reported on Google Code with ID 55
```
As reported by PMD:
"Use String.indexOf(char) when checking for the index of a single character; it executes
faster."
Seeing the source code from th…
```
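The PMD hint quoted above can be illustrated with a minimal sketch. The class name `IndexOfDemo`, its helper methods, and the sample string are hypothetical and do not come from the reported project. The sketch only shows that the `char` and `String` overloads of `String.indexOf` return the same index for a one-character needle; it does not measure the speed difference PMD mentions.

```java
public class IndexOfDemo {
    // Single-character search using the char overload, the form PMD suggests.
    static int indexOfChar(String text, char needle) {
        return text.indexOf(needle);
    }

    // The same search using the String overload; for a one-character
    // needle it returns the same index as the char overload.
    static int indexOfString(String text, String needle) {
        return text.indexOf(needle);
    }

    public static void main(String[] args) {
        String sample = "key=value"; // hypothetical input
        System.out.println(indexOfChar(sample, '='));
        System.out.println(indexOfString(sample, "="));
    }
}
```

Because both overloads agree on the result, replacing `indexOf("=")` with `indexOf('=')` should be a behavior-preserving change whenever the needle is a single character.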
-
Using Apache Hadoop to store and process data in a distributed system
-
https://github.com/apache/nutch/commit/c93d908bb635d3c5b59f8c8a22e0584ebf588794
-
Trying to read a WARC file that has an info header results in a read failure. I followed these steps:
Using Spark 2.3.1 with the Scala shell. Downloaded aut-0.16.1-SNAPSHOT-fatjar.jar and used the --ja…
-
Hi.
I've just tested the plugin with Nutch 1.9, using the patch in NUTCH-1933. It works well when I test with HTTP URLs, but I get
fetch of https://wiki.apache.org/nutch/HttpAuthenticationSchemes fa…
moees updated
6 years ago
-
If you need to analyze the root cause of a query's failure to match some document, you can use the Weight.explain() API. If you want to do some gross analysis of a whole batch of queries, say scraped …