-
Custom User-Agents help people filter you out, find out what the bot is doing, and get in touch with you if the bot starts misbehaving, so it's good practice to use them. I'm thinking we put thi…
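A minimal sketch of what such a header could look like, assuming a Python crawler that uses the standard-library `urllib.request`. The bot name, version, info URL and contact address are hypothetical placeholders, and the `name/version (+url; contact)` layout is one common convention rather than anything this issue prescribes.

```python
import urllib.request

# Hypothetical values: replace with your bot's real name, info page and contact address.
BOT_NAME = "ExampleBot"
BOT_VERSION = "0.1"
INFO_URL = "https://example.org/bot.html"
CONTACT = "crawler-admin@example.org"


def build_user_agent(name: str, version: str, info_url: str, contact: str) -> str:
    """Compose a descriptive User-Agent: a product token that sites can filter on,
    plus a URL explaining what the bot does and an address for complaints."""
    return f"{name}/{version} (+{info_url}; {contact})"


def make_request(url: str, user_agent: str) -> urllib.request.Request:
    """Build a request carrying the custom User-Agent (nothing is sent yet)."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


ua = build_user_agent(BOT_NAME, BOT_VERSION, INFO_URL, CONTACT)
req = make_request("https://example.org/", ua)
# urllib stores header names via str.capitalize(), hence "User-agent" here.
print(req.get_header("User-agent"))
```

Keeping the contact details inside the header itself means a site operator can reach you from their access logs alone, without having to look the bot up.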
JeniT updated
11 years ago
-
Hi folks, I just came across this project. I released Nutch 2.3.1 a few weeks back, and Gora 0.7 is about to be released, so you will be able to take advantage of some good updates and big…
-
I tried this repository and followed all the steps, but it still does not generate any documents or index in Elasticsearch.
Here is the output (the ES index name is `nutch` and the cluster name is `nutch`):
...
Link…
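One way to check whether anything reached Elasticsearch at all is to ask the index for its document count through the `_count` API. This sketch assumes Elasticsearch listens on `http://localhost:9200` and reuses the index name `nutch` from the report above; the helper names are hypothetical, and only `indexed_docs` touches the network.

```python
import json
import urllib.request


def count_url(host: str, index: str) -> str:
    """URL of Elasticsearch's _count endpoint for a single index."""
    return f"{host.rstrip('/')}/{index}/_count"


def parse_count(body: bytes) -> int:
    """Extract the document count from a _count response body."""
    return int(json.loads(body)["count"])


def indexed_docs(host: str = "http://localhost:9200", index: str = "nutch") -> int:
    """Ask a running cluster how many documents the index holds (assumed host/index)."""
    with urllib.request.urlopen(count_url(host, index), timeout=10) as resp:
        return parse_count(resp.read())
```

A count of 0 after a crawl suggests the indexing step never sent documents, which points at the indexer configuration rather than at the fetch or parse stages.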
-
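I followed the steps in the article, which says to switch the Ivy repository to http://maven.oschina.net/content/groups/public/. Doing that produces errors, while running `ant eclipse -verbose` without switching the repository succeeds. Could this be because the oschina repository is incomplete?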
ghost updated
8 years ago
-
Hello,
When I run NutchIndexing, `prepare.sh` runs successfully and generates the input folders `crawldb`, `indexes`, `linkdb`, and `segments`. However, when I execute `run.sh`, all tasks fail with the…
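Before digging into the task failures, it can help to confirm that the folders produced by `prepare.sh` exist and are not empty where `run.sh` expects them. A minimal pre-flight sketch; the folder names come from the report, while the base directory and the function name `missing_inputs` are assumptions for illustration.

```python
from pathlib import Path

# Folder names taken from the report; where they live is an assumption.
EXPECTED_DIRS = ("crawldb", "indexes", "linkdb", "segments")


def missing_inputs(base: Path) -> list[str]:
    """Return the expected input folders that are absent or empty under `base`."""
    missing = []
    for name in EXPECTED_DIRS:
        path = base / name
        # A folder counts as missing if it does not exist or contains nothing.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing
```

Calling `missing_inputs(Path("input"))` (hypothetical path) and getting an empty list at least rules out the simplest cause, a path mismatch between the two scripts.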
-
I followed the instructions and everything seems to be working fine. MongoDB has a `webpage` collection, but for some reason `elasticsearch` indexing doesn't do anything.
Ideas?
```
root@1e9a816bd8d9:/…
```
-
Port SpellCheckedMetadata from Nutch to cater for variations in header names returned by servers
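The idea can be illustrated with a small sketch: reduce each incoming header name to a canonical key (lower-case letters only) and map it back to a known spelling. This is loosely modelled on the idea behind SpellCheckedMetadata, not a port of Nutch's code; the vocabulary `KNOWN_HEADERS` and the function names are hypothetical.

```python
# Hypothetical vocabulary: a handful of well-known HTTP header names.
KNOWN_HEADERS = (
    "Content-Type",
    "Content-Length",
    "Content-Encoding",
    "Last-Modified",
    "Location",
)


def _squash(name: str) -> str:
    """Lower-case and keep letters only, so 'content_type' and 'Content-type' compare equal."""
    return "".join(c.lower() for c in name if c.isalpha())


# Canonical key -> preferred spelling.
_INDEX = {_squash(h): h for h in KNOWN_HEADERS}


def normalize_header(name: str) -> str:
    """Return the known spelling for a variant header name, or the name unchanged."""
    return _INDEX.get(_squash(name), name)
```

Unknown names pass through untouched, so custom headers such as `X-Custom` are never rewritten by mistake.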
-
Similar to the BasicURLNormalizer in Nutch, possibly with more later on.
Related to #24
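A sketch of the kinds of normalization a basic URL normalizer typically performs: lower-casing the scheme and host, dropping the default port, removing `.` and `..` path segments, and discarding the fragment. This is an illustration using Python's `urllib.parse`, not Nutch's exact rule set; `normalize_url` and `remove_dot_segments` are hypothetical names.

```python
from urllib.parse import urlsplit, urlunsplit

DEFAULT_PORTS = {"http": 80, "https": 443}


def remove_dot_segments(path: str) -> str:
    """Drop '.' and '..' segments from an absolute path, in the spirit of RFC 3986."""
    segments = path.split("/")
    output: list[str] = []
    for seg in segments:
        if seg == ".":
            continue
        if seg == "..":
            # Never pop the leading empty segment that represents the root '/'.
            if len(output) > 1:
                output.pop()
            continue
        output.append(seg)
    if segments and segments[-1] in (".", ".."):
        output.append("")  # '/a/b/..' becomes '/a/', keeping the trailing slash
    return "/".join(output)


def normalize_url(url: str) -> str:
    """Normalize an absolute http(s) URL (ignores userinfo and IPv6 literals)."""
    parts = urlsplit(url.strip())
    scheme = parts.scheme.lower()
    host = parts.hostname or ""  # .hostname is already lower-cased
    port = parts.port  # raises ValueError for a malformed port
    netloc = host if port in (None, DEFAULT_PORTS.get(scheme)) else f"{host}:{port}"
    path = remove_dot_segments(parts.path) or "/"
    return urlunsplit((scheme, netloc, path, parts.query, ""))  # fragment dropped
```

For example, `normalize_url("HTTP://Example.COM:80/a/./b/../c#frag")` yields `http://example.com/a/c`, so both spellings end up as a single entry in the crawl database.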
-
Hi @Meabed,
One of our driving issues for the [Nutch 2.X roadmap](https://wiki.apache.org/nutch/Nutch2Roadmap) is the provisioning of Docker containers for various Gora backends.
I wonder if you would…
-
This is on my wishlist :-)
Please, could you include these two classes in your engine to ease the URL filtering process? I took them from the Nutch package and changed them a bit to fit my needs (initi…
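The two classes themselves are not shown, so as a stand-in, here is a sketch of regex-based URL filtering in the style of Nutch's `regex-urlfilter.txt`: each rule is `+` (accept) or `-` (reject) followed by a regular expression, the first matching rule decides, and a URL that matches no rule is rejected. The rule set and the function names are hypothetical examples.

```python
import re

# Hypothetical rules in the style of regex-urlfilter.txt: '+' accepts, '-' rejects.
RULES_TEXT = r"""
# skip file:, ftp: and mailto: URLs
-^(file|ftp|mailto):
# skip some image extensions
-\.(gif|jpg|png)$
# accept anything else on example.org
+^https?://([a-z0-9-]+\.)*example\.org/
"""


def parse_rules(text: str) -> list[tuple[bool, re.Pattern]]:
    """Turn '+regex' / '-regex' lines into (accept?, compiled pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accept(url: str, rules: list[tuple[bool, re.Pattern]]) -> bool:
    """The first rule whose pattern matches decides; no match means reject."""
    for allowed, pattern in rules:
        if pattern.search(url):
            return allowed
    return False
```

Because order matters, the reject rules for images come before the broad accept rule for the domain; swapping them would let `.png` URLs through.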