apache-nutch Search Results

353 results for apache-nutch

tgoetz/crawler-commons #22

Use longest-match-wins approach to matching URLs in robots.t…

``` See "Order of precedence for group-member records" section at the end of https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt ``` Original issue reported on code.google…

GoogleCodeExporter updated 9 years ago · 2 comments
Intel-bigdata/HiBench #6

how to rebuild nutch-1.2.jar??

Hi, I want to modify the IndexingMapReduce.java file from nutch-indexing, but I'm not able to recompile it back into the nutch-1.2.jar file. When I ran the provided build.xml file, it complained that it …

NTNguyen updated 9 years ago · 1 comment
khuongduyit/crawler4j #136

JVM crash when running crawler on Centos 6.2

``` What steps will reproduce the problem? Running the crawler crashes the JVM sometimes. I crawl around 10 web sites regularly, with pages between 1K and 50K. This happens randomly but happens very …

GoogleCodeExporter updated 9 years ago · 14 comments
tasfe/crawler-commons #22

Use longest-match-wins approach to matching URLs in robots.t…

``` See "Order of precedence for group-member records" section at the end of https://developers.google.com/webmasters/control-crawl-index/docs/robots_txt ``` Original issue reported on code.google…

GoogleCodeExporter updated 9 years ago · 2 comments
Intel-bigdata/HiBench #46

Launching nutchindexing on CDH5

Hi. I am using CDH5 5.0.2, which is the latest, and I have downloaded the latest HiBench source. All benchmark suites (wordcount, terasort, kmeans, hive bench, etc.) operate pretty well except nutchindexing.…

Jaeki updated 9 years ago · 7 comments
BayanGroup/nutch-custom-search #15

NPE trying to index

I am trying to use extractor as an HTML/index filter, but I am getting an NPE when it's trying to load the config file, despite the fact that I have an extractors.xml file in the conf directory. Here is th…

dmnt3rr0r updated 9 years ago · 2 comments
jprante/elasticsearch-knapsack #68

export fails if _id contains encoded url

I'm using the Apache Nutch Crawler to index websites. The default behaviour is to use the url as the unique identifier, which seemed like a good idea until now. If the exported index contains fields …

chrseidel updated 9 years ago · 1 comment
xxqcheers/crawler4j #136

JVM crash when running crawler on Centos 6.2

``` What steps will reproduce the problem? Running the crawler crashes the JVM sometimes. I crawl around 10 web sites regularly, with pages between 1K and 50K. This happens randomly but happens very …

GoogleCodeExporter updated 9 years ago · 14 comments
sageone/crawler4j #136

JVM crash when running crawler on Centos 6.2

``` What steps will reproduce the problem? Running the crawler crashes the JVM sometimes. I crawl around 10 web sites regularly, with pages between 1K and 50K. This happens randomly but happens very …

GoogleCodeExporter updated 9 years ago · 14 comments
Intel-bigdata/HiBench #4

Benchmarking for nutchindexing

Hi, I am trying to run the Nutchindexing job (https://github.com/hibench/HiBench-2.1/tree/f1d43780f5ae813ccd4e891e353429e7871c9c41). It says that "Total input paths to process is 0". Can anyone help m…

prashanthig updated 9 years ago · 19 comments