camilotejeiro / camilotejeiro.github.io

Camilo Tejeiro Blog
https://camilotejeiro.github.io/
Nutch1 Quick Tutorial, Learning to Crawl #3

Open camilotejeiro opened 7 years ago

camilotejeiro commented 7 years ago

I do not own these comments; they were copied verbatim from my old WordPress.com blog, in case they help other readers.

Author: karthik

Hi, I need to integrate Solr and Nutch with Drupal 7. I have integrated Solr 4.10.4 with Drupal 7 through the Apache Solr Search module, and search works fine. What I need now is to fetch the hyperlinks that appear on each page, and Apache Nutch looks like a good fit for that. However, I configured Solr for Drupal by replacing the following Solr files with the versions from the Drupal module:

1) schema.xml
2) solrconfig.xml
3) protwords.xml

How do I connect Solr 4.10.4, Nutch 1.12, and Drupal 7 together? Kindly help with this.

camilotejeiro commented 7 years ago

Author: Perminder Singh

Everything works fine apart from indexing. The indexer prints the output below, and nothing shows up in Elasticsearch.

Elasticsearch version: 1.7.2, Nutch version: 1.13

Indexer: starting at 2017-06-12 13:42:24
Indexer: deleting gone documents: false
Indexer: URL filtering: false
Indexer: URL normalizing: false
Active IndexWriters :
ElasticIndexWriter
    elastic.cluster : elastic prefix cluster
    elastic.host : hostname
    elastic.port : port
    elastic.index : elastic index command
    elastic.max.bulk.docs : elastic bulk index doc counts. (default 250)
    elastic.max.bulk.size : elastic bulk index length in bytes. (default 2500500)
    elastic.exponential.backoff.millis : elastic bulk exponential backoff initial delay in milliseconds. (default 100)
    elastic.exponential.backoff.retries : elastic bulk exponential backoff max retries. (default 10)
    elastic.bulk.close.timeout : elastic timeout for the last bulk in seconds. (default 600)

Indexer: number of documents indexed, deleted, or skipped:
Indexer: finished at 2017-06-12 13:42:41, elapsed: 00:00:17
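
For anyone hitting the same symptom, one way to confirm whether any documents actually reached Elasticsearch is to ask the index for its document count over the REST `_count` endpoint. The following is only a minimal sketch under stated assumptions, not a diagnosis of the problem above: the host `localhost`, the index name `nutch`, and the HTTP port 9200 are placeholders, and the helper names `count_url`, `parse_count`, and `fetch_count` are made up for illustration. Note that the REST port is usually not the same as the `elastic.port` value Nutch uses to connect.

```python
import json
from urllib.request import urlopen


def count_url(host, index, port=9200):
    """Build the URL of the _count endpoint for one index.

    Port 9200 is an assumed default for the REST API; adjust it if the
    node listens for HTTP elsewhere.
    """
    return "http://{}:{}/{}/_count".format(host, port, index)


def parse_count(body):
    """Extract the document count from a _count JSON response body."""
    return json.loads(body)["count"]


def fetch_count(host, index, port=9200):
    """Ask a running Elasticsearch node how many documents the index holds."""
    with urlopen(count_url(host, index, port)) as response:
        return parse_count(response.read().decode("utf-8"))
```

Calling `fetch_count("localhost", "nutch")` against a running node, with the host and index swapped for your own `elastic.host` and `elastic.index` values, returns the number of documents in that index. A result of 0 after a crawl suggests the indexer had nothing to send, while an HTTP error on a missing index suggests the index was never created.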