codelibs / elasticsearch-river-web

Web Crawler for Elasticsearch
Apache License 2.0

Crawler is connecting then disconnecting?? #128

Open osmanra2 opened 7 years ago

osmanra2 commented 7 years ago

Using river web 2.4.0

My Elasticsearch version:

{
  name: "LinuxGrants",
  cluster_name: "LinuxOER",
  version: {
    number: "2.4.0",
    build_hash: "ce9f0c7394dee074091dd1bc4e9469251181fc55",
    build_timestamp: "2016-08-29T09:14:17Z",
    build_snapshot: false,
    lucene_version: "5.5.2"
  },
  tagline: "You Know, for Search"
}

Log shows:

2017-06-27 12:18:34,478 [main] INFO Connected to xxx.xxx.xxx.xxx:9300
2017-06-27 12:18:34,712 [Crawler-836317c4-95e9-485a-9c1a-935b2dea7117-1] INFO Crawling URL: http://www.xxxxxxxxxxxx.com/
2017-06-27 12:18:34,747 [Crawler-836317c4-95e9-485a-9c1a-935b2dea7117-1] INFO Checking URL: http://www.xxxxxxxxxxx.com/robots.txt
2017-06-27 12:18:34,809 [Crawler-836317c4-95e9-485a-9c1a-935b2dea7117-1] INFO Redirect to URL: http://www.xxxxxx.com/
2017-06-27 12:19:06,012 [Thread-0] INFO Disconnected to LinuxOER: xxx.xxx.xxx.xxx:9300

I tried changing the URL to https and got the same result. Any help?

marevol commented 7 years ago

What is the crawl config?