pcolmer closed this issue 7 years ago
Hello, have you made progress since you posted your issue? Were you able to test the connection directly from the host where the collector is installed? For instance, do you get an error with this: telnet 52.55.65.171 9300
That will confirm whether the issue needs to be addressed within the Committer or not.
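If telnet isn't available, the same reachability check can be scripted. This is a small stand-alone sketch (not part of the collector or Committer; the class name `PortCheck` and helper `portOpen` are made up for illustration):

```java
import java.net.InetSocketAddress;
import java.net.Socket;

public class PortCheck {

    // Returns true if a TCP connection to host:port succeeds within timeoutMs.
    public static boolean portOpen(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The transport client talks to port 9300; the REST API normally uses 9200.
        System.out.println(portOpen("52.55.65.171", 9300, 5000));
    }
}
```

A `false` result here would point at network or access-policy restrictions rather than the Committer itself.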
Thank you for the suggestion. telnet doesn't work, so it is a problem with the access rules on AWS Elasticsearch. I think I'm going to give up on that service and just build my own cluster.
I'm trying to use the committer in conjunction with the AWS Elasticsearch Service. I've configured the AWS instance to grant full access to the IP address used by the machine running the committer, but when the software gets to the point where it tries to commit documents, I get this error:
ERROR [AbstractCrawler] Wiki Crawler: Could not process document: https://wiki.linaro.org/FrontPage (None of the configured nodes are available: [{#transport#-1}{52.55.65.171}{search-websites-uzjmeau3ffjrauoeew5ow3lxkq.us-east-1.es.amazonaws.com/52.55.65.171:9300}])
NoNodeAvailableException[None of the configured nodes are available: [{#transport#-1}{52.55.65.171}{search-websites-uzjmeau3ffjrauoeew5ow3lxkq.us-east-1.es.amazonaws.com/52.55.65.171:9300}]]
	at org.elasticsearch.client.transport.TransportClientNodesService.ensureNodesAreAvailable(TransportClientNodesService.java:290)
	at org.elasticsearch.client.transport.TransportClientNodesService.execute(TransportClientNodesService.java:207)
	at org.elasticsearch.client.transport.support.TransportProxyClient.execute(TransportProxyClient.java:55)
	at org.elasticsearch.client.transport.TransportClient.doExecute(TransportClient.java:288)
	at org.elasticsearch.client.support.AbstractClient.execute(AbstractClient.java:359)
	at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:86)
	at org.elasticsearch.action.ActionRequestBuilder.execute(ActionRequestBuilder.java:56)
	at com.norconex.committer.elasticsearch.ElasticsearchCommitter.sendBulkToES(ElasticsearchCommitter.java:329)
	at com.norconex.committer.elasticsearch.ElasticsearchCommitter.bulkAddedDocuments(ElasticsearchCommitter.java:288)
	at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commitBatch(ElasticsearchCommitter.java:257)
	at com.norconex.committer.core.AbstractBatchCommitter.commitAndCleanBatch(AbstractBatchCommitter.java:179)
	at com.norconex.committer.core.AbstractBatchCommitter.cacheOperationAndCommitIfReady(AbstractBatchCommitter.java:208)
	at com.norconex.committer.core.AbstractBatchCommitter.commitAddition(AbstractBatchCommitter.java:143)
	at com.norconex.committer.core.AbstractFileQueueCommitter.commit(AbstractFileQueueCommitter.java:222)
	at com.norconex.committer.core.AbstractCommitter.commitIfReady(AbstractCommitter.java:146)
	at com.norconex.committer.core.AbstractCommitter.add(AbstractCommitter.java:97)
	at com.norconex.collector.core.pipeline.committer.CommitModuleStage.execute(CommitModuleStage.java:34)
	at com.norconex.collector.core.pipeline.committer.CommitModuleStage.execute(CommitModuleStage.java:27)
	at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91)
	at com.norconex.collector.http.crawler.HttpCrawler.executeCommitterPipeline(HttpCrawler.java:354)
	at com.norconex.collector.core.crawler.AbstractCrawler.processImportResponse(AbstractCrawler.java:549)
	at com.norconex.collector.core.crawler.AbstractCrawler.processNextQueuedCrawlData(AbstractCrawler.java:506)
	at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:390)
	at com.norconex.collector.core.crawler.AbstractCrawler$ProcessReferencesRunnable.run(AbstractCrawler.java:771)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
	at java.lang.Thread.run(Thread.java:745)
I've got the committer configured to use the transport client (and temporarily configured to commit frequently):
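For reference, a transport-client Elasticsearch Committer section of the collector config typically takes a shape like the following. All values here are illustrative placeholders, not the actual settings from this setup:

```xml
<committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
  <!-- Illustrative values only; replace with your own cluster details. -->
  <indexName>wiki</indexName>
  <typeName>page</typeName>
  <clusterName>elasticsearch</clusterName>
  <clusterHosts>search-websites-uzjmeau3ffjrauoeew5ow3lxkq.us-east-1.es.amazonaws.com</clusterHosts>
  <!-- Small values so commits happen frequently while testing. -->
  <queueSize>10</queueSize>
  <commitBatchSize>10</commitBatchSize>
</committer>
```

Note that the transport client speaks the native Elasticsearch transport protocol (port 9300 by default), which the hosted AWS Elasticsearch Service does not expose; AWS only serves the REST API over HTTP(S).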
I've tried switching to the node client, but I then get an error about not being able to load mustache. I'm also unsure what to put in bindIp. If it is supposed to be the IP address of the Elasticsearch server, AWS actually provides two IP addresses (presumably for load balancing), so I'm not sure how that is supposed to work.