Norconex / committer-elasticsearch

Implementation of Norconex Committer for Elasticsearch.
https://opensource.norconex.com/committers/elasticsearch/
Apache License 2.0

Caused by: java.io.IOException: listener timeout after waiting for [30000] ms #22

Closed: jmrichardson closed this issue 7 years ago

jmrichardson commented 7 years ago

Hi,

I ran into an issue with what I believe are large files causing listener timeout errors when committing to ES. The ES Java client appears to default to a 30-second timeout before giving up, which seems to be what is happening here. Can you provide a way to increase the timeout? As it stands, I am unable to commit the remaining large files to ES. I think this may be the way to increase the timeouts. Here is the error I am receiving:

INFO  [ElasticsearchCommitter] Sending 50 commit operations to Elasticsearch.
ERROR [ElasticsearchCommitter$1] Failure occured on node: "http://localhost:9200". Check node logs.
INFO  [ElasticsearchCommitter] Elasticsearch RestClient closed.
INFO  [AbstractCrawler] WM Search Elastic: Crawler executed in 1 minute 30 seconds.
ERROR [JobSuite] Execution failed for job: WM Search Elastic
com.norconex.committer.core.CommitterException: Could not commit JSON batch to Elasticsearch.
        at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commitBatch(ElasticsearchCommitter.java:489)
        at com.norconex.committer.core.AbstractBatchCommitter.commitAndCleanBatch(AbstractBatchCommitter.java:179)
        at com.norconex.committer.core.AbstractBatchCommitter.cacheOperationAndCommitIfReady(AbstractBatchCommitter.java:208)
        at com.norconex.committer.core.AbstractBatchCommitter.commitAddition(AbstractBatchCommitter.java:143)
        at com.norconex.committer.core.AbstractFileQueueCommitter.commit(AbstractFileQueueCommitter.java:222)
        at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commit(ElasticsearchCommitter.java:427)
        at com.norconex.collector.core.crawler.AbstractCrawler.execute(AbstractCrawler.java:274)
        at com.norconex.collector.core.crawler.AbstractCrawler.doExecute(AbstractCrawler.java:228)
        at com.norconex.collector.core.crawler.AbstractCrawler.startExecution(AbstractCrawler.java:184)
        at com.norconex.jef4.job.AbstractResumableJob.execute(AbstractResumableJob.java:49)
        at com.norconex.jef4.suite.JobSuite.runJob(JobSuite.java:355)
        at com.norconex.jef4.suite.JobSuite.doExecute(JobSuite.java:296)
        at com.norconex.jef4.suite.JobSuite.execute(JobSuite.java:168)
        at com.norconex.collector.core.AbstractCollector.start(AbstractCollector.java:132)
        at com.norconex.collector.core.AbstractCollectorLauncher.launch(AbstractCollectorLauncher.java:95)
        at com.norconex.collector.fs.FilesystemCollector.main(FilesystemCollector.java:76)
Caused by: java.io.IOException: listener timeout after waiting for [30000] ms
        at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:660)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:219)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:191)
        at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commitBatch(ElasticsearchCommitter.java:480)
        ... 15 more
INFO  [JobSuite] Running WM Search Elastic: END (Sun Oct 29 10:52:20 EDT 2017)

On a side note, I was also receiving the errors below, which I was able to circumvent by decreasing the number of commit operations from 100 to 50:

WM Search: 2017-10-26 18:57:28 FATAL - WM Search: An error occured that could compromise the stability of the crawler. Stopping excution to avoid further issues...
com.norconex.committer.core.CommitterException: Could not commit JSON batch to Elasticsearch.
  at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commitBatch(ElasticsearchCommitter.java:489)
  at com.norconex.committer.core.AbstractBatchCommitter.commitAndCleanBatch(AbstractBatchCommitter.java:179)
  at com.norconex.committer.core.AbstractBatchCommitter.cacheOperationAndCommitIfReady(AbstractBatchCommitter.java:208)
  at com.norconex.committer.core.AbstractBatchCommitter.commitAddition(AbstractBatchCommitter.java:143)
  at com.norconex.committer.core.AbstractFileQueueCommitter.commit(AbstractFileQueueCommitter.java:222)
  at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commit(ElasticsearchCommitter.java:427)
  at com.norconex.committer.core.AbstractCommitter.commitIfReady(AbstractCommitter.java:146)
  at com.norconex.committer.core.AbstractCommitter.add(AbstractCommitter.java:97)
  at com.norconex.collector.core.pipeline.committer.CommitModuleStage.execute(CommitModuleStage.java:34)
  at com.norconex.collector.core.pipeline.committer.CommitModuleStage.execute(CommitModuleStage.java:27)
  at com.norconex.commons.lang.pipeline.Pipeline.execute(Pipeline.java:91)
  at com.norconex.collector.fs.crawler.FilesystemCrawler.executeCommitterPipeline(FilesystemCrawler.java:243)
  at com.norconex.collector.core.crawler.AbstractCrawler.processImportResponse(AbstractCrawler.java:595)
  at com.norconex.collector.core.crawler.AbstractCrawler.processNextQueuedCrawlData(AbstractCrawler.java:541)
  at com.norconex.collector.core.crawler.AbstractCrawler.processNextReference(AbstractCrawler.java:419)
  at com.norconex.collector.core.crawler.AbstractCrawler$ProcessReferencesRunnable.run(AbstractCrawler.java:812)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.http.ConnectionClosedException: Connection closed unexpectedly
  at org.apache.http.nio.protocol.HttpAsyncRequestExecutor.closed(HttpAsyncRequestExecutor.java:140)
  at org.apache.http.impl.nio.client.InternalIODispatch.onClosed(InternalIODispatch.java:71)
  at org.apache.http.impl.nio.client.InternalIODispatch.onClosed(InternalIODispatch.java:39)
  at org.apache.http.impl.nio.reactor.AbstractIODispatch.disconnected(AbstractIODispatch.java:100)
  at org.apache.http.impl.nio.reactor.BaseIOReactor.sessionClosed(BaseIOReactor.java:279)
  at org.apache.http.impl.nio.reactor.AbstractIOReactor.processClosedSessions(AbstractIOReactor.java:440)
  at org.apache.http.impl.nio.reactor.AbstractIOReactor.execute(AbstractIOReactor.java:283)
  at org.apache.http.impl.nio.reactor.BaseIOReactor.execute(BaseIOReactor.java:104)
  at org.apache.http.impl.nio.reactor.AbstractMultiworkerIOReactor$Worker.run(AbstractMultiworkerIOReactor.java:588)
  ... 1 more
WM Search: 2017-10-26 18:57:28 INFO -          CRAWLER_STOPPING

I am wondering if there is a bulk setting parameter that can also be tweaked to enable large transfers. For reference, here is someone's resolution I found while googling:

I faced the same issue and finally got it resolved by using the request_timeout parameter instead of timeout. So the call must look like this:

helpers.bulk(es, actions, chunk_size=some_value, request_timeout=some_value)
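For comparison, here is a minimal sketch of what I understand to be the corresponding knobs on the ES Java low-level REST client (assuming the 5.x RestClientBuilder API; the host and the 300000 ms values are just placeholders):

    import org.apache.http.HttpHost;
    import org.elasticsearch.client.RestClient;

    public class LargeBulkClientSketch {
        // Builds a low-level REST client whose timeouts allow large bulk requests.
        public static RestClient build() {
            return RestClient.builder(new HttpHost("localhost", 9200, "http"))
                    // Listener timeout: the 30000 ms default behind
                    // "listener timeout after waiting for [30000] ms".
                    .setMaxRetryTimeoutMillis(300000)
                    // Connect/socket timeouts on the underlying Apache HTTP client.
                    .setRequestConfigCallback(config -> config
                            .setConnectTimeout(300000)
                            .setSocketTimeout(300000))
                    .build();
        }
    }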

Thanks for your help again :)

essiembre commented 7 years ago

A new snapshot release was just made, adding the following config options:

      <connectionTimeout>(milliseconds)</connectionTimeout>
      <socketTimeout>(milliseconds)</socketTimeout>

You can use plain English instead of milliseconds (e.g., 5 minutes, 30 seconds, etc.).
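For example, these two settings should be equivalent (assuming the plain-English duration parses to the same number of milliseconds):

      <socketTimeout>30 seconds</socketTimeout>
      <socketTimeout>30000</socketTimeout>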

Please have a try and confirm.

jmrichardson commented 7 years ago

Hi, sorry for the delay. I had a long-running collector job in progress and didn't want to cancel it. Unfortunately, I am still getting the same error:

INFO  [AbstractCollectorConfig] Configuration loaded: id=Text Files; logsDir=/home/es/elastic/ingest/norconex/workdir-clients/logs; progressDir=/home/es/elastic/ingest/norconex/workdir-clients/progress
INFO  [JobSuite] JEF work directory is: /home/es/elastic/ingest/norconex/workdir-clients/progress
INFO  [JobSuite] JEF log manager is : FileLogManager
INFO  [JobSuite] JEF job status store is : FileJobStatusStore
INFO  [AbstractCollector] Suite of 1 crawler jobs created.
INFO  [JobSuite] Initialization...
INFO  [JobSuite] Previous execution detected.
INFO  [JobSuite] Backing up previous execution status and log files.
INFO  [JobSuite] Starting execution.
INFO  [AbstractCollector] Version: Norconex Filesystem Collector 2.7.2-SNAPSHOT (Norconex Inc.)
INFO  [AbstractCollector] Version: Norconex Collector Core 1.9.0-SNAPSHOT (Norconex Inc.)
INFO  [AbstractCollector] Version: Norconex Importer 2.8.0-SNAPSHOT (Norconex Inc.)
INFO  [AbstractCollector] Version: Norconex JEF 4.1.0 (Norconex Inc.)
INFO  [AbstractCollector] Version: Norconex Committer Core 2.1.2-SNAPSHOT (Norconex Inc.)
INFO  [AbstractCollector] Version: Norconex Committer Elasticsearch 4.1.0-SNAPSHOT (Norconex Inc.)
INFO  [JobSuite] Running WM Search Elastic: BEGIN (Tue Oct 31 16:04:23 EDT 2017)
INFO  [FilesystemCrawler] 0 start paths identified.
INFO  [CrawlerEventManager]           CRAWLER_STARTED
INFO  [AbstractCrawler] WM Search Elastic: Crawling references...
INFO  [AbstractCrawler] WM Search Elastic: Reprocessing any cached/orphan references...
INFO  [AbstractCrawler] WM Search Elastic: Crawler finishing: committing documents.
INFO  [AbstractFileQueueCommitter] Committing 1000 files
INFO  [ElasticsearchCommitter] Sending 50 commit operations to Elasticsearch.
ERROR [ElasticsearchCommitter$1] Failure occured on node: "http://localhost:9200". Check node logs.
INFO  [ElasticsearchCommitter] Elasticsearch RestClient closed.
ERROR [AbstractBatchCommitter] Could not commit batched operations.
com.norconex.committer.core.CommitterException: Could not commit JSON batch to Elasticsearch.
        at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commitBatch(ElasticsearchCommitter.java:534)
        at com.norconex.committer.core.AbstractBatchCommitter.commitAndCleanBatch(AbstractBatchCommitter.java:179)
        at com.norconex.committer.core.AbstractBatchCommitter.cacheOperationAndCommitIfReady(AbstractBatchCommitter.java:208)
        at com.norconex.committer.core.AbstractBatchCommitter.commitAddition(AbstractBatchCommitter.java:143)
        at com.norconex.committer.core.AbstractFileQueueCommitter.commit(AbstractFileQueueCommitter.java:222)
        at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commit(ElasticsearchCommitter.java:472)
        at com.norconex.collector.core.crawler.AbstractCrawler.execute(AbstractCrawler.java:274)
        at com.norconex.collector.core.crawler.AbstractCrawler.doExecute(AbstractCrawler.java:228)
        at com.norconex.collector.core.crawler.AbstractCrawler.startExecution(AbstractCrawler.java:184)
        at com.norconex.jef4.job.AbstractResumableJob.execute(AbstractResumableJob.java:49)
        at com.norconex.jef4.suite.JobSuite.runJob(JobSuite.java:355)
        at com.norconex.jef4.suite.JobSuite.doExecute(JobSuite.java:296)
        at com.norconex.jef4.suite.JobSuite.execute(JobSuite.java:168)
        at com.norconex.collector.core.AbstractCollector.start(AbstractCollector.java:132)
        at com.norconex.collector.core.AbstractCollectorLauncher.launch(AbstractCollectorLauncher.java:95)
        at com.norconex.collector.fs.FilesystemCollector.main(FilesystemCollector.java:76)
Caused by: java.io.IOException: listener timeout after waiting for [30000] ms
        at org.elasticsearch.client.RestClient$SyncResponseListener.get(RestClient.java:660)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:219)
        at org.elasticsearch.client.RestClient.performRequest(RestClient.java:191)
        at com.norconex.committer.elasticsearch.ElasticsearchCommitter.commitBatch(ElasticsearchCommitter.java:525)
        ... 15 more

Here is my config xml:

<fscollector id="Text Files">

  <logsDir>/home/es/elastic/ingest/norconex/workdir-clients/logs</logsDir>
  <progressDir>/home/es/elastic/ingest/norconex/workdir-clients/progress</progressDir>

  <crawlers>
    <crawler id="WM Search Elastic">

      <workDir>/home/es/elastic/ingest/norconex/workdir-clients</workDir>
      <numThreads>1</numThreads>
      <keepDownloads>false</keepDownloads>

      <committer class="com.norconex.committer.elasticsearch.ElasticsearchCommitter">
        <nodes>http://localhost:9200</nodes>
        <indexName>wmsearch</indexName>
        <queueDir>/home/es/elastic/ingest/norconex/workdir-clients/commit</queueDir>
        <jsonFieldsPattern>scope</jsonFieldsPattern>
        <connectionTimeout>5 minutes</connectionTimeout>
        <socketTimeout>5 minutes</socketTimeout>
        <typeName>Documents</typeName>
        <commitBatchSize>50</commitBatchSize>
        <maxRetries>1</maxRetries>
      </committer>

    </crawler>
  </crawlers>

</fscollector>

Note that I am just committing the remaining documents in the queue (not crawling), as there were too many to reprocess for a test. It looks like the "5 minutes" timeouts in the above configuration are not being picked up.

Here is how I ran the test:

/home/es/elastic/norconex/collector-fs.sh -a resume -c /home/es/elastic/ingest/norconex/config/config-clients.xml

Let me know if you need more detail. Thanks

essiembre commented 7 years ago

Strange, it has been implemented as per the link you provided. This will require a bit more investigation. Do you have other related messages in your Elasticsearch logs?

essiembre commented 7 years ago

It turns out I misinterpreted what maxRetryTimeout was for in the ES REST API client. I wrongly left it out, thinking it duplicated the behavior of maxRetryWait, which is common to most Committers. I have now added it, and there is a new configuration option called <maxRetryTimeout> in the latest snapshot release.
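For example, added inside the committer block of the configuration above (a sketch; the plain-English duration format is assumed to work here as it does for the other timeouts):

        <maxRetryTimeout>5 minutes</maxRetryTimeout>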

Please give it a try and confirm.

jmrichardson commented 7 years ago

Yay! That worked :) Thank you so much. I think I am getting close to having all the issues worked out. I need to purchase some more SSDs because I am running out of space for the indexes. Once they come in, I expect to be able to ingest all the files successfully. Will let you know if I run into any other issues.