USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0

Cannot talk to ZooKeeper - Updates are disabled #75

Closed: karanjeets closed this issue 6 years ago

karanjeets commented 7 years ago

This error occurs during long-running crawls and/or when Solr Cloud has been running for a long time. It may be related to the unresolved issue https://issues.apache.org/jira/browse/SOLR-3274.

Logs

17/02/05 15:02:09 WARN scheduler.TaskSetManager: Lost task 57.0 in stage 22.0 (TID 4121, server): org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://server:8983/solr/crawldb: Cannot talk to ZooKeeper - Updates are disabled.
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:610)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:279)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:268)
    at org.apache.solr.client.solrj.impl.LBHttpSolrClient.doRequest(LBHttpSolrClient.java:435)
    at org.apache.solr.client.solrj.impl.LBHttpSolrClient.request(LBHttpSolrClient.java:387)
    at org.apache.solr.client.solrj.impl.CloudSolrClient.sendRequest(CloudSolrClient.java:1344)
    at org.apache.solr.client.solrj.impl.CloudSolrClient.requestWithRetryOnStaleState(CloudSolrClient.java:1095)
    at org.apache.solr.client.solrj.impl.CloudSolrClient.request(CloudSolrClient.java:1037)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:209)
    at org.apache.solr.client.solrj.SolrClient.add(SolrClient.java:224)
    at edu.usc.irds.sparkler.service.SolrProxy.addResourceDocs(SolrProxy.scala:35)
    at edu.usc.irds.sparkler.solr.SolrStatusUpdate.apply(SolrStatusUpdate.scala:37)
    at edu.usc.irds.sparkler.solr.SolrStatusUpdate.apply(SolrStatusUpdate.scala:32)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
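
Possible mitigation (a sketch, not the fix adopted in Sparkler): the trace shows the update failing inside `SolrClient.add`, so one option might be to retry the update with exponential backoff while Solr reconnects to ZooKeeper. The class `SolrRetry`, its methods, and the delay and attempt values below are hypothetical names chosen for illustration. The only detail taken from the log is the error message text used to decide whether a failure looks transient.

```java
import java.util.concurrent.Callable;

// Hypothetical helper, not part of Sparkler or SolrJ.
final class SolrRetry {

    // Message fragment copied from the log above; used to spot the transient failure.
    static final String ZK_ERROR = "Cannot talk to ZooKeeper";

    private SolrRetry() {}

    // Returns true if the exception message contains the ZooKeeper error text.
    static boolean isTransient(Exception e) {
        String msg = e.getMessage();
        return msg != null && msg.contains(ZK_ERROR);
    }

    // Runs the action, retrying only transient failures.
    // The delay doubles after each failed attempt (exponential backoff).
    static <T> T retry(Callable<T> action, int maxAttempts, long initialDelayMs)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        long delayMs = initialDelayMs;
        for (int attempt = 1; ; attempt++) {
            try {
                return action.call();
            } catch (Exception e) {
                // Give up on non-transient errors or when attempts are exhausted.
                if (!isTransient(e) || attempt >= maxAttempts) {
                    throw e;
                }
                Thread.sleep(delayMs);
                delayMs *= 2;
            }
        }
    }
}
```

Assuming a SolrJ client named `solrClient` and a collection of documents named `docs` (both hypothetical here), the update could be wrapped as `SolrRetry.retry(() -> solrClient.add(docs), 5, 1000L)`. Retrying only helps if Solr regains its ZooKeeper session on its own; if it does not, the underlying Solr Cloud problem still has to be resolved.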