USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0

Solr Cloud - solrj.SolrServerException: No live SolrServers available to handle this request #66

Closed: thammegowda closed this issue 7 years ago

thammegowda commented 7 years ago

When Solr Cloud is enabled as the backend, the crawler fails with this exception:

Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at edu.usc.irds.sparkler.Main$.main(Main.scala:47)
    at edu.usc.irds.sparkler.Main.main(Main.scala)
Caused by: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://localhost:8983/solr/crawldb: org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:[http://192.168.0.11:8983/solr/crawldb_shard1_replica1, http://192.168.0.11:8984/solr/crawldb_shard1_replica2]
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:577)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:149)
    at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:942)
    at org.apache.solr.client.solrj.SolrClient.query(SolrClient.java:957)
    at edu.usc.irds.sparkler.CrawlDbRDD.getPartitions(CrawlDbRDD.scala:72)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:239)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:237)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:237)
    at org.apache.spark.Partitioner$.defaultPartitioner(Partitioner.scala:65)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$groupByKey$3.apply(PairRDDFunctions.scala:642)
    at org.apache.spark.rdd.PairRDDFunctions$$anonfun$groupByKey$3.apply(PairRDDFunctions.scala:642)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:316)
    at org.apache.spark.rdd.PairRDDFunctions.groupByKey(PairRDDFunctions.scala:641)
    at edu.usc.irds.sparkler.pipeline.Crawler$$anonfun$run$1.apply$mcVI$sp(Crawler.scala:153)
    at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:160)
    at edu.usc.irds.sparkler.pipeline.Crawler.run(Crawler.scala:145)
    at edu.usc.irds.sparkler.base.CliTool$class.run(CliTool.scala:34)
    at edu.usc.irds.sparkler.pipeline.Crawler.run(Crawler.scala:45)
    at edu.usc.irds.sparkler.pipeline.Crawler$.main(Crawler.scala:236)
    at edu.usc.irds.sparkler.pipeline.Crawler.main(Crawler.scala)
    ... 6 more
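
For anyone debugging this, the "Caused by" line lists the replica URLs that Solr considered and rejected. Below is a minimal sketch for pulling those URLs out of the exception message so each one can be checked by hand. The class name `ReplicaUrls` and its `parse` method are hypothetical helpers written for illustration; they are not part of Sparkler or SolrJ, and the regex assumes the message format shown in the trace above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of Sparkler or SolrJ): extracts the replica
// URLs from a "No live SolrServers available" exception message.
public class ReplicaUrls {

    // Assumes the message format seen in the trace above: the phrase is
    // followed by a colon and a bracketed, comma-separated list of URLs.
    private static final Pattern REPLICA_LIST = Pattern.compile(
            "No live SolrServers available to handle this request:\\[([^\\]]*)\\]");

    static List<String> parse(String message) {
        List<String> urls = new ArrayList<>();
        Matcher m = REPLICA_LIST.matcher(message);
        if (m.find()) {
            // Split the bracketed list and drop surrounding whitespace.
            for (String part : m.group(1).split(",")) {
                String url = part.trim();
                if (!url.isEmpty()) {
                    urls.add(url);
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String msg = "Error from server at http://localhost:8983/solr/crawldb: "
                + "org.apache.solr.client.solrj.SolrServerException: "
                + "No live SolrServers available to handle this request:"
                + "[http://192.168.0.11:8983/solr/crawldb_shard1_replica1, "
                + "http://192.168.0.11:8984/solr/crawldb_shard1_replica2]";
        // Print one replica URL per line.
        parse(msg).forEach(System.out::println);
    }
}
```

If the listed replicas respond when opened directly, the nodes themselves are probably up and the problem is more likely in how the request is routed.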
thammegowda commented 7 years ago

This is due to a bug in Solr Cloud: https://issues.apache.org/jira/browse/SOLR-4164

It is fixed in Solr 6.3. It's time for us to upgrade Solr to the latest release, 6.4.0.
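
A rough sketch of a guard that checks whether a given Solr version string is at or above 6.3, the release where the fix landed according to the comment above. The class `SolrVersionCheck` and the method `isAtLeast` are hypothetical names for illustration only. How the version string is obtained is left to the caller, and the parsing assumes a plain dotted numeric version such as `6.4.0`.

```java
// Hypothetical helper (not part of Sparkler or SolrJ): compares a dotted
// version string against a required major.minor release.
public class SolrVersionCheck {

    // Assumes the first two dot-separated fields are plain integers,
    // e.g. "6.4.0"; other formats will throw NumberFormatException.
    static boolean isAtLeast(String version, int major, int minor) {
        String[] parts = version.trim().split("\\.");
        int actualMajor = Integer.parseInt(parts[0]);
        int actualMinor = parts.length > 1 ? Integer.parseInt(parts[1]) : 0;
        return actualMajor > major
                || (actualMajor == major && actualMinor >= minor);
    }

    public static void main(String[] args) {
        // The version would normally come from the running Solr instance;
        // it is hard-coded here for illustration.
        String solrVersion = "6.4.0";
        if (isAtLeast(solrVersion, 6, 3)) {
            System.out.println("Solr " + solrVersion + " should include the fix.");
        } else {
            System.out.println("Solr " + solrVersion + " predates 6.3; upgrade first.");
        }
    }
}
```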