USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0

e.u.i.s.model.Resource.<init>(Resource.java:46) java.net.MalformedURLException: Stream handler unavailable due to: For input string: "0x6" #65

Closed: thammegowda closed this issue 7 years ago

thammegowda commented 7 years ago
17/01/29 12:43:30 INFO BlockManagerInfo: Removed broadcast_2_piece0 on localhost:64585 in memory (size: 2.8 KB, free: 2.4 GB)
17/01/29 12:43:31 ERROR Executor: Exception in task 7.0 in stage 4.0 (TID 67)
java.net.MalformedURLException: Stream handler unavailable due to: For input string: "0x6"
    at java.net.URL.<init>(URL.java:627)
    at java.net.URL.<init>(URL.java:490)
    at java.net.URL.<init>(URL.java:439)
    at edu.usc.irds.sparkler.model.Resource.<init>(Resource.java:46)
    at edu.usc.irds.sparkler.pipeline.OutLinkFilterFunc$$anonfun$apply$5.apply(Crawler.scala:204)
    at edu.usc.irds.sparkler.pipeline.OutLinkFilterFunc$$anonfun$apply$5.apply(Crawler.scala:204)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:462)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
    at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30)
    at edu.usc.irds.sparkler.service.SolrProxy.addResources(SolrProxy.scala:44)
    at edu.usc.irds.sparkler.solr.SolrUpsert.apply(SolrUpsert.scala:43)
    at edu.usc.irds.sparkler.solr.SolrUpsert.apply(SolrUpsert.scala:34)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.IllegalStateException: Stream handler unavailable due to: For input string: "0x6"
    at org.apache.felix.framework.URLHandlersStreamHandlerProxy.parseURL(URLHandlersStreamHandlerProxy.java:429)
    at java.net.URL.<init>(URL.java:622)
    ... 18 more
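
The trace shows the `java.net.URL` constructor throwing from `Resource.<init>` while outlinks are being iterated, so one bad link fails the whole Spark task. The `For input string: "0x6"` message looks like a `NumberFormatException`, which would be consistent with a non-numeric port in the link, though the log does not show the offending URL. One possible way to keep a single malformed outlink from aborting the task is to parse links defensively and skip the ones that fail. The sketch below is only an illustration: `SafeLinks`, `tryParse`, and `parseAll` are hypothetical names, not part of Sparkler, and the example URLs are made up.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Sparkler: parses outlinks defensively.
final class SafeLinks {
    private SafeLinks() {}

    // Returns the parsed URL, or Optional.empty() if the link is malformed.
    @SuppressWarnings("deprecation") // new URL(String) is deprecated on recent JDKs
    static Optional<URL> tryParse(String link) {
        try {
            return Optional.of(new URL(link));
        } catch (MalformedURLException e) {
            // A real crawler would probably log the rejected link here.
            return Optional.empty();
        }
    }

    // Keeps only the links that parse, preserving their order.
    static List<URL> parseAll(List<String> links) {
        List<URL> parsed = new ArrayList<>();
        for (String link : links) {
            tryParse(link).ifPresent(parsed::add);
        }
        return parsed;
    }
}
```

With a filter of this kind in front of the `Resource` constructor, a link such as the made-up `http://example.com:0x6/` would be dropped instead of propagating a `MalformedURLException` up to the executor. Whether silently skipping or logging such links is the right behavior depends on how the crawler is expected to report bad outlinks.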