USCDataScience / sparkler

Spark-Crawler: Apache Nutch-like crawler that runs on Apache Spark.
http://irds.usc.edu/sparkler/
Apache License 2.0

sparkDriver Exception while crawling #85

Closed adityardesai closed 7 years ago

adityardesai commented 7 years ago

Hi. As soon as I run the crawl command, I get the exception below, which stops the execution:

aditya@aditya-Inspiron-5520:~/gitRepos/sparkler.git$ bin/sparkler.sh crawl -id sjob-1487441495596 -m local[*] -i 1
2017-02-18 10:14:03 WARN NativeCodeLoader:62 [main] - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2017-02-18 10:14:03 WARN Utils:70 [main] - Service 'sparkDriver' could not bind on port 0. Attempting port 1.
2017-02-18 10:14:03 WARN Utils:70 [main] - Service 'sparkDriver' could not bind on port 0. Attempting port 1.
2017-02-18 10:14:03 WARN Utils:70 [main] - Service 'sparkDriver' could not bind on port 0. Attempting port 1.
2017-02-18 10:14:03 WARN Utils:70 [main] - Service 'sparkDriver' could not bind on port 0. Attempting port 1.
2017-02-18 10:14:03 WARN Utils:70 [main] - Service 'sparkDriver' could not bind on port 0. Attempting port 1.
2017-02-18 10:14:03 WARN Utils:70 [main] - Service 'sparkDriver' could not bind on port 0. Attempting port 1.
2017-02-18 10:14:03 WARN Utils:70 [main] - Service 'sparkDriver' could not bind on port 0. Attempting port 1.
2017-02-18 10:14:03 WARN Utils:70 [main] - Service 'sparkDriver' could not bind on port 0. Attempting port 1.
2017-02-18 10:14:03 WARN Utils:70 [main] - Service 'sparkDriver' could not bind on port 0. Attempting port 1.
2017-02-18 10:14:03 WARN Utils:70 [main] - Service 'sparkDriver' could not bind on port 0. Attempting port 1.
2017-02-18 10:14:03 WARN Utils:70 [main] - Service 'sparkDriver' could not bind on port 0. Attempting port 1.
2017-02-18 10:14:03 WARN Utils:70 [main] - Service 'sparkDriver' could not bind on port 0. Attempting port 1.
2017-02-18 10:14:03 WARN Utils:70 [main] - Service 'sparkDriver' could not bind on port 0. Attempting port 1.
2017-02-18 10:14:03 WARN Utils:70 [main] - Service 'sparkDriver' could not bind on port 0. Attempting port 1.
2017-02-18 10:14:03 WARN Utils:70 [main] - Service 'sparkDriver' could not bind on port 0. Attempting port 1.
2017-02-18 10:14:03 WARN Utils:70 [main] - Service 'sparkDriver' could not bind on port 0. Attempting port 1.
2017-02-18 10:14:03 ERROR SparkContext:95 [main] - Error initializing SparkContext.
java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries!
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
    at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
    at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
    at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
    at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
    at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)
Exception in thread "main" java.lang.reflect.InvocationTargetException
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at edu.usc.irds.sparkler.Main$.main(Main.scala:47)
    at edu.usc.irds.sparkler.Main.main(Main.scala)
Caused by: java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries!
    at sun.nio.ch.Net.bind0(Native Method)
    at sun.nio.ch.Net.bind(Net.java:433)
    at sun.nio.ch.Net.bind(Net.java:425)
    at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
    at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
    at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:125)
    at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:485)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1089)
    at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:430)
    at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:415)
    at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:903)
    at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:198)
    at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:348)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    at java.lang.Thread.run(Thread.java:745)

thammegowda commented 7 years ago

Caused by: java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries!

This looks like a temporary issue in your environment: the Spark driver is unable to get the port it wants. Maybe there are some security constraints, or maybe some other process is using it. Check whether any other process is using port 4040. I usually run lsof -i :4040 and kill that process if it is not required.
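The check described above can also be scripted. The following is a minimal sketch, not part of Sparkler or Spark: the helper names `can_bind` and `diagnose` are hypothetical, and the default port 4040 is simply the one mentioned in this comment. Because the log shows the driver requesting port 0 (any free port) and failing with "Cannot assign requested address", the sketch also tests whether the machine's hostname resolves to an address that can actually be bound locally, which is one possible cause of that error.

```python
import socket


def can_bind(host: str, port: int) -> bool:
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            # Covers "address already in use" and "cannot assign requested address".
            return False
        return True


def diagnose(ui_port: int = 4040) -> dict:
    """Collect basic facts about local port and address availability.

    ui_port is an assumption taken from the comment above (4040).
    """
    hostname = socket.gethostname()
    try:
        resolved_ip = socket.gethostbyname(hostname)
    except OSError:
        resolved_ip = None  # hostname does not resolve at all
    return {
        "hostname": hostname,
        "resolved_ip": resolved_ip,
        # Is the port from the comment free on all interfaces?
        "ui_port_free": can_bind("0.0.0.0", ui_port),
        # Port 0 asks the OS for any free port, as the driver does in the log.
        "can_bind_resolved_ip": resolved_ip is not None and can_bind(resolved_ip, 0),
        "can_bind_loopback": can_bind("127.0.0.1", 0),
    }


if __name__ == "__main__":
    for key, value in diagnose().items():
        print(f"{key}: {value}")
```

If `can_bind_loopback` is true but `can_bind_resolved_ip` is false, the hostname probably maps to an address that is not assigned to any local interface. In that case, pointing Spark at a bindable address (for example through the `SPARK_LOCAL_IP` environment variable) or correcting the hostname entry in /etc/hosts is a commonly suggested workaround, though this thread does not record what the actual fix was.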

If there isn't any other process, then I don't see a reason why it was unable to get that port. Maybe you have an older JDK? Is your JDK version 1.8?

If the problem persists in your environment, I recommend trying bin/dockler.sh, which will give you an Ubuntu container.

adityardesai commented 7 years ago

I am trying on Ubuntu and also on Google Cloud. The issue is the same on both. No other process is running on port 4040. I use JDK 1.8.

adityardesai commented 7 years ago

Solved this issue. It was related to my local environment.