bitnami / charts

Bitnami Helm Charts
https://bitnami.com

ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread, Master not initialising on custom Image. #7864

Closed gandharvsuri closed 3 years ago

gandharvsuri commented 3 years ago

Which chart: bitnami/spark

Describe the bug: When using a custom image, the master node does not initialise.

To Reproduce Steps to reproduce the behavior:

Following #6661 I wrote a custom Dockerfile:

FROM bitnami/spark:2.4.5-debian-10-r49

ARG spark_version=2.4.5
ARG hadoop_version=2.7

USER root

# Install OS packages in a single layer
RUN apt-get update && \
    apt-get install -y curl wget zip unzip

# Install Python dependencies
RUN pip3 install \
        elasticsearch \
        jproperties \
        kafka-python \
        pyspark==2.4.5 \
        configparser

The pod logs show the following:

09:07:27.48 
 09:07:27.49 Welcome to the Bitnami spark container
 09:07:27.49 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-spark
 09:07:27.49 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-spark/issues
 09:07:27.49 Send us your feedback at containers@bitnami.com
 09:07:27.50 
 09:07:27.51 INFO  ==> ** Starting Spark setup **
 09:07:27.52 INFO  ==> Generating Spark configuration file...
 09:07:27.53 INFO  ==> ** Spark setup finished! **

 09:07:27.54 INFO  ==> ** Starting Spark in master mode **
starting org.apache.spark.deploy.master.Master, logging to /opt/bitnami/spark/logs/spark--org.apache.spark.deploy.master.Master-1-spark-master-0.out
Spark Command: /opt/bitnami/java/bin/java -cp /opt/bitnami/spark/conf/:/opt/bitnami/spark/jars/* -Xmx1g org.apache.spark.deploy.master.Master --host spark-master --port 7077 --webui-port 8080
========================================
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
21/10/20 09:07:28 INFO Master: Started daemon with process name: 38@spark-master-0.spark-headless.reporting.svc.cluster.local
21/10/20 09:07:28 INFO SignalUtils: Registered signal handler for TERM
21/10/20 09:07:29 INFO SignalUtils: Registered signal handler for HUP
21/10/20 09:07:29 INFO SignalUtils: Registered signal handler for INT
21/10/20 09:07:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/10/20 09:07:29 INFO SecurityManager: Changing view acls to: spark
21/10/20 09:07:29 INFO SecurityManager: Changing modify acls to: spark
21/10/20 09:07:29 INFO SecurityManager: Changing view acls groups to: 
21/10/20 09:07:29 INFO SecurityManager: Changing modify acls groups to: 
21/10/20 09:07:29 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(spark); groups with view permissions: Set(); users  with modify permissions: Set(spark); groups with modify permissions: Set()
21/10/20 09:07:30 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[main,5,main]
java.nio.channels.UnresolvedAddressException
        at sun.nio.ch.Net.checkAddress(Net.java:101)
        at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:215)
        at io.netty.channel.socket.nio.NioServerSocketChannel.doBind(NioServerSocketChannel.java:132)
        at io.netty.channel.AbstractChannel$AbstractUnsafe.bind(AbstractChannel.java:551)
        at io.netty.channel.DefaultChannelPipeline$HeadContext.bind(DefaultChannelPipeline.java:1346)
        at io.netty.channel.AbstractChannelHandlerContext.invokeBind(AbstractChannelHandlerContext.java:503)
        at io.netty.channel.AbstractChannelHandlerContext.bind(AbstractChannelHandlerContext.java:488)
        at io.netty.channel.DefaultChannelPipeline.bind(DefaultChannelPipeline.java:985)
        at io.netty.channel.AbstractChannel.bind(AbstractChannel.java:247)
        at io.netty.bootstrap.AbstractBootstrap$2.run(AbstractBootstrap.java:344)
        at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
        at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:510)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:518)
        at io.netty.util.concurrent.SingleThreadEventExecutor$6.run(SingleThreadEventExecutor.java:1044)
        at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
        at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
        at java.lang.Thread.run(Thread.java:748)
21/10/20 09:07:30 INFO ShutdownHookManager: Shutdown hook called
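For context, a `java.nio.channels.UnresolvedAddressException` thrown from `bind()` means the JVM could not resolve the hostname it was asked to bind to (here, the `--host spark-master` value from the Spark command line). A minimal Python sketch of the equivalent resolution check (hostnames are illustrative):

```python
import socket

def can_resolve(host: str) -> bool:
    """Return True if `host` resolves to at least one address."""
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False

# The master binds to its --host value ("spark-master"); if that name is
# not resolvable from inside the pod, Netty's bind() fails with
# java.nio.channels.UnresolvedAddressException.
print(can_resolve("localhost"))
```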

Version of Helm and Kubernetes:

version.BuildInfo{Version:"v3.7.0", GitCommit:"eeac83883cb4014fe60267ec6373570374ce770b", GitTreeState:"clean", GoVersion:"go1.16.8"}
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.2", GitCommit:"8b5a19147530eaac9476b0ab82980b4088bbc1b2", GitTreeState:"clean", BuildDate:"2021-09-15T21:38:50Z", GoVersion:"go1.16.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"19+", GitVersion:"v1.19.13-eks-8df270", GitCommit:"8df2700a72a2598fa3a67c05126fa158fd839620", GitTreeState:"clean", BuildDate:"2021-07-31T01:36:57Z", GoVersion:"go1.15.14", Compiler:"gc", Platform:"linux/amd64"}


juan131 commented 3 years ago

Hi @gandharvsuri

There is an example in the image's README.md showing how to extend the image (note that it restores the original user 1001 after applying the modifications) that you can use as a reference.

I am afraid that we (Bitnami maintainers) cannot help you debug issues with customized images. In any case, let's keep the issue open for a few days to see if other Bitnami users can help you with your customizations.
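The pattern juan131 describes (extend the image as root, then drop back to the non-root user) would look roughly like this; the package list is taken from the Dockerfile above, and the exact steps should be checked against the image's README.md:

```dockerfile
FROM bitnami/spark:2.4.5-debian-10-r49

USER root

# Install OS packages and Python dependencies in a single layer
RUN apt-get update && \
    apt-get install -y curl wget zip unzip && \
    pip3 install elasticsearch jproperties kafka-python pyspark==2.4.5 configparser

# Restore the original non-root user the Bitnami image runs as
USER 1001
```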

gandharvsuri commented 3 years ago

Hi @juan131, thanks for your response. I will need to add external JAR files later.

But I believe the issue is not with the Docker image. I tried a very basic Docker image with everything commented out, and that gave an error too.

Other than that, the only change I made in values.yaml is the following section:

image:
  registry: docker.io
  repository: bitnami/spark
  tag: 2.4.5-debian-10-r49
juan131 commented 3 years ago

Hi @gandharvsuri

So you're saying that you can also reproduce the issue using the default values (without modifying a single parameter, including the image.* ones), am I right?

gandharvsuri commented 3 years ago

Hi @juan131

I meant using a basic Dockerfile with no RUN commands:

FROM bitnami/spark:2.4.5-debian-10-r49

I did change the image: parameters to use my Docker image.

But surprisingly, if I use bitnami/spark:3-debian-10 as the base image, it works completely fine. Any guess why?

juan131 commented 3 years ago

That's very weird @gandharvsuri

Did you push your rebuilt image (the one with the single FROM bitnami/spark:2.4.5-debian-10-r49 layer) to DockerHub (or a similar registry)? If so, did you reuse the same tag you had used previously? And if so, did you set the pull policy (via the image.pullPolicy parameter) to Always, so the image is always pulled in your k8s cluster instead of the cached one being reused?

gandharvsuri commented 3 years ago

Hi @juan131, yes, I pushed it to DockerHub. I tried using it both from the local cache and from docker.io by specifying it in image.registry. I was using the tag latest, so I believed it took image.pullPolicy as Always by default.

juan131 commented 3 years ago

Hi @gandharvsuri

I was using the tag latest so I believe it by default took the image.pullPolicy as Always.

That's not automatic: you need to set it to Always manually. Otherwise it uses the default value, which is IfNotPresent, see https://github.com/bitnami/charts/blob/master/bitnami/spark/values.yaml#L66
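In chart values, that would look something like the fragment below (the repository name is hypothetical, standing in for wherever the custom image was pushed):

```yaml
image:
  registry: docker.io
  repository: myuser/spark-custom   # hypothetical custom-image repository
  tag: latest
  pullPolicy: Always                # force a pull on every pod start
```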

github-actions[bot] commented 3 years ago

This Issue has been automatically marked as "stale" because it has not had recent activity (for 15 days). It will be closed if no further activity occurs. Thanks for the feedback.

github-actions[bot] commented 3 years ago

Due to the lack of activity in the last 5 days since it was marked as "stale", we proceed to close this Issue. Do not hesitate to reopen it later if necessary.