aws-samples / aws-glue-samples

AWS Glue code samples
MIT No Attribution

Add JAVA_HOME to Spark UI docker image #149

Closed: junoha closed this 11 months ago

junoha commented 11 months ago

Issue #, if available:

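A NoClassDefFoundError occurs when viewing a streaming job in the Spark UI with the latest Docker image. The Spark UI container appears to run on Java 21 rather than Java 8; the `java.base/` frames in the stack trace below show a modular (Java 9 or later) runtime.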

2023-10-04 15:14:18 WARN server.HttpChannel: /history/spark-application-1696388583270/jobs/
org.sparkproject.guava.util.concurrent.ExecutionError: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.catalyst.util.DateTimeUtils$
    at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2261)
    at org.sparkproject.guava.cache.LocalCache.get(LocalCache.java:4000)
    at org.sparkproject.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
    at org.sparkproject.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
    at org.apache.spark.deploy.history.ApplicationCache.get(ApplicationCache.scala:89)
    at org.apache.spark.deploy.history.ApplicationCache.withSparkUI(ApplicationCache.scala:101)
    at org.apache.spark.deploy.history.HistoryServer.org$apache$spark$deploy$history$HistoryServer$$loadAppUi(HistoryServer.scala:256)
    at org.apache.spark.deploy.history.HistoryServer$$anon$1.doGet(HistoryServer.scala:104)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:503)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
    at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
    at org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1631)
    at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
    at org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
    at org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1601)
    at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:548)
    at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
    at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
    at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
    at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:501)
    at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
    at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
    at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
    at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:763)
    at org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
    at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
    at org.sparkproject.jetty.server.Server.handle(Server.java:516)
    at org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
    at org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
    at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:479)
    at org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
    at org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
    at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105)
    at org.sparkproject.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
    at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)
    at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)
    at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)
    at org.sparkproject.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)
    at org.sparkproject.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:409)
    at org.sparkproject.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)
    at org.sparkproject.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)
    at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.NoClassDefFoundError: Could not initialize class org.apache.spark.sql.catalyst.util.DateTimeUtils$
    at org.apache.spark.sql.streaming.ui.UIUtils$$anon$1.initialValue(UIUtils.scala:68)
    at org.apache.spark.sql.streaming.ui.UIUtils$$anon$1.initialValue(UIUtils.scala:65)
    at java.base/java.lang.ThreadLocal.setInitialValue(ThreadLocal.java:225)
    at java.base/java.lang.ThreadLocal.get(ThreadLocal.java:194)
    at java.base/java.lang.ThreadLocal.get(ThreadLocal.java:172)
    at org.apache.spark.sql.streaming.ui.UIUtils$.parseProgressTimestamp(UIUtils.scala:74)
    at org.apache.spark.sql.streaming.ui.StreamingQueryStatusListener.onQueryStarted(StreamingQueryStatusListener.scala:74)
    at org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus.doPostEvent(StreamingQueryListenerBus.scala:131)
    at org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus.doPostEvent(StreamingQueryListenerBus.scala:43)
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
    at org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus.postToAll(StreamingQueryListenerBus.scala:88)
    at org.apache.spark.sql.execution.streaming.StreamingQueryListenerBus.onOtherEvent(StreamingQueryListenerBus.scala:108)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent(SparkListenerBus.scala:100)
    at org.apache.spark.scheduler.SparkListenerBus.doPostEvent$(SparkListenerBus.scala:28)
    at org.apache.spark.scheduler.ReplayListenerBus.doPostEvent(ReplayListenerBus.scala:35)
    at org.apache.spark.scheduler.ReplayListenerBus.doPostEvent(ReplayListenerBus.scala:35)
    at org.apache.spark.util.ListenerBus.postToAll(ListenerBus.scala:117)
    at org.apache.spark.util.ListenerBus.postToAll$(ListenerBus.scala:101)
    at org.apache.spark.scheduler.ReplayListenerBus.postToAll(ReplayListenerBus.scala:35)
    at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:89)
    at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:60)
    at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$3(FsHistoryProvider.scala:1145)
    at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$3$adapted(FsHistoryProvider.scala:1143)
    at org.apache.spark.util.Utils$.tryWithResource(Utils.scala:2764)
    at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$1(FsHistoryProvider.scala:1143)
    at org.apache.spark.deploy.history.FsHistoryProvider.$anonfun$parseAppEventLogs$1$adapted(FsHistoryProvider.scala:1141)
    at scala.collection.immutable.List.foreach(List.scala:431)
    at org.apache.spark.deploy.history.FsHistoryProvider.parseAppEventLogs(FsHistoryProvider.scala:1141)
    at org.apache.spark.deploy.history.FsHistoryProvider.rebuildAppStore(FsHistoryProvider.scala:1122)
    at org.apache.spark.deploy.history.FsHistoryProvider.createInMemoryStore(FsHistoryProvider.scala:1360)
    at org.apache.spark.deploy.history.FsHistoryProvider.getAppUI(FsHistoryProvider.scala:378)
    at org.apache.spark.deploy.history.HistoryServer.getAppUI(HistoryServer.scala:199)
    at org.apache.spark.deploy.history.ApplicationCache.$anonfun$loadApplicationEntry$2(ApplicationCache.scala:164)
    at org.apache.spark.deploy.history.ApplicationCache.time(ApplicationCache.scala:135)
    at org.apache.spark.deploy.history.ApplicationCache.org$apache$spark$deploy$history$ApplicationCache$$loadApplicationEntry(ApplicationCache.scala:162)
    at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:56)
    at org.apache.spark.deploy.history.ApplicationCache$$anon$1.load(ApplicationCache.scala:52)
    at org.sparkproject.guava.cache.LocalCache$LoadingValueReference.loadFuture(LocalCache.java:3599)
    at org.sparkproject.guava.cache.LocalCache$Segment.loadSync(LocalCache.java:2379)
    at org.sparkproject.guava.cache.LocalCache$Segment.lockedGetOrLoad(LocalCache.java:2342)
    at org.sparkproject.guava.cache.LocalCache$Segment.get(LocalCache.java:2257)
    ... 41 more
Caused by: java.lang.ExceptionInInitializerError: Exception java.lang.ExceptionInInitializerError [in thread "HistoryServerUI-58"]
    at org.apache.spark.unsafe.types.UTF8String.fromBytes(UTF8String.java:109)
    at org.apache.spark.unsafe.types.UTF8String.fromString(UTF8String.java:139)
    at org.apache.spark.unsafe.types.UTF8String.<clinit>(UTF8String.java:99)
    at org.apache.spark.sql.catalyst.util.DateTimeUtils$.<init>(DateTimeUtils.scala:243)
    at org.apache.spark.sql.catalyst.util.DateTimeUtils$.<clinit>(DateTimeUtils.scala)
    ... 83 more
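
One way to confirm which runtime the container actually resolves is to parse the output of `java -version`. The following is a minimal sketch, not part of this pull request: the helper name `java_major_version` is hypothetical, and the `sed` expression assumes the usual `... version "X"` banner that OpenJDK-based builds print to stderr.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a Java version string to its major version.
# Legacy scheme: "1.8.0_382" -> 8; modern scheme: "21.0.1" or "21" -> 21.
java_major_version() {
  local version="$1"
  case "$version" in
    1.*) version="${version#1.}" ;;   # drop the legacy "1." prefix
  esac
  echo "${version%%[._-]*}"           # keep everything before the first separator
}

# Only query the runtime if a `java` binary is on PATH.
if command -v java >/dev/null 2>&1; then
  # `java -version` writes to stderr; extract the quoted version string.
  detected="$(java -version 2>&1 | sed -n 's/.* version "\([^"]*\)".*/\1/p' | head -n 1)"
  echo "Java major version: $(java_major_version "$detected")"
fi
```

Run inside the Spark UI container, a result other than 8 would be consistent with the failure above.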

Description of changes:

Add the JAVA_HOME environment variable to the Dockerfile so that the Spark UI container uses the Java 8 runtime.

# Inside the Spark UI container

## Check java_sdk alternatives
bash-5.2# ls -l /etc/alternatives/ | grep "java_sdk"
lrwxrwxrwx 1 root root 46 Sep 22 07:36 java_sdk -> /usr/lib/jvm/java-1.8.0-amazon-corretto.x86_64
lrwxrwxrwx 1 root root 46 Sep 22 07:36 java_sdk_1.8.0 -> /usr/lib/jvm/java-1.8.0-amazon-corretto.x86_64
lrwxrwxrwx 1 root root 46 Sep 22 07:36 java_sdk_1.8.0_openjdk -> /usr/lib/jvm/java-1.8.0-amazon-corretto.x86_64

## java_sdk_1.8.0_openjdk points to Amazon Corretto
bash-5.2# ls -l /usr/lib/jvm/
total 8
lrwxrwxrwx 1 root root   26 Sep 22 07:36 java -> /etc/alternatives/java_sdk
lrwxrwxrwx 1 root root   32 Sep 22 07:36 java-1.8.0 -> /etc/alternatives/java_sdk_1.8.0
drwxr-xr-x 9 root root 4096 Sep 22 07:36 java-1.8.0-amazon-corretto.x86_64
lrwxrwxrwx 1 root root   40 Sep 22 07:36 java-1.8.0-openjdk -> /etc/alternatives/java_sdk_1.8.0_openjdk
lrwxrwxrwx 1 root root   41 Sep 22 07:36 java-21-amazon-corretto -> /etc/alternatives/java-21-amazon-corretto
drwxr-xr-x 7 root root 4096 Sep 22 07:36 java-21-amazon-corretto.x86_64
lrwxrwxrwx 1 root root   21 Sep 22 07:36 jre -> /etc/alternatives/jre
lrwxrwxrwx 1 root root   27 Sep 22 07:36 jre-1.8.0 -> /etc/alternatives/jre_1.8.0
lrwxrwxrwx 1 root root   35 Sep 22 07:36 jre-1.8.0-openjdk -> /etc/alternatives/jre_1.8.0_openjdk
lrwxrwxrwx 1 root root   24 Sep 22 07:36 jre-21 -> /etc/alternatives/jre_21
lrwxrwxrwx 1 root root   32 Sep 22 07:36 jre-21-openjdk -> /etc/alternatives/jre_21_openjdk
lrwxrwxrwx 1 root root   29 Sep 22 07:36 jre-openjdk -> /etc/alternatives/jre_openjdk

## spark-class checks JAVA_HOME for Spark runtime
bash-5.2# grep -E "(JAVA_HOME|RUNNER)" /opt/spark/bin/spark-class
if [ -n "${JAVA_HOME}" ]; then
  RUNNER="${JAVA_HOME}/bin/java"
    RUNNER="java"
    echo "JAVA_HOME is not set" >&2
  "$RUNNER" -Xmx128m $SPARK_LAUNCHER_OPTS -cp "$LAUNCH_CLASSPATH" org.apache.spark.launcher.Main "$@"

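The `grep` output above shows only the matching lines of `spark-class`. The selection logic they belong to can be sketched roughly as follows. This is an illustration under stated assumptions, not the script itself: the function name `resolve_runner` is hypothetical, and the branch structure is reconstructed from the matched lines.

```shell
#!/usr/bin/env bash
# Hypothetical function mirroring the JAVA_HOME / RUNNER branches of spark-class.
resolve_runner() {
  if [ -n "${JAVA_HOME:-}" ]; then
    # JAVA_HOME takes precedence over whichever `java` is first on PATH.
    echo "${JAVA_HOME}/bin/java"
  elif command -v java >/dev/null 2>&1; then
    # Fallback: the `java` that PATH resolves to (apparently Java 21 in the failing image).
    echo "java"
  else
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
}

# Path taken from the java_sdk_1.8.0 listing above; other images may differ.
# The subshell keeps the assignment from leaking into the calling shell.
(JAVA_HOME=/usr/lib/jvm/java-1.8.0-amazon-corretto.x86_64; resolve_runner)
```

Because the first branch wins whenever `JAVA_HOME` is non-empty, setting the variable in the image pins Spark to Java 8 even though Java 21 is also installed. In a Dockerfile this would typically be an `ENV` instruction; the exact line added by this pull request is not shown here.
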
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

moomindani commented 11 months ago

Thank you for your contribution!