askhatri / livycluster

Apache License 2.0

Spark UI and Application ID not shown when using Spark 3.5.0 #2

Open duc-dn opened 1 month ago

duc-dn commented 1 month ago

Hi @askhatri, I built a snapshot of the Livy master branch with Spark 3.5.0. After replacing the images in the Helm chart, it worked well. However, the Livy UI does not show the Spark UI link or the application ID. I also checked the GET /sessions endpoint, and it doesn't seem to return the Spark session info:

{
    "id": 1,
    "name": null,
    "appId": null,
    "owner": null,
    "proxyUser": null,
    "state": "idle",
    "kind": "pyspark",
    "appInfo": {
        "driverLogUrl": null,
        "sparkUiUrl": null
    },
    "log": [
        "24/07/12 08:39:50 INFO LoggingPodStatusWatcherImpl: Application status for spark-c421e5052f844570bc681a897692dd91 (phase: Running)",
        "24/07/12 08:39:51 INFO LoggingPodStatusWatcherImpl: Application status for spark-c421e5052f844570bc681a897692dd91 (phase: Running)",
        "24/07/12 08:39:52 INFO LoggingPodStatusWatcherImpl: Application status for spark-c421e5052f844570bc681a897692dd91 (phase: Running)",
        "24/07/12 08:39:53 INFO LoggingPodStatusWatcherImpl: Application status for spark-c421e5052f844570bc681a897692dd91 (phase: Running)",
        "24/07/12 08:39:54 INFO LoggingPodStatusWatcherImpl: Application status for spark-c421e5052f844570bc681a897692dd91 (phase: Running)",
        "24/07/12 08:39:55 INFO LoggingPodStatusWatcherImpl: Application status for spark-c421e5052f844570bc681a897692dd91 (phase: Running)",
        "24/07/12 08:39:56 INFO LoggingPodStatusWatcherImpl: Application status for spark-c421e5052f844570bc681a897692dd91 (phase: Running)",
        "24/07/12 08:39:57 INFO LoggingPodStatusWatcherImpl: Application status for spark-c421e5052f844570bc681a897692dd91 (phase: Running)",
        "24/07/12 08:39:58 INFO LoggingPodStatusWatcherImpl: Application status for spark-c421e5052f844570bc681a897692dd91 (phase: Running)",
        "24/07/12 08:39:59 INFO LoggingPodStatusWatcherImpl: Application status for spark-c421e5052f844570bc681a897692dd91 (phase: Running)"
    ],
    "ttl": null,
    "idleTimeout": null,
    "driverMemory": null,
    "driverCores": 0,
    "executorMemory": null,
    "executorCores": 0,
    "archives": [],
    "files": [],
    "heartbeatTimeoutInSecond": 0,
    "jars": [],
    "numExecutors": 0,
    "pyFiles": [],
    "queue": null
}

If I port-forward the Spark driver, I can see the Spark UI of the session. In addition, when deploying with your image (built following Docker.md), I still see the Spark UI and application ID. So is this because the Livy server is currently not compatible with Spark 3.5.0? Thanks.
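For context, the port-forward I used looks roughly like this (pod name and namespace are placeholders):

# forward the driver pod's UI port to localhost
kubectl -n livy port-forward pod/<spark-driver-pod> 4040:4040
# then open http://localhost:4040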

askhatri commented 1 month ago

Hi @duc-dn, which JRE/JDK version are you using? The Livy server is currently not compatible with JDK 17, and the official Spark 3.5.0 Docker images are built with JDK 17.
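One quick way to confirm which JDK an image actually ships is something like the following (just a sketch, using the image name from this thread):

# print the Java version baked into the Spark image
docker run --rm --entrypoint /bin/bash \
  apache/spark:3.5.0-scala2.12-java11-python3-r-ubuntu -c 'java -version'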

duc-dn commented 1 month ago

The attached screenshot shows the versions I used to build the snapshot Livy image.

I only changed the Spark and Hadoop versions in the pom file, and the Spark image is apache/spark:3.5.0-scala2.12-java11-python3-r-ubuntu. As far as I can tell, they use the same Java version.

askhatri commented 1 month ago

Okay, I will test with Spark 3.5.0 and share my findings.

duc-dn commented 1 month ago

Hi again. Do you have any updates on the problem above? Besides that, I'm curious about something I found in the Livy docs:

By default Livy is built against Apache Spark 2.4.5, but the version of Spark used when running Livy does not need to match the version used to build Livy. 
Livy internally handles the differences between different Spark versions.
The Livy package itself does not contain a Spark distribution. It will work with any supported version of Spark without needing to rebuild.

So, in your view, if Livy is built against Spark 3.2.3, can it run with Spark 3.5.0? I tried running a basic PySpark command with the Livy image asifkhatri/livy:spark3.2.3 and the Spark image apache/spark:3.5.0-scala2.12-java11-python3-r-ubuntu. It works well and I can see the Spark UI link in the Livy UI.
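The basic test I ran boils down to something like this against the Livy REST API (host and session id are placeholders):

# create a PySpark session
curl -s -X POST -H "Content-Type: application/json" \
  -d '{"kind": "pyspark"}' http://<livy-host>:8998/sessions
# run a trivial statement in session 0 (hypothetical session id)
curl -s -X POST -H "Content-Type: application/json" \
  -d '{"code": "print(1 + 1)"}' http://<livy-host>:8998/sessions/0/statements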

askhatri commented 1 month ago

Hi @duc-dn, I am still working on it. The Livy image asifkhatri/livy:spark3.2.3 with the Spark image apache/spark:3.5.0-scala2.12-java11-python3-r-ubuntu might work, but we need to build a new Livy image against Spark 3.5.0 to go with apache/spark:3.5.0-scala2.12-java11-python3-r-ubuntu.

askhatri commented 1 month ago

I have updated the Helm chart to support Spark 3.5.0.

You can use this repository for testing.

To build your own Docker image, refer to this commit in the incubator-livy repository.

Use the following command to build the Livy code:

mvn -e -Pthriftserver -Pscala-2.12 -Pspark3 -Phadoop3 -pl -:livy-python-api clean install

For the Docker image, use the following Dockerfile:

FROM apache/spark:3.5.0-scala2.12-java11-python3-r-ubuntu

ENV LIVY_VERSION=0.9.0-incubating-SNAPSHOT
ENV LIVY_PACKAGE=apache-livy-${LIVY_VERSION}_2.12-bin
ENV LIVY_HOME=/opt/livy
ENV LIVY_CONF_DIR=/conf
ENV PATH=$PATH:$LIVY_HOME/bin

USER root

COPY $LIVY_PACKAGE.zip /

RUN apt-get update && apt-get install -y unzip && \
    unzip /$LIVY_PACKAGE.zip -d / && \
    mv /$LIVY_PACKAGE /opt/ && \
    rm -rf $LIVY_HOME && \
    ln -s /opt/$LIVY_PACKAGE $LIVY_HOME && \
    rm -f /$LIVY_PACKAGE.zip

RUN mkdir /var/log/livy && \
    ln -s /var/log/livy $LIVY_HOME/logs

WORKDIR $LIVY_HOME

ENTRYPOINT ["livy-server"]
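A typical build sequence with this Dockerfile would look something like the following (the image tag is a placeholder, and the binary zip is the one produced by the Maven build above, typically under assembly/target):

# place the Livy binary zip next to the Dockerfile, then build
cp <path-to>/apache-livy-0.9.0-incubating-SNAPSHOT_2.12-bin.zip .
docker build -t <your-registry>/livy:0.9.0-spark3.5.0 .
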
duc-dn commented 1 month ago

Hi @askhatri, I checked out this commit: https://github.com/askhatri/incubator-livy/commit/980a78db4ccae6e44afa960762b266e65ac38c4a and built it locally using the command mvn -e -Pthriftserver -Pscala-2.12 -Pspark3 -Phadoop3 -pl -:livy-python-api clean install. However, I am facing errors:

[ERROR] Tests run: 19, Failures: 0, Errors: 2, Skipped: 0, Time elapsed: 29.218 s <<< FAILURE! - in org.apache.livy.rsc.TestSparkClient
[ERROR] testSparkSQLJob(org.apache.livy.rsc.TestSparkClient)  Time elapsed: 5.263 s  <<< ERROR!
java.util.concurrent.ExecutionException: 
java.lang.RuntimeException: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0) (192.168.1.13 executor driver): java.lang.NoClassDefFoundError: com/fasterxml/jackson/core/StreamReadConstraints
        at org.apache.spark.sql.catalyst.json.JSONOptions.buildJsonFactory(JSONOptions.scala:194)
        at org.apache.spark.sql.catalyst.json.JsonInferSchema.$anonfun$infer$1(JsonInferSchema.scala:83)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2(RDD.scala:855)
        at org.apache.spark.rdd.RDD.$anonfun$mapPartitions$2$adapted(RDD.scala:855)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
askhatri commented 1 month ago

Hi @duc-dn, it appears there are some test failures. I will work on fixing them. In the meantime, you can pass the -DskipITs -DskipTests flags to skip the tests as a workaround.
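For example, the full build command with tests skipped would be:

mvn -e -Pthriftserver -Pscala-2.12 -Pspark3 -Phadoop3 -pl -:livy-python-api -DskipITs -DskipTests clean install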

askhatri commented 1 month ago

Hi @duc-dn, I have tested the build and found that there is a build error. You can check the detailed logs at the following link: log link.

duc-dn commented 1 month ago

@askhatri, which Maven and Java versions do you use to build Livy? I'm using Maven 3.6.3 and Java 11.0.23. When not skipping the tests, the Spark client tests fail with the errors in this log file: org.apache.livy.rsc.TestSparkClient-output.txt

askhatri commented 1 month ago

I am using Apache Maven 3.3.9 and Java 1.8.0_292, the same configuration used at https://github.com/apache/incubator-livy/actions/runs/9871334539/job/27259040026.

duc-dn commented 1 month ago

thanks @askhatri, let me check

duc-dn commented 1 month ago

@askhatri, I switched to Maven 3.3.9 and Java 1.8.0 but hit the same error as above:

java.lang.NoClassDefFoundError: com/fasterxml/jackson/core/StreamReadConstraints

So is this problem due to the Jackson version? I don't know why you don't run into it. I tried changing the Jackson version to 2.15.2, following a StackOverflow suggestion: log.txt
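Since StreamReadConstraints only exists in jackson-core 2.15 and later, a quick way to see which jackson-core version the build actually resolves (and whether an older one is shadowing it) is something like:

# list the resolved jackson-core versions in the Livy build
mvn dependency:tree -Dincludes=com.fasterxml.jackson.core:jackson-core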

askhatri commented 1 month ago

I'm not sure about the Jackson version. This is new to me.

duc-dn commented 1 month ago

Hi @askhatri, I saw your latest update, thanks. I still have a problem authenticating to Livy from sparkmagic using LDAP. I added the following config to livy.conf (via the ConfigMap):

livy.server.auth.type = ldap
livy.server.auth.ldap.url = ldap://localhost:389
livy.server.auth.ldap.base-dn = ou=livy,dc=example,dc=org
livy.server.auth.ldap.username-domain = example.org
livy.server.auth.ldap.enable-start-tls = false
livy.server.auth.ldap.security-authentication = simple
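On the sparkmagic side I point it at Livy with HTTP basic auth; a sketch of the relevant part of ~/.sparkmagic/config.json (URL and credentials are placeholders, and I'm assuming sparkmagic's Basic_Access auth type here):

{
  "kernel_python_credentials": {
    "username": "<ldap-user>",
    "password": "<ldap-password>",
    "url": "http://<livy-host>:8998",
    "auth": "Basic_Access"
  }
}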

In the Livy server log I don't see any LDAP-related errors, but the Livy server keeps restarting:

24/07/26 09:52:30 INFO LivyServer: LDAP auth enabled.
24/07/26 09:52:30 INFO WebServer: Starting server on http://livytest-livycluster-0.livytest-livycluster-headless.livy.svc.cluster.local:8998
24/07/26 09:52:31 DEBUG IOStatisticsContextIntegration: Created instance IOStatisticsContextImpl{id=3, threadId=28, ioStatistics=counters=();
gauges=();
minimums=();
maximums=();
means=();
}
24/07/26 09:52:35 DEBUG AuthenticationFilter: Got token null from httpRequest http://10.1.8.205:8998/version
24/07/26 09:52:35 DEBUG AuthenticationFilter: Request [http://10.1.8.205:8998/version] triggering authentication. handler: class org.apache.livy.server.auth.LdapAuthenticationHandlerImpl
24/07/26 09:52:38 DEBUG AuthenticationFilter: Got token null from httpRequest http://10.1.8.205:8998/version
24/07/26 09:52:38 DEBUG AuthenticationFilter: Request [http://10.1.8.205:8998/version] triggering authentication. handler: class org.apache.livy.server.auth.LdapAuthenticationHandlerImpl
24/07/26 09:52:45 DEBUG AuthenticationFilter: Got token null from httpRequest http://10.1.8.205:8998/version
24/07/26 09:52:45 DEBUG AuthenticationFilter: Request [http://10.1.8.205:8998/version] triggering authentication. handler: class org.apache.livy.server.auth.LdapAuthenticationHandlerImpl
24/07/26 09:52:46 DEBUG SparkKubernetesApp: getApplicationReport, applicationId: spark-0f59c651679748de9444a35a32fd1cd7, namespace: livy applicationTag: livy-batch-2-82hd2jb1
24/07/26 09:52:46 DEBUG DefaultSharedIndexInformer: Ready to run resync and reflector for v1/namespaces/livy/pods with resync 0
24/07/26 09:52:46 DEBUG DefaultSharedIndexInformer: Resync skipped due to 0 full resync period for v1/namespaces/livy/pods
24/07/26 09:52:46 DEBUG Reflector: Listing items (1) for v1/namespaces/livy/pods at v762663
24/07/26 09:52:46 DEBUG BatchSession: BatchSession 2 state changed from STARTING to FINISHED
24/07/26 09:52:46 DEBUG SparkKubernetesApp: spark-0f59c651679748de9444a35a32fd1cd7 FINISHED testbatch-a397cf90ee3f8789-driver.livy:     node: docker-desktop    hostname: null  podIp: null     startTime: 2024-07-26T08:54:20Z     phase: Succeeded    reason: null    message: null   labels: name=driver, spark-app-name=testbatch, spark-version=3.5.0, spark-role=driver, spark-app-selector=spark-0f59c651679748de9444a35a32fd1cd7, spark-app-tag=livy-batch-2-82hd2jb1   containers:         spark-kubernetes-driver:            image: apache/spark:3.5.0-scala2.12-java11-python3-r-ubuntu             requests: cpu=1, memory=1408Mi          limits: memory=1408Mi           command: [] [driver, --properties-file, /opt/spark/conf/spark.properties, --class, org.apache.spark.examples.SparkPi, local:///opt/spark/examples/jars/spark-examples_2.12-3.5.0.jar, 10000]    conditions:         PodCondition(lastProbeTime=null, lastTransitionTime=2024-07-26T08:55:55Z, message=null, reason=null, status=False, type=PodReadyToStartContainers, additionalProperties={})         PodCondition(lastProbeTime=null, lastTransitionTime=2024-07-26T08:54:20Z, message=null, reason=PodCompleted, status=True, type=Initialized, additionalProperties={})        PodCondition(lastProbeTime=null, lastTransitionTime=2024-07-26T08:55:54Z, message=null, reason=PodCompleted, status=False, type=Ready, additionalProperties={})         PodCondition(lastProbeTime=null, lastTransitionTime=2024-07-26T08:55:54Z, message=null, reason=PodCompleted, status=False, type=ContainersReady, additionalProperties={})       PodCondition(lastProbeTime=null, lastTransitionTime=2024-07-26T08:54:20Z, message=null, reason=null, status=True, type=PodScheduled, additionalProperties={})
24/07/26 09:52:48 DEBUG AuthenticationFilter: Got token null from httpRequest http://10.1.8.205:8998/version
24/07/26 09:52:48 DEBUG AuthenticationFilter: Request [http://10.1.8.205:8998/version] triggering authentication. handler: class org.apache.livy.server.auth.LdapAuthenticationHandlerImpl
24/07/26 09:52:55 DEBUG AuthenticationFilter: Got token null from httpRequest http://10.1.8.205:8998/version
24/07/26 09:52:55 DEBUG AuthenticationFilter: Request [http://10.1.8.205:8998/version] triggering authentication. handler: class org.apache.livy.server.auth.LdapAuthenticationHandlerImpl
24/07/26 09:52:55 DEBUG AuthenticationFilter: Got token null from httpRequest http://10.1.8.205:8998/version
24/07/26 09:52:55 DEBUG AuthenticationFilter: Request [http://10.1.8.205:8998/version] triggering authentication. handler: class org.apache.livy.server.auth.LdapAuthenticationHandlerImpl
24/07/26 09:52:55 INFO LivyServer: Shutting down Livy server.
24/07/26 09:52:55 DEBUG FileSystem: FileSystem.close() by method: org.apache.hadoop.fs.FilterFileSystem.close(FilterFileSystem.java:529)); Key: (root (auth:SIMPLE))@file://; URI: file:///; Object Identity Hash: e82c2ff
24/07/26 09:52:55 DEBUG FileSystem: FileSystem.close() by method: org.apache.hadoop.fs.RawLocalFileSystem.close(RawLocalFileSystem.java:895)); Key: null; URI: file:///; Object Identity Hash: 453d5f
24/07/26 09:52:55 DEBUG ShutdownHookManager: Completed shutdown in 0.004 seconds; Timeouts: 0
24/07/26 09:52:55 DEBUG ShutdownHookManager: ShutdownHookManager completed shutdown.

duc-dn commented 1 month ago

This is the status of the pod (see the attached screenshot). Could you please take a look at this issue and help me?

askhatri commented 1 month ago

I haven't integrated LDAP with Livy yet, but I'll attempt to do so and validate it. We'll need a running LDAP server and will configure Livy to use it via livy.server.auth.ldap.url=ldap://<LDAP_SERVER_HOST>:<LDAP_PORT>. Since the LDAP server is likely not in the same pod as Livy, we can't use localhost; we'll need to replace it with the actual LDAP server's hostname or IP address.
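For an LDAP service running inside the cluster, that would typically be its Kubernetes service DNS name, along these lines (service name and namespace are placeholders):

livy.server.auth.ldap.url = ldap://<ldap-service>.<namespace>.svc.cluster.local:389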

duc-dn commented 1 month ago

Sorry, I copied it wrongly as localhost. I deployed LDAP in the same namespace as the Livy server. Afterwards, I exec'd into the Livy pod and used telnet to confirm it can reach the LDAP server.

askhatri commented 1 month ago

OK @duc-dn, I will also try the same from my end.

duc-dn commented 4 weeks ago

Hi again @askhatri. Do you have any updates on LDAP? Additionally, I'm interested in scaling the Livy server, so I tried scaling it up to 2 pods (roughly as sketched below), but I hit an error when starting a Spark session. Can the Livy server be scaled out?
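Roughly what I tried when scaling up (resource name and namespace are placeholders, assuming the chart deploys Livy as a StatefulSet):

kubectl -n livy scale statefulset <livy-statefulset> --replicas=2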

askhatri commented 4 weeks ago

Hi @duc-dn, I'm still working on the LDAP integration and will update you as soon as I have progress to share. Currently, the Livy server doesn't support running multiple scaled instances, so a high availability (HA) feature would need to be implemented for Livy.