exacaster / lighter

REST API for Apache Spark on K8S or YARN
MIT License

How to configure spark app Id while creating spark interactive session? #900

Closed sumanthssr closed 2 weeks ago

sumanthssr commented 8 months ago

Currently the session ID is generated as be417708-645b-4302-8972-ebc9f6ead47c when using the configuration below.

{
  "id": "session1",
  "submit-params": {
    "name": "session1",
    "numExecutors": 4,
    "executorCores": 2,
    "executorMemory": "2G",
    "driverCores": 2,
    "driverMemory": "1G",
    "conf": {
      "spark.pyspark.python": "/opt/conda/envs//bin/python",
      "spark.pyspark.driver.python": "/opt/conda/envs//bin/python",
      "spark.sql.catalogImplementation": "hive",
      "spark.rpc.message.maxSize": 1024
    }
  }
}

Current result : image

Expected result:

The Spark app ID should start with session1.

sumanthssr commented 8 months ago

I tried passing LIGHTER_SESSION_ID as spark.kubernetes.driverEnv.LIGHTER_SESSION_ID, but it is not picked up during Lighter session creation.

Minutis commented 7 months ago

Hello, @sumanthssr,

Sorry for the late reply. If I understand correctly, you are trying to create a regular interactive session. Unfortunately, it is not possible to specify IDs for regular sessions. Only permanent sessions can have user-specified IDs.

Minutis commented 7 months ago

Maybe you could describe your use case a little bit more? I'd like to hear what you are trying to achieve.

sumanthssr commented 7 months ago

Hi @Minutis, we are using the Lighter service for interactive Spark sessions, driven by spark magics from JupyterLab on K8s. End users create sessions from JupyterLab notebooks and run Spark code in them. It is becoming difficult for them to find the right logs in the Spark History Server, because sessions are created with a random hash, as shown below. It would be great to have an option to customise the Spark app ID or app name, so that users can pick out the Spark app associated with their notebook in the Spark History Server and debug their Spark code further.

image

sumanthssr commented 7 months ago

Hi @Minutis, I tried creating a permanent session, but the ID is still a random hash. Screenshots attached.

image image

Minutis commented 7 months ago

Regarding telling the different logs apart, I think that setting a name for your Spark application should solve the issue. Wouldn't you agree? The name of the Spark application is configurable for all types of Spark jobs, including regular interactive sessions.

Edit: Another thing that you should be able to configure is LIGHTER_SPARK_HISTORY_SERVER_URL. This way you would have the option to click an icon in the Lighter UI, next to a running job, and be forwarded to the Spark History Server.

And regarding your attempt to create a permanent session, I do not think you can create one with the approach you are using. There are two ways to create a permanent session:

  1. Specifying configuration at launch time
  2. Using REST API and sending PUT command

I think that spark magic uses the POST method by default, and I am also not sure how it even works with the URL overriding that you are trying to do. But I've never used it the way you show in your screenshots, so I might be somewhat wrong here.
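The second option above (a PUT request) could be sketched roughly as follows. This is a minimal sketch under stated assumptions: the helper name `build_permanent_session_request` is made up for illustration, the URL in the usage lines is a placeholder rather than Lighter's documented route, and sending the submit parameters as the JSON body is also an assumption. Check Lighter's REST documentation for the real route and body shape. The request is only built here, never sent.

```python
import json
import urllib.request


def build_permanent_session_request(collection_url, session_id, submit_params):
    """Build (but do not send) a PUT request for a session with a chosen ID.

    ASSUMPTION: the real route and body shape must be taken from Lighter's
    REST documentation; this only illustrates PUT-with-ID versus plain POST.
    """
    # The ID chosen by the user becomes the last path segment of the URL.
    url = f"{collection_url.rstrip('/')}/{session_id}"
    body = json.dumps(submit_params).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder host and path, used for illustration only.
request = build_permanent_session_request(
    "http://lighter.example/permanent-sessions/",
    "session1",
    {"name": "session1", "numExecutors": 4},
)
print(request.get_method(), request.full_url)
```

A client that only issues POST requests to a fixed sessions URL never supplies the ID in the path, which would be consistent with the random IDs seen in the screenshots.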

sumanthssr commented 7 months ago

@Minutis, regarding the Spark app name: we tried to configure it for a regular interactive session, but it seems the session ID / Spark app name is picked up from the LIGHTER_SESSION_ID environment variable. We tried passing LIGHTER_SESSION_ID via spark magics using spark.kubernetes.driverEnv.LIGHTER_SESSION_ID = 'testname', but it seems the Lighter session does not consider this environment variable when it is passed from spark magics to the driver pod.

PS refer to line 139 in https://github.com/exacaster/lighter/blob/master/server/src/main/resources/shell_wrapper.py

pdambrauskas commented 7 months ago

You can specify {"submit-params": {"name": "...",...}...} to customise the application name. In that case the "App name" on the Spark History Server would be set. You should use name.

sumanthssr commented 7 months ago

@pdambrauskas @Minutis, please refer to the snapshot below. We did try the approach above, but the Spark app name is still a random hash. Are we missing anything here?

image

snapshot from spark history server

image

pdambrauskas commented 7 months ago

That is strange. Looking at the code, it seems that setting the name should work. Can you maybe try this magic:

%%configure -f
{"name": "test", "conf": {"spark.sql.catalogImplementation": "hive"}}

Sadly, I do not have a YARN environment where I could test it myself :(
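For context, here is a rough sketch of what the magic above likely turns into on the wire. It assumes that sparkmagic copies the `%%configure` JSON into the body of the session-creation POST and adds a `kind` field, as it does for Livy-compatible endpoints. The helper name `session_request_body` is made up for illustration, and nothing here is taken from Lighter's code.

```python
import json


def session_request_body(configure_json, kind="pyspark"):
    """Turn a %%configure payload into a Livy-style session-creation body.

    ASSUMPTION: the client sends the configured properties plus a "kind"
    field; the server is then expected to read "name" from this body.
    """
    body = json.loads(configure_json)
    if not isinstance(body, dict):
        raise ValueError("the %%configure payload must be a JSON object")
    # Keep an explicitly configured kind, otherwise fall back to the default.
    body.setdefault("kind", kind)
    return body


payload = session_request_body(
    '{"name": "test", "conf": {"spark.sql.catalogImplementation": "hive"}}'
)
print(json.dumps(payload, indent=2))
```

If the name still does not show up, comparing a body like this against what the server actually receives would narrow down whether the value is lost on the client side or on the server side.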

sumanthssr commented 7 months ago

Same result, the value is not picked up.

-sumanth


pdambrauskas commented 7 months ago

We’ll have to wait for @Minutis then, since he is the one who has the environment for testing and debugging.

sumanthssr commented 7 months ago

@Minutis, can you help us here? We are not able to set the Spark app name.

Minutis commented 7 months ago

@sumanthssr, again, sorry for the late reply. I am partially unavailable this week. I just ran tests internally and the name setting works fine.

I am unfamiliar with the way you are interacting with Lighter, so I do not fully understand the inner workings. Could you please provide a link to the documentation for the spark add -l python -u <url> and %%spark config commands?

Also, could you please provide a screenshot of the Sessions tab in the Lighter UI? Here is mine when I set the application name: image image