Closed sumanthssr closed 2 weeks ago
I tried passing LIGHTER_SESSION_ID as spark.kubernetes.driverEnv.LIGHTER_SESSION_ID, but it is not picked up during Lighter session creation.
Hello, @sumanthssr,
sorry for the late reply. If I understand correctly, you are trying to create a regular interactive session. Unfortunately, it is not possible to specify IDs for regular sessions. Only permanent sessions can have user-specified IDs.
Maybe you could describe your use case a little bit more? I'd like to hear what you are trying to achieve.
Hi @Minutis, we are using the Lighter service for interactive Spark sessions, using spark magics from JupyterLab on k8s. End users create sessions from JupyterLab notebooks and run Spark code in them. It is becoming difficult for end users to find their logs in the Spark History Server, because the sessions are created with a random hash as the ID, as shown below. It would be great to have an option to customise the Spark app ID/app name, so that users can pick out the Spark app associated with their notebook in the Spark History Server and debug their Spark code further.
Hi @Minutis, I tried creating a permanent session, but the ID is still a random hash. Screenshot attached.
Regarding identifying the different logs, I think that setting a name for your Spark application should solve the issue. Wouldn't you agree? The name of the Spark application is configurable for all types of Spark jobs, including regular interactive sessions.
Edit: Another thing that you should be able to configure is LIGHTER_SPARK_HISTORY_SERVER_URL. This way you would have the option to click on an icon directly in the Lighter UI next to a running job and be forwarded to the Spark History Server.
As for your attempt to create a permanent session, I do not think you can create one with the approach you are using. There are two ways to create a permanent session:
I think that spark magic uses the POST method by default, and I am also not sure how it even works with the URL overriding that you are trying to do. But I've never used it the way you show in your screenshots, so I might be somewhat wrong here.
@Minutis regarding setting the Spark app name: we have tried to configure the Spark app name for a regular interactive session, but it seems the session ID/Spark app name is picked up from the LIGHTER_SESSION_ID environment variable. We tried passing LIGHTER_SESSION_ID via spark magics using spark.kubernetes.driverEnv.LIGHTER_SESSION_ID = 'testname', but it seems the Lighter session does not take into account this environment variable passed from spark magics to the driver pod.
PS refer to line 139 in https://github.com/exacaster/lighter/blob/master/server/src/main/resources/shell_wrapper.py
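For reference, the attempt described above amounts to adding a single Spark conf entry. The following is a minimal sketch of how such a key is formed. The helper name `driver_env_conf` is hypothetical; the `spark.kubernetes.driverEnv.` prefix is the standard Spark-on-Kubernetes way to set a driver environment variable. The sketch only builds the conf entry. It makes no claim about whether Lighter lets this value override the session ID it assigns itself, which is the open question in this thread.

```python
def driver_env_conf(env):
    """Map environment variable names to Spark-on-Kubernetes driverEnv conf keys.

    Hypothetical helper for illustration only; it builds the conf dict and
    does not contact Spark or Lighter.
    """
    prefix = "spark.kubernetes.driverEnv."
    return {prefix + name: str(value) for name, value in env.items()}


# The value 'testname' is the one used in the comment above.
conf = driver_env_conf({"LIGHTER_SESSION_ID": "testname"})
print(conf)
```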
You can specify {"submit-params": {"name": "...",...}...} to customise the application name. In that case the "App name" on the Spark History Server would be set. You should use name.
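A minimal sketch of building a body in that shape, assuming the `submit-params` layout quoted above. The helper name `named_submit_params` and the example values are hypothetical, and the sketch only constructs and prints the JSON; it does not send anything to Lighter.

```python
import json


def named_submit_params(name, conf=None):
    """Build a body of the form {"submit-params": {"name": ..., "conf": {...}}}.

    Hypothetical helper; the field layout follows the snippet quoted in this
    thread and is not checked against the Lighter API here.
    """
    return {"submit-params": {"name": name, "conf": dict(conf or {})}}


# Example values are placeholders, not taken from a real deployment.
body = named_submit_params("my-notebook", {"spark.sql.catalogImplementation": "hive"})
print(json.dumps(body, indent=2))
```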
@pdambrauskas @Minutis please refer to the snapshot below. We did try the approach above, but the Spark app name is still a random hash. Are we missing anything here?
snapshot from spark history server
It is strange. Looking at the code, it seems that setting the name should work. Can you maybe try this magic:
%%configure -f
{"name": "test", "conf": {"spark.sql.catalogImplementation": "hive"}}
Sadly, I do not have a YARN environment where I could test it myself :(
Same result, the value is not picked up.
-sumanth
We’ll have to wait for @Minutis then, since he is the one who has the environment for testing and debugging.
@Minutis can you help us here? We are not able to set the Spark app name.
@sumanthssr again, sorry for the late reply. I am partially unavailable this week. I just ran tests internally and the name setting works fine.
I am unfamiliar with the way you are interacting with Lighter, therefore I do not fully understand the inner workings. Could you please provide a link to the documentation for the spark add -l python -u <url> and %%spark config commands?
Also, could you please provide a screenshot of the Lighter UI in the Sessions tab? Here is mine when I set the application name:
Currently the session ID is created as be417708-645b-4302-8972-ebc9f6ead47c when using the configuration below.
{
  "id": "session1",
  "submit-params": {
    "name": "session1",
    "numExecutors": 4,
    "executorCores": 2,
    "executorMemory": "2G",
    "driverCores": 2,
    "driverMemory": "1G",
    "conf": {
      "spark.pyspark.python": "/opt/conda/envs//bin/python",
      "spark.pyspark.driver.python": "/opt/conda/envs//bin/python",
      "spark.sql.catalogImplementation": "hive",
      "spark.rpc.message.maxSize": 1024
    }
  }
}
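Before sending a configuration like the one above, it can help to confirm that the JSON parses and that the name sits where it is expected. The sketch below does only that, on an abbreviated copy of the configuration. The helper name `find_name` is hypothetical, and the check inspects the payload locally without talking to Lighter or Spark.

```python
import json


def find_name(config):
    """Return (location, value) of the application name in a config dict, or None.

    Hypothetical helper: looks under "submit-params" first, then at the top level.
    """
    submit_params = config.get("submit-params", {})
    if "name" in submit_params:
        return "submit-params.name", submit_params["name"]
    if "name" in config:
        return "name", config["name"]
    return None


# Abbreviated copy of the configuration shown above.
raw = '{"id": "session1", "submit-params": {"name": "session1", "numExecutors": 4}}'
location = find_name(json.loads(raw))
print(location)
```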
Current result :
Expected result:
Spark app ID: should start with session1