exacaster / lighter

REST API for Apache Spark on K8S or YARN
MIT License

Lighter session timeout configuration. #1042

Closed by mshkalim 3 months ago

mshkalim commented 4 months ago

Hi, we are using Z2JH with the sparkmagic kernel and Lighter (0.1.1) to create Spark sessions. I ran into an issue when I configured Lighter to automatically kill a session after X time once the session is idle (not while it is processing data). I found two environment variables that control this, but the behavior was not what I expected, and I think either the docs are missing some information or there is a bug in Lighter.

  1. LIGHTER_SESSION_TIMEOUT_INTERVAL - according to the docs, this configuration represents the session lifetime counted from the creation of the last statement. I set it to 2m for testing purposes and found that the session was killed well over 2 minutes (~5 minutes) after the last statement had finished processing, so I'm not sure what to expect here or how to measure it.

  2. LIGHTER_SESSION_TIMEOUT_ACTIVE - as I understand it, this configuration covers the case where a statement is waiting (pending?) in a queue while another statement is being processed. When I set the timeout interval to 2m (again, for testing purposes), Lighter killed the session regardless of whether LIGHTER_SESSION_TIMEOUT_ACTIVE was "true" or "false".

Can someone please explain these configurations in more detail?

Thank you

pdambrauskas commented 4 months ago

Hey, yes, these configurations are related

LIGHTER_SESSION_TIMEOUT_INTERVAL is used for killing "forgotten" sessions: Lighter checks when the last statement was created, and if that was longer ago than the configured interval, Lighter kills the session.

LIGHTER_SESSION_TIMEOUT_ACTIVE - prevents killing sessions if there are some uncompleted statements.

LIGHTER_SESSION_TIMEOUT_ACTIVE has no effect if LIGHTER_SESSION_TIMEOUT_INTERVAL is set to zero.

Regarding your 2m configuration: the process that kills timed-out sessions only runs every 10 minutes. That's why your session got killed later than configured.
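To make the timing concrete, here is a rough sketch of the arithmetic only. It is not Lighter's actual code: the function name `first_kill_time` and the example timestamps are made up for illustration, and it assumes the cleanup check fires on a fixed 10-minute schedule and kills a session only when the last statement is strictly older than the timeout.

```python
from datetime import datetime, timedelta


def first_kill_time(last_statement_at, timeout, first_check_at,
                    check_period=timedelta(minutes=10)):
    """Return the first periodic check at which the session counts as timed out.

    Hypothetical helper: models a cleanup job that runs every `check_period`
    and kills sessions whose last statement is older than `timeout`.
    """
    eligible_at = last_statement_at + timeout
    check_at = first_check_at
    # Step through the scheduled checks until one falls strictly after the deadline.
    while check_at <= eligible_at:
        check_at += check_period
    return check_at


start = datetime(2024, 1, 1, 12, 0)             # assumed time of one cleanup check
last_statement = start + timedelta(minutes=4)   # last statement created at 12:04
killed_at = first_kill_time(last_statement, timedelta(minutes=2), start)
print(killed_at - last_statement)  # 0:06:00
```

With a 2m timeout the session becomes eligible at 12:06, but under these assumptions the next check is not until 12:10, so the kill happens 6 minutes after the last statement. Depending on where the last statement falls relative to the schedule, the delay can be anywhere from just over 2 minutes up to about 12 minutes.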

mshkalim commented 4 months ago

Hi, thank you for your reply.

So if I understand how things work now, LIGHTER_SESSION_TIMEOUT_INTERVAL kills 'forgotten' sessions once their lifetime exceeds the configured timeout interval. A separate process, which runs every 10m, checks whether sessions have reached the timeout interval; that's why the timeout did not take effect after 2m. In addition, setting LIGHTER_SESSION_TIMEOUT_ACTIVE to true changes the behavior of the session killer process: it will not kill a session that has running statements, even if the session has reached the timeout interval.

If so, I can tell you that Lighter killed my session while one statement (a Spark job) had still not finished processing, regardless of whether LIGHTER_SESSION_TIMEOUT_ACTIVE was set to true or false, with LIGHTER_SESSION_TIMEOUT_INTERVAL set to 2m.

What could be wrong there?

BTW 1, I would like to know what you mean by 'forgotten' sessions: are they forgotten by us, or by Lighter?

BTW 2, I didn't find anything in the docs about the process that runs every 10m. I think it would be better to mention it.

pdambrauskas commented 4 months ago

If so, I can tell you that Lighter killed my session while one statement (a Spark job) had still not finished processing, regardless of whether LIGHTER_SESSION_TIMEOUT_ACTIVE was set to true or false, with LIGHTER_SESSION_TIMEOUT_INTERVAL set to 2m.

I've double-checked the code, it works as follows:

Do you suspect it works differently for you? Can you see a "Killing because of timeout" log line in the Lighter logs?

BTW 1, I would like to know what you mean by 'forgotten' sessions: are they forgotten by us, or by Lighter?

I mean forgotten by the user. Lighter should not forget about your sessions.

BTW 2, I didn't find anything in the docs about the process that runs every 10m. I think it would be better to mention it.

Yes, we'll update it.

mshkalim commented 3 months ago

After setting LIGHTER_SESSION_TIMEOUT_INTERVAL to a value greater than 10m and LIGHTER_SESSION_TIMEOUT_ACTIVE to false, I got it to work.

Just to let you know, though: when LIGHTER_SESSION_TIMEOUT_INTERVAL is set to a value less than 10m, it does not work as expected.

Thank you :)