aws-samples / sagemaker-studio-lifecycle-config-examples

MIT No Attribution

install-autoshutdown-server-extension/on-jupyter-server-start.sh failing #17

Closed lkev closed 1 year ago

lkev commented 2 years ago

I've created a lifecycle config at the domain level. The script is install-autoshutdown-server-extension/on-jupyter-server-start.sh.

However, the script fails immediately when trying to cd into /home/sagemaker-user/:

+ export TIMEOUT_IN_MINS=120
--
+ TIMEOUT_IN_MINS=120
+ cd /home/sagemaker-user
/opt/ml/lifecycleconfig/lifecycle_script.sh: line 17: cd: /home/sagemaker-user: No such file or directory
+ export TIMEOUT_IN_MINS=120
+ TIMEOUT_IN_MINS=120
+ cd /home/sagemaker-user
/opt/ml/lifecycleconfig/lifecycle_script.sh: line 17: cd: /home/sagemaker-user: No such file or directory

Is there something I'm missing? I'm on jupyterlab v1, and the following is what I see when starting up:

[image]

durgasury commented 2 years ago

Hi, looks like you're trying to run it as a KernelGateway LCC script. This script supports only JupyterServer (will poll for kernel sessions and delete KernelGateway apps as they become idle). For the Data Science app, the home directory is /root, so it will fail trying to cd into /home/sagemaker-user. Please refer to the Configure auto-shutdown of inactive kernels section of this blog for setting up a JupyterServer LCC script.
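The home-directory mismatch described above can be caught early with a guard at the top of the script. This is only a minimal sketch: the function name `require_jupyterserver_home` is hypothetical and not part of the repository's script, and the default path is simply the one from the error log.

```shell
# Hypothetical guard (not part of the original script): fail with a clear
# message when the script is running in an app whose home directory is not
# the JupyterServer one, e.g. a KernelGateway app where the home is /root.
require_jupyterserver_home() {
    # Default taken from the error log; override by passing a path.
    home_dir="${1:-/home/sagemaker-user}"
    if [ ! -d "$home_dir" ]; then
        echo "Expected $home_dir to exist. Is this LCC attached to a KernelGateway app instead of JupyterServer?" >&2
        return 1
    fi
    cd "$home_dir"
}
```

Calling `require_jupyterserver_home || exit 1` before the rest of the script would turn the bare `cd` failure into a message that points at the likely misconfiguration.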

JMarsz-EA commented 2 years ago

I have created the lifecycle script as a JupyterServer type, and I'm running into the same issue when I try to start an image terminal. I have the lifecycle script applied to my user profile only.

nate-benton90 commented 2 years ago

My KernelGateway app still persists despite having configured everything accordingly. The only difference between my various tests is that more and more terminals get provisioned in the Studio notebook. Also, the logs explained here show no indication of errors or failures.

GavinatorK commented 2 years ago

This doesn't work as a KernelGateway app. I'm not sure I understand what "on JupyterServer start" means. I have to have Studio running to attach this lifecycle configuration, and I'm able to attach it to Studio. What I don't understand is when it will be applied: do I have to restart the server?

durgasury commented 2 years ago

@JMarsz-EA - can you share more details? The Image terminal runs on the KernelGateway app, not the JupyterServer app, so there is some mismatch in the configuration.

@nate-benton90 - are you seeing any logs in CloudWatch? See the Troubleshooting section in this blog for the log stream to look for. You should see logs like "comparing idle time.." appear every 10 seconds. If not, please raise a support case and our team will look into it.

@GavinatorK This blog explains SageMaker Studio architecture, along with the different app types. To answer your question, once you attach this LCC, you will need to restart Studio.

mrbrandt92 commented 1 year ago

Hello, I have tried all the above: attaching as KernelGateway, as JupyterServer, deleting default and restarting Studio, but nothing works because it fails with /opt/ml/lifecycleconfig/lifecycle_script.sh: line 32: cd: /home/sagemaker-user: No such file or directory

Is there any update on this? @durgasury

durgasury commented 1 year ago

@mrbrandt92 It has to be attached as a JupyterServer LCC. If you're running it in KGW, it will fail with the error you mentioned, since there's no /home/sagemaker-user in the KGW app. What's the error you get when you run it as the JS LCC?

mrbrandt92 commented 1 year ago

That was the same error I got after I ran:

LCC_CONTENT=`openssl base64 -A -in SCRIPT_NAME.sh`

aws --profile profile_name sagemaker create-studio-lifecycle-config \
    --studio-lifecycle-config-name my-config-name \
    --studio-lifecycle-config-content $LCC_CONTENT \
    --studio-lifecycle-config-app-type JupyterServer

aws --profile profile_name sagemaker update-domain --domain-id d-MY_DOMAIN_ID \
    --default-user-settings '{
        "JupyterServerAppSettings": {
            "DefaultResourceSpec": {
                "LifecycleConfigArn": "arn:aws:sagemaker:us-east-1:ACCOUNT_ID:studio-lifecycle-config/my-config-name",
                "InstanceType": "system"
            },
            "LifecycleConfigArns": [
                "arn:aws:sagemaker:us-east-1:ACCOUNT_ID:studio-lifecycle-config/my-config-name"
            ]
        }
    }'

So it attaches and I can create and/or update a user to attach that lifecycle config, but when I go to create a new notebook and attach that start-up script, it fails. I'm realizing now that I didn't try creating a new notebook without attaching the script and seeing if my custom packages are already available to install, let me try that.

UPDATE: we had custom packages we needed installed on the kernel, but we had both the custom packages and the auto shutdown script in the same script trying to apply to EITHER JupyterServer or KernelGateway. We needed to separate them out in order to work; thank you for the guidance!

durgasury commented 1 year ago

Closing this issue since it's a configuration issue on the user's end. To reiterate:

Make sure the LCC is set for the JupyterServer app, not for the KernelGateway app.
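One way to confirm which app type an existing LCC was registered with is to ask the SageMaker API. The sketch below assumes the AWS CLI is configured. The helper names `lcc_app_type` and `assert_jupyterserver` are hypothetical, and `my-config-name` is the placeholder used earlier in this thread.

```shell
# Hypothetical helper: print the app type recorded for a Studio lifecycle
# config, e.g. "JupyterServer" or "KernelGateway". Requires AWS credentials.
lcc_app_type() {
    aws sagemaker describe-studio-lifecycle-config \
        --studio-lifecycle-config-name "$1" \
        --query StudioLifecycleConfigAppType \
        --output text
}

# Hypothetical helper: succeed only when the given app type is JupyterServer,
# which is the only type the auto-shutdown script supports.
assert_jupyterserver() {
    if [ "$1" != "JupyterServer" ]; then
        echo "LCC app type is '$1'; the auto-shutdown script needs JupyterServer." >&2
        return 1
    fi
}
```

For example, `assert_jupyterserver "$(lcc_app_type my-config-name)"` would fail loudly if the config had been created as a KernelGateway LCC.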

fulcrum29 commented 7 months ago

Hey, when I attach the script as the default lifecycle configuration for my domain I get:

stdbuf: failed to run command ‘/opt/ml/lifecycleconfig/lifecycle_script.sh’: No such file or directory

Any idea how to fix it? It's set to the JupyterServer app.

StefanHemetsberger commented 7 months ago

Had the same problem. It turned out that the shell script had the wrong line-ending format (Windows CRLF instead of Unix LF).

Fixed it with:

Notepad++: replace \r\n with \n
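The same line-ending fix can be done from a shell before uploading the script. With CRLF endings the shebang line ends in a carriage return, so the kernel looks for an interpreter whose name includes that character and reports "No such file or directory". The function names below are hypothetical, and this is only a sketch of one way to do it.

```shell
# Hypothetical helper: succeed if the file contains any carriage return.
has_crlf() {
    cr="$(printf '\r')"
    grep -q "$cr" "$1"
}

# Hypothetical helper: remove every carriage return from the file in place.
# Note this strips all CR characters, not only those at line ends, which is
# normally fine for a shell script.
strip_crlf() {
    tmp="$(mktemp)"
    # Write back with cat so the original file keeps its permissions.
    tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1"
    rm -f "$tmp"
}
```

Running `has_crlf on-jupyter-server-start.sh && strip_crlf on-jupyter-server-start.sh` before base64-encoding the file should avoid the stdbuf error in this case.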

namsohj commented 4 weeks ago

Had the same problem. It seems that I have a wrong line ending format in the shell script.

Fixed it with:

Notepad++ Replace \r\n with \n

This fix is real!!!!