Closed greyes-trc closed 4 months ago
Thanks for reporting this issue, @greyes-trc.
We have done some research, and it looks like while the cron package is now installed in SageMaker Distribution 1.6, the cron service is not running. I have opened an issue on GitHub for this: https://github.com/aws/sagemaker-distribution/issues/354
In the meantime, you can implement the following logic in the LCC to:
- Skip installing the cron package if it is already installed
- Start the cron service manually

# Check if cron needs to be installed
status="$(dpkg-query -W --showformat='${db:Status-Status}' "cron" 2>&1)"
if [ $? -ne 0 ] || [ "$status" != installed ]; then
    # Fixing invoke-rc.d: policy-rc.d denied execution of restart.
    sudo /bin/bash -c "echo '#!/bin/sh
exit 0' > /usr/sbin/policy-rc.d"
    # Installing cron (-y avoids an interactive prompt in the LCC).
    echo "Installing cron..."
    sudo apt install -y cron
else
    echo "Package cron is already installed."
    # The package is present but the service is not running, so start it manually.
    sudo cron
fi
As an example, for JupyterLab LCC, you would have to replace lines 32-38 with the above code in this file: https://github.com/aws-samples/sagemaker-studio-apps-lifecycle-config-examples/blob/main/jupyterlab/auto-stop-idle/on-start.sh
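The else branch above calls `sudo cron` unconditionally. In case the LCC logic runs while a cron daemon is already up, a guard like the following might help. This is only a sketch: `ensure_cron_running` is a hypothetical helper name, and it assumes `pgrep` is available in the image and that the daemon's process name is `cron`, neither of which is confirmed in this thread.

```shell
#!/bin/bash
# Sketch only: start cron when no cron daemon is running yet.
# ensure_cron_running is a hypothetical helper, not part of the sample LCC.
ensure_cron_running() {
    # pgrep -x matches the exact process name (assumed to be "cron").
    if pgrep -x cron >/dev/null 2>&1; then
        echo "cron is already running."
    else
        echo "Starting cron..."
        sudo cron
    fi
}
```

Calling `ensure_cron_running` in place of the bare `sudo cron` would keep the rest of the logic unchanged.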
Thanks a lot! Not sure if I did something wrong, but for me this worked great in version 1.6, while the LCC was exiting with a non-zero status in version 1.5. To fix this, I changed one line in your code to this:
status="$(dpkg-query -W --showformat='${db:Status-Status}' "cron" 2>&1)" || status="not-installed"
This prevents the non-zero exit while the logic stays the same. Just in case someone else comes across this issue.
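A plausible explanation for why the fallback helps, assuming the LCC script runs under `set -e` (the thread does not state this): a variable assignment takes the exit status of its command substitution, so a failing `dpkg-query` would abort the script before the `if` is reached. The sketch below illustrates the effect without needing dpkg; `query_status` is a hypothetical stand-in for `dpkg-query` on a package that is not installed.

```shell
#!/bin/bash
set -e

# Hypothetical stand-in for dpkg-query on a missing package:
# prints a message and returns a non-zero status.
query_status() {
    echo "no packages found"
    return 1
}

# Without the fallback, this assignment would abort the script under set -e:
#   status="$(query_status 2>&1)"
# With the fallback, the failure is handled and the script continues.
status="$(query_status 2>&1)" || status="not-installed"

echo "status=$status"
```

Because the fallback value is not `installed`, the later comparison still sends the script down the install branch.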
I've tried these scripts on both Code Editor and JupyterLab apps. Yesterday they were working. Today, the scripts seem to be started, as indicated by the last log in the LifecycleConfigOnStart logs:
- JupyterLab/default/LifecycleConfigOnStart
- /CodeEditor/default/LifecycleConfigOnStart
But then:
- The last few JupyterLab/default logs only show the same entry repeated several times with different IDs, instead of the usual logs where the time of the last activity in JupyterLab is tracked.
- The /CodeEditor/default logs weren't shown yesterday at all. Funny enough, the script was working as intended. Today, however, they are showing, just not outputting anything useful. And the script is also not working.

Have you experienced something like this? I find it odd that it's suddenly behaving like this without any apparent reason.
UPDATE
Apparently this might have to do with a new SageMaker Distribution image. Yesterday, the latest SageMaker Distribution image was 1.5. Today, 1.6 seems to be the newest one. I tried this with version 1.5 and it worked again, then again with version 1.6 and it didn't work. This might be worth investigating to make sure the LCC works with the newest SageMaker Distribution images.