Open FrikadelleHelle opened 9 months ago
Can you provide the packages in your environment.yml? And are these logs from the /aws/sagemaker/studio CloudWatch log group?
Yes, these are logs from the /aws/sagemaker/studio log group.
This is a bare-bones example of environment.yml that fails for me.
name: base
channels:
  - conda-forge
dependencies:
  - python==3.10
  - jupyterlab
  - pip
  - pip:
    - ipykernel
    - sagemaker
But is it correct that this should be able to work as a custom JupyterLab image in the new Studio as well?
If it helps, I can provide my config too.
aws sagemaker describe-image
{
"CreationTime": 1707914687.086,
"DisplayName": "prod-sagemaker-image",
"ImageArn": "arn:aws:sagemaker:some-arn",
"ImageName": "image_name_1",
"ImageStatus": "CREATED",
"LastModifiedTime": 1707914688.064
},
aws sagemaker describe-app-image-config
{
"AppImageConfigArn": "arn:aws:sagemaker:some_app_image_config_arn",
"AppImageConfigName": "sagemaker-app-image-config",
"CreationTime": 1707403360.291,
"LastModifiedTime": 1707405437.631,
"KernelGatewayImageConfig": {
"KernelSpecs": [
{
"Name": "python3",
"DisplayName": "yesyes"
}
],
"FileSystemConfig": {
"MountPath": "/home/sagemaker-user",
"DefaultUid": 1000,
"DefaultGid": 100
}
},
"JupyterLabAppImageConfig": {
"ContainerConfig": {
"ContainerEntrypoint": [
"jupyter-lab"
]
}
}
},
aws sagemaker describe-domain
{
"DomainArn": "arn:aws:sagemaker:some_domain_arn",
"DomainId": "d-mmo40dnf710s",
"DomainName": "sagemaker-domain",
"HomeEfsFileSystemId": "fs-",
"SingleSignOnManagedApplicationInstanceId": "ins-",
"SingleSignOnApplicationArn": "arn:aws:sso::application/",
"Status": "InService",
"CreationTime": 1707688181.443,
"LastModifiedTime": 1707913859.291,
"AuthMode": "SSO",
"DefaultUserSettings": {
"ExecutionRole": "arn:aws:iam::some_role_arn",
"SecurityGroups": [
"sg-"
],
"JupyterServerAppSettings": {
"LifecycleConfigArns": []
},
"KernelGatewayAppSettings": {
"CustomImages": [
{
"ImageName": "prod-sagemaker-image",
"ImageVersionNumber": 1,
"AppImageConfigName": "sagemaker-app-image-config"
}
],
"LifecycleConfigArns": []
},
"CodeEditorAppSettings": {
"LifecycleConfigArns": []
},
"JupyterLabAppSettings": {
"DefaultResourceSpec": {
"InstanceType": "ml.t3.medium"
},
"CustomImages": [
{
"ImageName": "prod-sagemaker-image",
"ImageVersionNumber": 1,
"AppImageConfigName": "sagemaker-app-image-config"
}
],
"LifecycleConfigArns": [
"arn:aws:sagemaker:lifecycle_arn"
]
},
"SpaceStorageSettings": {
"DefaultEbsStorageSettings": {
"DefaultEbsVolumeSizeInGb": 5,
"MaximumEbsVolumeSizeInGb": 100
}
},
"DefaultLandingUri": "studio::",
"StudioWebPortal": "ENABLED"
},
"DomainSettings": {
"SecurityGroupIds": [
"sg-"
],
"DockerSettings": {
"EnableDockerAccess": "ENABLED",
"VpcOnlyTrustedAccounts": []
}
},
"AppNetworkAccessType": "VpcOnly",
"SubnetIds": [
"subnet-",
"subnet-"
],
"VpcId": "vpc-",
"AppSecurityGroupManagement": "Customer",
"DefaultSpaceSettings": {
"ExecutionRole": "arn:aws:iam::some_role_arn",
"SecurityGroups": [
"sg-"
],
"JupyterServerAppSettings": {
"DefaultResourceSpec": {
"SageMakerImageArn": "arn:aws:sagemaker:eu-north-1:243637512696:image/jupyter-server-3",
"InstanceType": "system"
}
}
}
}
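For anyone comparing their own setup, here is a minimal sketch for listing which app types a custom image is attached to in the describe-domain output. The helper name `attached_custom_images` is hypothetical, and the sketch assumes the response has already been loaded into a Python dict (for example with `json.loads` on the CLI output); the sample dict is a trimmed copy of the output above.

```python
def attached_custom_images(domain: dict) -> dict:
    """Map each settings block under DefaultUserSettings to its custom image names."""
    settings = domain.get("DefaultUserSettings", {})
    result = {}
    for key, value in settings.items():
        # Only settings blocks that carry a CustomImages list are of interest.
        if isinstance(value, dict) and "CustomImages" in value:
            result[key] = [image["ImageName"] for image in value["CustomImages"]]
    return result


# Trimmed-down copy of the describe-domain output shown above.
domain = {
    "DefaultUserSettings": {
        "ExecutionRole": "arn:aws:iam::some_role_arn",
        "JupyterServerAppSettings": {"LifecycleConfigArns": []},
        "KernelGatewayAppSettings": {
            "CustomImages": [
                {
                    "ImageName": "prod-sagemaker-image",
                    "ImageVersionNumber": 1,
                    "AppImageConfigName": "sagemaker-app-image-config",
                }
            ],
            "LifecycleConfigArns": [],
        },
        "JupyterLabAppSettings": {
            "CustomImages": [
                {
                    "ImageName": "prod-sagemaker-image",
                    "ImageVersionNumber": 1,
                    "AppImageConfigName": "sagemaker-app-image-config",
                }
            ],
        },
    }
}

print(attached_custom_images(domain))
```

With the config above, the same image shows up under both KernelGatewayAppSettings and JupyterLabAppSettings.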
Our team has struggled with this as well.
I tried my best to reproduce your image based on the Dockerfile and env.yml and was able to get it to work.
The main difference is that instead of relying on the app-image-config property:
"JupyterLabAppImageConfig": { "ContainerConfig": { "ContainerEntrypoint": [ "jupyter-lab" ] } }
we define the ENTRYPOINT and CMD directly in our Dockerfile, in accordance with https://docs.aws.amazon.com/sagemaker/latest/dg/studio-updated-jl-image-specifications.html.
We did this because we had a hard time getting the "ContainerEntrypoint" to work.
Below is the Dockerfile I used (the micromamba config is due to our proxy):
FROM --platform=linux/amd64 public.ecr.aws/sagemaker/sagemaker-distribution:latest-cpu
USER $ROOT
RUN apt-get clean
# dependencies for building python and having opencv
RUN apt-get update && \
    apt-get install -y gcc g++ python3-dev ffmpeg libsm6 libxext6 && \
    rm -rf /var/lib/apt/lists/* && \
    apt-get clean
USER $MAMBA_USER
# copy the environment.yml file into the container
COPY --chown=$MAMBA_USER:$MAMBA_USER env_help.yml /tmp/environment.yml
RUN micromamba config prepend channels "CONDA-FORGE-PROXY" && \
    micromamba config prepend channels "CONDA-PROXY" && \
    micromamba config set channel_alias "CONDA-PROXY" && \
    micromamba config set channel_priority flexible && \
    micromamba config set pip_interop_enabled True && \
    micromamba config set ssl_verify /etc/ssl/certs/ca-certificates.crt
# Use micromamba to install the dependencies from the environment.yml file
RUN micromamba install -y -n base -f /tmp/environment.yml && \
    micromamba clean --all --yes
ENTRYPOINT ["jupyter-lab"]
CMD ["--ServerApp.ip=0.0.0.0", "--ServerApp.port=8888", "--ServerApp.allow_origin=*", "--ServerApp.token=''", "--ServerApp.base_url=/jupyterlab/default"]
Your logs seem to suggest that the CMD portion of this is missing since you do not get these logs (last two):
2024-02-19T16:01:54.014-05:00 | [I 2024-02-19 21:01:53.893 ServerApp] Serving notebooks from local directory: /home/sagemaker-user
2024-02-19T16:01:54.014-05:00 | [I 2024-02-19 21:01:53.893 ServerApp] Jupyter Server 2.10.0 is running at:
2024-02-19T16:01:54.014-05:00 | [I 2024-02-19 21:01:53.893 ServerApp] http://default:8888/jupyterlab/default/lab
2024-02-19T16:01:54.014-05:00 | [I 2024-02-19 21:01:53.894 ServerApp] http://127.0.0.1:8888/jupyterlab/default/lab
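If you want to check an exported log for this automatically, something like the following might help. It is only a rough sketch: the helper names `served_urls` and `uses_expected_base_url` are made up for illustration, and it assumes the CloudWatch lines look like the ones above, with the URL following the `ServerApp]` tag.

```python
from urllib.parse import urlparse

# Value passed through --ServerApp.base_url in the CMD above.
EXPECTED_BASE_URL = "/jupyterlab/default"


def served_urls(log_lines):
    """Collect the URLs that Jupyter Server reports it is running at."""
    urls = []
    for line in log_lines:
        if "ServerApp]" not in line:
            continue
        # Take the first whitespace-separated token after the ServerApp tag.
        parts = line.split("ServerApp]", 1)[1].split()
        if parts and parts[0].startswith(("http://", "https://")):
            urls.append(parts[0])
    return urls


def uses_expected_base_url(log_lines, base_url=EXPECTED_BASE_URL):
    """True only if at least one URL is logged and all of them sit under base_url."""
    urls = served_urls(log_lines)
    return bool(urls) and all(urlparse(u).path.startswith(base_url) for u in urls)


# The last two log lines from the working image above.
working = [
    "2024-02-19T16:01:54.014-05:00 | [I 2024-02-19 21:01:53.893 ServerApp] http://default:8888/jupyterlab/default/lab",
    "2024-02-19T16:01:54.014-05:00 | [I 2024-02-19 21:01:53.894 ServerApp] http://127.0.0.1:8888/jupyterlab/default/lab",
]
print(uses_expected_base_url(working))
```

A log where the server starts without the CMD flags would be expected to show URLs without the /jupyterlab/default prefix, in which case the check returns False.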
NOTE:
I pushed the image to ECR and then just used the console to create and attach the image to the domain.
We use CDK to do our actual deployments.
Also, your app image config will have to at least have an empty {} for the "JupyterLabAppImageConfig" even if you decide to stop using this for the entrypoint stuff.
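As a quick sanity check on the describe-app-image-config output, a sketch along these lines could flag both situations. The function name `check_jupyterlab_app_image_config` is hypothetical and the findings are heuristics rather than anything SageMaker enforces; `ContainerArguments` is the sibling field of `ContainerEntrypoint` in the ContainerConfig structure.

```python
def check_jupyterlab_app_image_config(config: dict) -> list:
    """Return human-readable findings for a describe-app-image-config response."""
    findings = []
    jupyterlab = config.get("JupyterLabAppImageConfig")
    if jupyterlab is None:
        findings.append("JupyterLabAppImageConfig is missing; add at least an empty {}.")
        return findings

    container = jupyterlab.get("ContainerConfig", {})
    if container.get("ContainerEntrypoint") and not container.get("ContainerArguments"):
        # Heuristic: an entrypoint without arguments may start the server
        # without the base_url, port, and similar flags.
        findings.append(
            "ContainerEntrypoint is set without ContainerArguments; "
            "check that the server flags are still being passed."
        )
    return findings


# Relevant part of the app image config from the original post.
original_config = {
    "AppImageConfigName": "sagemaker-app-image-config",
    "JupyterLabAppImageConfig": {
        "ContainerConfig": {"ContainerEntrypoint": ["jupyter-lab"]}
    },
}

for finding in check_jupyterlab_app_image_config(original_config):
    print(finding)
```

An app image config with an empty `"JupyterLabAppImageConfig": {}` produces no findings, which matches the setup where ENTRYPOINT and CMD live in the Dockerfile.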
I am not quite sure where to report this, but since the docs outline how to build a custom image, I will try here.
I am building this custom image, pushing it to ECR, adding it to SageMaker images, and creating an app image config, as described in the docs.
I am defining my docker image like this
The only difference I can see in the logs is these two lines, at 2024-02-09T10:23:34.006+01:00 and 2024-02-09T10:23:34.006+01:00.
In the working images, these lines show a URL that is configured correctly.
I am in VPC-only mode for the domain, but I don't see how that should change anything, since the sagemaker-distribution image works fine.
I would appreciate any pointers.