aws-samples / amazon-sagemaker-studio-package-management


Can't execute notebook job when using the Create Custom App notebook #1

Open amardeepranu opened 1 year ago

amardeepranu commented 1 year ago

When I build a BYOI (bring-your-own-image) image with the Create Custom App notebook, I'm unable to start a notebook job with the custom image and kernel. Regular interactive use of the notebook works fine in Studio. The job fails with the following error:

AlgorithmError: Error with executing notebook
++ sed 's|\.|-|g; s|:|-|g; s|-*ipynb$|.ipynb|g;'
++ echo /opt/ml/output/data/linkedinsc-linkedinsc-2023-07-11T16:30:00.619Z-.ipynb
Use Kernel conda-env-.conda-custom-py to execute linkedin_scraper_daily.ipynb with output to /opt/ml/output/data/linkedinsc-linkedinsc-2023-07-11T16-30-00-619Z.ipynb. Params: {}
Executing: 0%| | 0/4 [00:00<?, ?cell/s]
Executing: 0%| | 0/4 [00:00<?, ?cell/s]
Exception during processing notebook: No such kernel named conda-env-.conda-custom-py
Traceback (most recent call last):
  File "/opt/ml/input/data/sagemaker_headless_execution_system/system/lib/notebookrunner.py", line 66, in run_notebook
    papermill.execute_notebook(
  File "/opt/conda/lib/python3.10/site-packages/papermill/execute.py", line 113, in execute_notebook
    nb = papermill_engines.execute_notebook_with_engine(
  File "/opt/conda/lib/python3.10/site-packages/papermill/engines.py", line 49, in execute_notebook_with_engine
    return self
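For readers puzzled by the two different output paths in the trace, here is a small Python sketch of what the `sed` expression shown there does to the file name. The function name `sanitize_output_name` is hypothetical and not part of the SageMaker runner; the sketch only mirrors the three substitutions visible in the log.

```python
import re


def sanitize_output_name(path: str) -> str:
    """Mirror the sed substitutions from the job log (hypothetical helper)."""
    path = path.replace(".", "-")   # s|\.|-|g
    path = path.replace(":", "-")   # s|:|-|g
    # s|-*ipynb$|.ipynb|g : restore the extension, absorbing any dashes before it
    return re.sub(r"-*ipynb$", ".ipynb", path)


if __name__ == "__main__":
    raw = "/opt/ml/output/data/linkedinsc-linkedinsc-2023-07-11T16:30:00.619Z-.ipynb"
    print(sanitize_output_name(raw))
```

The output name is therefore just a sanitized timestamp and is probably unrelated to the kernel failure itself.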

Also in the notebook job UI the image and kernel sections are blank:

[Screenshot 2023-07-11 at 12 36 43 PM: notebook job UI with the image and kernel sections blank]

Comparing the logs between a notebook job on my custom image and the DataScience image I see the following difference:

These are the logs for the DataScience image, which runs successfully:

2023-07-11T12:08:21.747-04:00 | ++ /usr/local/bin/python -c 'import sys; version=sys.version_info[:3]; print("{0}.{1}".format(*version))'
2023-07-11T12:08:21.747-04:00 | + DEFAULT_PYTHON_VERSION=3.10
2023-07-11T12:08:21.747-04:00 | + '[' '!' -z Studio ']'
2023-07-11T12:08:21.747-04:00 | + '[' Studio = DLC ']'
2023-07-11T12:08:21.747-04:00 | + '[' Studio = Studio ']'
2023-07-11T12:08:21.747-04:00 | + SM_EXEC_STEP='installing ipykernel for sagemaker 1P image'
2023-07-11T12:08:21.747-04:00 | ++ /usr/local/bin/python -c 'import importlib.util; print(None != importlib.util.find_spec("ipykernel"))'
2023-07-11T12:08:21.747-04:00 | + ipykernel_installed=True
2023-07-11T12:08:21.747-04:00 | + echo 'Studio 1P Image: Studio - [ipykernel_installed: True]'
2023-07-11T12:08:21.747-04:00 | Studio 1P Image: Studio - [ipykernel_installed: True]
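The `ipykernel` probe in the log above can be reproduced inside any image to see what the runner would detect. This is a minimal sketch of the same `importlib.util.find_spec` check; the helper name `has_module` is hypothetical. It must be run with the interpreter the job actually uses (the log shows `/usr/local/bin/python` for the DataScience image), otherwise it reports on a different environment.

```python
import importlib.util
import sys


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # Show which interpreter is answering, since base and custom envs may differ.
    print(f"interpreter: {sys.executable}")
    print(f"ipykernel_installed: {has_module('ipykernel')}")
```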

These are the logs for the failing BYOI image:

2023-07-11T12:03:02.176-04:00 | + DEFAULT_PYTHON_VERSION=3.10
2023-07-11T12:03:02.176-04:00 | + '[' '!' -z '' ']'
2023-07-11T12:03:02.176-04:00 | + SM_EXEC_STEP='creating symbol link to simulate the EFS mounting path'
2023-07-11T12:03:02.177-04:00

It looks like the kernel detection doesn't even run. Also, the conda env that's being used is the "base" conda env, not my custom env (which has ipykernel installed).
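One way to check which kernel names are actually registered on disk in the image is to scan the Jupyter kernel directories for `kernel.json` files. The sketch below uses only the standard library; `list_kernelspecs` is a hypothetical helper, and the directory it scans by default (`<sys.prefix>/share/jupyter/kernels`) is only one of the places Jupyter looks. As an assumption, not a confirmed diagnosis: the name `conda-env-.conda-custom-py` appears to follow the naming pattern used by `nb_conda_kernels`, which discovers kernels dynamically, so such a name would not show up in an on-disk scan like this one.

```python
import json
import sys
from pathlib import Path


def list_kernelspecs(kernel_dirs):
    """Return {kernel_name: display_name} for each <dir>/<name>/kernel.json found."""
    found = {}
    for base in kernel_dirs:
        base = Path(base)
        if not base.is_dir():
            continue  # skip search locations that do not exist in this image
        for spec_file in sorted(base.glob("*/kernel.json")):
            try:
                spec = json.loads(spec_file.read_text())
            except (OSError, json.JSONDecodeError):
                continue  # ignore unreadable or malformed specs
            name = spec_file.parent.name.lower()
            # Earlier directories win, so keep the first spec seen per name.
            found.setdefault(name, spec.get("display_name", name))
    return found


if __name__ == "__main__":
    # Assumed search location; add other Jupyter data directories as needed.
    dirs = [Path(sys.prefix) / "share" / "jupyter" / "kernels"]
    specs = list_kernelspecs(dirs)
    wanted = "conda-env-.conda-custom-py"  # kernel name from the error message
    print(f"registered kernels: {sorted(specs)}")
    print(f"{wanted!r} registered on disk: {wanted in specs}")
```

If the wanted name is missing, the job has no on-disk kernelspec to match it against, which would be consistent with the "No such kernel named" error.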

Here's my environment.yml:

name: customenv
channels:
  - conda-forge
dependencies:
  - python=3.9.*
  - ipykernel
  - pip
  - pip:
    - awscli
    - boto3
    - sagemaker
    - pandas

Here's my Dockerfile:

FROM continuumio/miniconda3:latest

COPY environment*.yml ./

RUN apt-get -y update && \
    apt-get install -y --no-install-recommends sudo gettext-base wget curl awscli libnss3 && \
    # We just install tzdata below but leave default time zone as UTC. This helps packages like Pandas to function correctly.
    DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends tzdata && \
    chmod g+w /etc/passwd && \
    echo "ALL    ALL=(ALL)    NOPASSWD:    ALL" >> /etc/sudoers

RUN conda env update -f environment.yml --prune && \
    conda clean -afy

ENV SHELL=/bin/bash

Everything is running as root with 0/0 as the UID/GID.

Any help would be appreciated! Thanks.

pieroliviermarquis commented 1 year ago

Same issue. I am unable to use custom env in notebook jobs.