Describe the bug
In src/sagemaker_training/environment.py:1219, the values in sys.path (of the environment in which sagemaker-training-toolkit is executed) are added to the environment variable $PYTHONPATH.
This breaks virtual environments, because $PYTHONPATH ends up pointing to libraries in the base environment. When we run code in a conda virtual environment, that environment's interpreter places the $PYTHONPATH contents, which were copied from the base environment's sys.path, at the front of its own sys.path. In effect, the base environment's sys.path shadows the virtual environment's sys.path, because $PYTHONPATH entries come before site-packages in the sys.path search order.
The end result is that the base environment libraries are given precedence over the virtual environment libraries.
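The search-order effect described above can be checked in isolation. The following is a minimal sketch, not the toolkit's code: `child_sys_path` is a hypothetical helper that starts a fresh interpreter with a chosen PYTHONPATH and reports where that entry lands in the child's sys.path relative to any site-packages directory.

```python
import json
import os
import subprocess
import sys
import tempfile


def child_sys_path(pythonpath=None):
    """Return sys.path as seen by a fresh interpreter (hypothetical helper).

    Any inherited PYTHONPATH is dropped first, so the result depends only on
    the value passed in.
    """
    env = dict(os.environ)
    env.pop("PYTHONPATH", None)
    if pythonpath is not None:
        env["PYTHONPATH"] = pythonpath
    result = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as injected:
        injected = os.path.realpath(injected)
        path = child_sys_path(pythonpath=injected)
        # Print each entry with its index; the injected directory should
        # appear before any site-packages directory.
        for index, entry in enumerate(path):
            marker = "  <-- from PYTHONPATH" if entry == injected else ""
            print(index, repr(entry), marker)
```

Anything importable from the injected directory is found before a package of the same name in the environment's own site-packages, which is how the base environment's libraries end up taking precedence.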
We have a workaround for now, which is to simply unset PYTHONPATH before we do anything with virtual environments.
However, I'm filing a bug report because it would be preferable for virtual environments to work out-of-the-box, and because this was a particularly difficult problem to find as it can present in so many different ways.
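For anyone launching the virtual environment from Python rather than from a shell, the same workaround can be sketched as follows. This is only an illustration under assumptions: `env_without_pythonpath`, `conda_run_command` and `run_in_conda_env` are hypothetical helper names, and the `conda run` flags are the ones used in the reproduction output in this report.

```python
import os
import subprocess


def env_without_pythonpath(base_env=None):
    """Return a copy of the environment with PYTHONPATH removed."""
    env = dict(os.environ if base_env is None else base_env)
    env.pop("PYTHONPATH", None)
    return env


def conda_run_command(env_name, args):
    """Build the `conda run` argument list for a command in a named env."""
    return ["conda", "run", "--no-capture-output", "-n", env_name, *args]


def run_in_conda_env(env_name, args):
    """Run a command in a conda env without the inherited PYTHONPATH.

    Assumes `conda` is on PATH; raises CalledProcessError on failure.
    """
    return subprocess.run(
        conda_run_command(env_name, args),
        env=env_without_pythonpath(),
        check=True,
    )
```

For example, `run_in_conda_env("dev", ["python", "-c", "import isort; print(isort.__version__)"])` would run the version check from the reproduction with PYTHONPATH cleared for the child process only, leaving the parent process untouched.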
(For us, this issue was discovered because a conda virtual environment with Python 3.9 crashed when it attempted to use the Python 3.10 libraries from the base environment. The error below is not important to understanding the issue, as there are many ways this issue can present itself. I'm only adding it to help others find this issue.)
Fatal Python error: init_sys_streams: can't initialize sys standard streams
Traceback (most recent call last):
File "/opt/conda/lib/python3.10/io.py", line 54, in <module>
ImportError: cannot import name 'text_encoding' from 'io' (unknown location)
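Since the symptom varies so much, a quick check inside the affected environment may help confirm this cause. The sketch below is an assumption-laden diagnostic, not part of the toolkit: `foreign_pythonpath_entries` is a hypothetical helper that lists PYTHONPATH entries lying outside the active environment's prefix.

```python
import os
import sys


def foreign_pythonpath_entries(pythonpath=None, prefix=None):
    """List PYTHONPATH entries that are not inside the environment prefix.

    Defaults to the current process's PYTHONPATH and sys.prefix. Note that a
    nested env such as /opt/conda/envs/dev lies inside the base prefix
    /opt/conda, so this check is only meaningful when run from the virtual
    environment, not from base.
    """
    raw = os.environ.get("PYTHONPATH", "") if pythonpath is None else pythonpath
    root = os.path.realpath(sys.prefix if prefix is None else prefix)
    foreign = []
    for entry in raw.split(os.pathsep):
        if not entry:
            continue  # skip empty segments such as a trailing separator
        resolved = os.path.realpath(entry)
        if resolved != root and not resolved.startswith(root + os.sep):
            foreign.append(entry)
    return foreign


if __name__ == "__main__":
    for entry in foreign_pythonpath_entries():
        print("PYTHONPATH entry outside", sys.prefix, "->", entry)
```

If this prints base-environment paths when run with the virtual environment's interpreter, imports are likely being resolved against the base environment's libraries.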
To reproduce
See the minimal example below to reproduce. In this minimal example I install two different versions of isort, namely 5.11.4 in the base conda environment and 5.12.0 in the dev conda environment. The output of the example is as follows, which shows the problem in action: inside the dev environment, Python imports isort 5.11.4 from base until PYTHONPATH is unset.
***** python location and version *****
(BASE): conda run --no-capture-output --live-stream -n base which python && python --version
/opt/conda/bin/python
Python 3.10.12
(DEV): conda run --no-capture-output --live-stream -n dev which python && python --version
/opt/conda/envs/dev/bin/python
Python 3.10.12
***** isort version seen by conda *****
(BASE): conda run --no-capture-output --live-stream -n base conda list isort
# packages in environment at /opt/conda:
isort 5.11.4 pyhd8ed1ab_1 conda-forge
(DEV): conda run --no-capture-output --live-stream -n dev conda list isort
# packages in environment at /opt/conda/envs/dev:
isort 5.12.0 pyhd8ed1ab_1 conda-forge
***** isort version inside python *****
(BASE): conda run --no-capture-output --live-stream -n base python -c 'import isort; print(isort.__version__)'
5.11.4
(DEV): conda run --no-capture-output --live-stream -n dev python -c 'import isort; print(isort.__version__)'
5.11.4
***** isort version inside python - after unset PYTHONPATH*****
(BASE): unset PYTHONPATH && conda run --no-capture-output --live-stream -n base python -c 'import isort; print(isort.__version__)'
5.11.4
(DEV): unset PYTHONPATH && conda run --no-capture-output --live-stream -n dev python -c 'import isort; print(isort.__version__)'
5.12.0
To reproduce the above example, use the files problem.Dockerfile, launcher.py and wrong_version_demo.py as defined below (inserting appropriate values for your AWS account and region where <insert> is written), and then run the following locally:
conda create -n problem
conda activate problem
mamba install sagemaker-python-sdk=2.166.0
docker build --file problem.Dockerfile --target base -t <insert> .
python launcher.py
problem.Dockerfile
# syntax=docker/dockerfile:1
FROM condaforge/mambaforge:23.1.0-2 as base
ARG DEBIAN_FRONTEND=noninteractive
RUN apt-get update -yqq
RUN apt-get install -yqq build-essential
RUN apt-get install -yqq python3-dev
RUN apt-get clean
RUN rm -rf /var/lib/apt/lists/* /var/lib/{apt,dpkg,cache,log}
RUN conda config --set always_yes true \
    && conda config --set unsatisfiable_hints true \
    && conda config --prepend channels defaults \
    && conda config --prepend channels conda-forge
RUN pip3 install sagemaker-training==4.6.1
RUN conda install -n base isort=5.11.4
RUN conda create -n dev python=3.10 isort=5.12.0
WORKDIR /work
Expected behavior
Either:
Screenshots or logs
N/A
System information
Versions are specified in the "To reproduce" section.
Additional context
N/A