Open ryan-williams opened 2 years ago
@ryan-williams The `pip install awscli ...` should be a no-op for any of the libraries that are already present in the image.
Yes, but if e.g. `awscli` isn't already installed, installing it can change the versions of things that are already installed, including breaking them. The "Simpler example" section above illustrates this most directly.
To be clear, it's possible for the following to happen:
1. A user installs specific *boto* versions in their `$METAFLOW_BATCH_CONTAINER_IMAGE`, then runs a flow `--with batch`
2. Metaflow runs `pip install` in the container
3. `pip install` inadvertently changes the versions of things the user had already installed in the image (namely `botocore`), resulting in other things the user installed (`aiobotocore`) being broken
I don't know what the solution should be, but this is surprising and undesirable behavior. It is enabled by a breaking change in boto in November that I suspect we will see wash around the ecosystem for some time to come, so it's good to be aware of this specific interaction with Metaflow's step-env setup logic.
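A toy Python sketch of that sequence, to make the failure mode concrete. The `REQUIRES` table and the `satisfies` / `find_conflicts` helpers are hypothetical names for illustration (this is not how pip or Metaflow is implemented), and the version numbers are the ones from the repro later in this thread. It only models the one thing that matters here: a later install moves a shared dependency forward without checking what else depends on it.

```python
def parse_version(v):
    """Turn '1.27.59' into (1, 27, 59) so versions compare numerically."""
    return tuple(int(part) for part in v.split("."))


def satisfies(version, spec):
    """Check a version against a comma-separated spec like '<1.27.60,>=1.27.59'."""
    checks = {
        "<=": lambda a, b: a <= b,
        ">=": lambda a, b: a >= b,
        "==": lambda a, b: a == b,
        "<": lambda a, b: a < b,
        ">": lambda a, b: a > b,
    }
    v = parse_version(version)
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators first so '<=' is not read as '<'.
        for op in ("<=", ">=", "==", "<", ">"):
            if clause.startswith(op):
                if not checks[op](v, parse_version(clause[len(op):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


def find_conflicts(installed, requires):
    """Return (package, dependency, spec, installed_version) for each unmet pin."""
    conflicts = []
    for pkg, deps in requires.items():
        if pkg not in installed:
            continue
        for dep, spec in deps.items():
            have = installed.get(dep)
            if have is not None and not satisfies(have, spec):
                conflicts.append((pkg, dep, spec, have))
    return conflicts


# Hypothetical requirement table; the pin itself is copied from pip's error in this thread.
REQUIRES = {"aiobotocore": {"botocore": "<1.27.60,>=1.27.59"}}

# Step 1: the user's image, with mutually compatible versions.
installed = {"aiobotocore": "2.4.2", "botocore": "1.27.59", "boto3": "1.24.59"}
before = find_conflicts(installed, REQUIRES)

# Steps 2-3: `pip install awscli boto3` adds awscli and moves boto3/botocore forward.
installed.update({"awscli": "1.27.110", "boto3": "1.26.110", "botocore": "1.29.110"})
after = find_conflicts(installed, REQUIRES)

print("before:", before)
print("after:", after)
```

The image is consistent before the extra install and inconsistent after it, even though nothing the user asked for changed.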
Ran into this again today. Here's an updated link to the offending line, in 2.8.2.
Here's a simple repro:
# mf1.dockerfile
FROM python:3.9
WORKDIR /root
RUN pip install \
    boto3==1.24.59 \
    botocore==1.27.59 \
    aiobotocore==2.4.2 \
    s3fs==2023.1.0 \
    pandas
# ✅ works fine, reads publicly-accessible CSV from S3. boto/s3fs/pandas versions are mutually compatible.
ENTRYPOINT [ "python", "-c", "import pandas as pd; print(pd.read_csv('s3://ctbk/csvs/JC-202301-citibike-tripdata.csv'))" ]
docker build -t mf1 -f mf1.dockerfile .
docker run --rm -it mf1
Then run `pip install awscli boto3` on top of that image, breaking aiobotocore/s3fs/pandas:
# mf2.dockerfile
FROM mf1
RUN pip install awscli boto3 # 💥 this breaks the user's installs; `pd.read_csv("s3://…")` no longer works
Test image:
docker build -t mf2 -f mf2.dockerfile .
docker run --rm -it mf2
`pd.read_csv` raises `PermissionError: Forbidden`.
`pip install awscli boto3` explicitly logs an ERROR about breaking `aiobotocore`:
docker run --rm -it --entrypoint pip mf1 install awscli boto3
# …
# ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
# aiobotocore 2.4.2 requires botocore<1.27.60,>=1.27.59, but you have botocore 1.29.110 which is incompatible.
# Successfully installed PyYAML-5.4.1 awscli-1.27.110 boto3-1.26.110 botocore-1.29.110 colorama-0.4.4 docutils-0.16 pyasn1-0.4.8 rsa-4.7.2
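If you want an image build or CI step to fail loudly on this, one option is to scan pip's output for those conflict lines. The sketch below is a rough illustration only: the regular expression is inferred from the single error line quoted in this thread, so treat the message format as an assumption that may differ across pip versions. `parse_conflicts` is a hypothetical helper name.

```python
import re

# Inferred from the one conflict line quoted above; pip's wording may vary by version.
CONFLICT_RE = re.compile(
    r"^\W*(?P<pkg>\S+) (?P<pkg_version>\S+) requires (?P<requirement>.+?), "
    r"but you have (?P<dep>\S+) (?P<dep_version>\S+) which is incompatible\.$"
)


def parse_conflicts(pip_output):
    """Extract dependency-conflict records from captured `pip install` output."""
    conflicts = []
    for line in pip_output.splitlines():
        match = CONFLICT_RE.match(line.strip())
        if match:
            conflicts.append(match.groupdict())
    return conflicts


# Sample text copied from the output quoted in this thread (with its leading "# ").
SAMPLE = """\
# ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
# aiobotocore 2.4.2 requires botocore<1.27.60,>=1.27.59, but you have botocore 1.29.110 which is incompatible.
# Successfully installed PyYAML-5.4.1 awscli-1.27.110 boto3-1.26.110 botocore-1.29.110 colorama-0.4.4 docutils-0.16 pyasn1-0.4.8 rsa-4.7.2
"""

found = parse_conflicts(SAMPLE)
for conflict in found:
    print(conflict)
```

Running `pip check` after the install is a less brittle way to get the same signal, since it reports incompatible installed requirements and exits non-zero when it finds any.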
The simplest workaround remains to make sure `awscli` and `boto3` are both installed in any image you pass to Metaflow Batch mode, but Metaflow could/should do something more careful/correct here.
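A small check along these lines can run at the end of an image build to confirm the workaround is in place. It is a sketch using only the standard library's `importlib.metadata`; `installed_versions` and `missing` are hypothetical helper names, and the package list is just the one discussed in this thread.

```python
from importlib import metadata


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def missing(names):
    """Return the names that are not installed in the current environment."""
    return [name for name, version in installed_versions(names).items() if version is None]


if __name__ == "__main__":
    # Packages discussed in this thread; adjust to whatever your image needs.
    wanted = ["awscli", "boto3", "botocore", "aiobotocore"]
    for name, version in installed_versions(wanted).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
    absent = missing(["awscli", "boto3"])
    if absent:
        print("pip may change this image at task start; missing:", ", ".join(absent))
```

This only confirms that the packages are present. It does not verify that their transitive dependency versions agree with each other.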
Two features related to this have recently been released in #1972:
- We have gotten rid of the `awscli` dependency completely, so there is less possibility of dependency conflicts.
- For other use cases that require completely disabling the dependency installs, setting the `METAFLOW_SKIP_INSTALL_DEPENDENCIES` environment variable in the execution environment will do this. When using this, the execution environment needs to have the required bootstrapping dependencies available out of the box.
Can this issue be considered closed with the latest changes?
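For readers unfamiliar with the pattern, here is a rough sketch of how an environment variable can gate a bootstrap install step. It is not Metaflow's actual code: `bootstrap_commands` is a hypothetical helper, the package list is a caller-supplied placeholder, and treating any non-empty value as "skip" is an assumption rather than documented behavior.

```python
import os


def bootstrap_commands(packages, env=None):
    """Return the install commands a bootstrap step would run, or none if skipped.

    Hypothetical illustration only: assumes any non-empty value of
    METAFLOW_SKIP_INSTALL_DEPENDENCIES means "skip the installs".
    """
    env = os.environ if env is None else env
    if env.get("METAFLOW_SKIP_INSTALL_DEPENDENCIES"):
        # The image is expected to ship everything the bootstrap needs.
        return []
    return [["python", "-m", "pip", "install", *packages]]


# Placeholder package list, not the set Metaflow really installs.
default_cmds = bootstrap_commands(["boto3"], env={})
skipped_cmds = bootstrap_commands(["boto3"], env={"METAFLOW_SKIP_INSTALL_DEPENDENCIES": "1"})

print(default_cmds)
print(skipped_cmds)
```

With the variable set, no `pip install` runs at task start, so the pinned versions in the image are left untouched.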
Pasting the README from runsascoded/mf-pip-issue, where I have some repro files as well:
Metaflow/`pip`/Batch issue
Metaflow runs `pip install awscli … boto3` while setting up task environments in Batch, which can break `aiobotocore<2.1.0`.
Repro
Docker image runsascoded/mf-pip-issue-batch (`batch.dockerfile`) pins recent versions of `botocore` and `aiobotocore`:
- `aiobotocore==1.4.2` (October 5, 2021)
- `botocore==1.20.106` (July 6, 2021, required by `aiobotocore==1.4.2`)
Local mode: ✅
They work fine together normally; runsascoded/mf-pip-issue-local (`local.dockerfile`) runs `s3_flow_test.py` successfully (in "local" mode):
Batch mode: ❌
However, with a Metaflow Batch queue configured:
fails with:
due to a version mismatch (`botocore>=1.23.0`, `aiobotocore<2.1.0`).
Version mismatch
`botocore` removed `ClientCreator._register_lazy_block_unknown_fips_pseudo_regions` in `1.23.0`, and `aiobotocore` only updated to `botocore>=1.23.0` in `2.1.0`, so `aiobotocore<2.1.0` requires `botocore<1.23.0`; otherwise reading from S3 via Pandas will raise this error.
Cause
The version mismatch is caused by Metaflow running `pip install awscli … boto3` while setting up the task environment (in Batch and I believe k8s). If `awscli` and `boto3` aren't both installed already, pip will pick a recent version to install, see that a recent `botocore` is also required by that version, and update `botocore` to `>=1.23.0` while `aiobotocore` is still `<2.1.0`, breaking Pandas→S3 reading.
Simpler example
Here we see `pip install awscli` break `aiobotocore<2.1.0` directly (in the same image as above):
Here, `pip install awscli` upgraded `botocore` to a version that's incompatible with the already-installed `aiobotocore`.
Workaround
The simplest workaround I've found is to ensure Metaflow's `pip install awscli click requests boto3` command no-ops, by having some version of those libraries already installed in the image. They should also have consistent transitive dependency versions (otherwise `pip install` will "help" with those as well).
Scratch
These seem like the minimal Metaflow configs to submit to Batch (and reproduce the issue):
Docker build commands: