yehoshuadimarsky closed this issue 8 months ago.
Yes. We just switched to `uv` for CI images: https://github.com/apache/airflow/pull/37692, and indeed `uv` is phenomenal. It saves 50%-60% of image building time in some cases. Yes, I absolutely plan to add `uv` support, and yes, I was waiting for it to support non-virtualenv settings, so this might be the right time to attempt it :)
BTW, it's not going to be very easy (and might need an extra feature in `uv`), because we are using `--user` installation in the PROD image to isolate system dependencies from user dependencies. I will open an issue for `uv` about it.
Opened an issue for that https://github.com/astral-sh/uv/issues/2077
Found a way around it. For Airflow 2.9, `uv` should be fully supported (https://github.com/apache/airflow/pull/37796), regardless of whether `uv` ends up supporting that flag or not. @yehoshuadimarsky, can you please test whether that solves your problem?
Here is how to test it:
```bash
export AIRFLOW_VERSION=2.8.2
export DOCKER_BUILDKIT=1
docker build . \
  --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" \
  --build-arg AIRFLOW_VERSION="${AIRFLOW_VERSION}" \
  --tag "my-tag:0.0.1"
```
This image should be built with `pip`, but once you have it, you can extend it: the latest `uv` version should already be in the image and should work without adding any flags.
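A minimal sketch of that extension step, assuming the image was tagged `my-tag:0.0.1` by the command above and that `uv` is on the `PATH` of the image's default user (as stated here, not independently verified). The hypothetical helper `write_extension_dockerfile` prints a two-line Dockerfile that starts from the base image and installs extra packages with `uv pip install`; the package `lxml`, the file name `Dockerfile.extend` and the tag `my-extended:0.0.1` in the usage line are placeholders.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print a Dockerfile that extends a base image and
# installs the given packages with uv. Assumes uv is already installed in
# the base image and needs no extra flags there.
write_extension_dockerfile() {
  local base_image="$1"
  shift
  printf 'FROM %s\n' "$base_image"
  # "$*" joins the remaining arguments (package names) with spaces.
  printf 'RUN uv pip install %s\n' "$*"
}
```

Usage: `write_extension_dockerfile "my-tag:0.0.1" lxml > Dockerfile.extend && docker build -f Dockerfile.extend --tag "my-extended:0.0.1" .`. Generating the file from a function only keeps the example short and testable; in practice you would usually write the two Dockerfile lines by hand.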
You can also run another test:
```bash
export AIRFLOW_VERSION=2.8.2
export DOCKER_BUILDKIT=1
docker build . \
  --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" \
  --build-arg AIRFLOW_VERSION="${AIRFLOW_VERSION}" \
  --build-arg AIRFLOW_USE_UV="true" \
  --tag "my-tag:0.0.1"
```
This should use `uv` to build the Docker image. According to my tests, building it from scratch this way should be 50%-60% faster than with `pip`. We do not want to enable it by default, but it should be fully usable and working.
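To check the 50%-60% figure on your own hardware, one rough approach (a sketch, assuming the PR's Dockerfile is in the current directory and `AIRFLOW_VERSION` is exported as above) is to time a `pip` build and a `uv` build with the layer cache disabled and compare them. The helpers `time_build` and `percent_saved` are hypothetical names, and timings come from `date +%s`, so they have one-second resolution.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Percentage of build time saved by the candidate build relative to the
# baseline build, using integer arithmetic (truncated toward zero).
# A negative result means the candidate build was slower.
percent_saved() {
  local baseline_secs="$1" candidate_secs="$2"
  echo $(( (baseline_secs - candidate_secs) * 100 / baseline_secs ))
}

# Time one from-scratch build (no layer cache) and print the elapsed seconds.
# Extra arguments (e.g. --tag, --build-arg AIRFLOW_USE_UV=true) are passed
# through to docker build; AIRFLOW_VERSION must be set (nounset enforces it).
time_build() {
  local start end
  start=$(date +%s)
  docker build . --no-cache \
    --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" \
    --build-arg AIRFLOW_VERSION="${AIRFLOW_VERSION}" \
    "$@" >/dev/null || return 1
  end=$(date +%s)
  echo $(( end - start ))
}
```

For example: `pip_secs=$(time_build --tag "my-tag:pip")`, then `uv_secs=$(time_build --build-arg AIRFLOW_USE_UV="true" --tag "my-tag:uv")`, and finally `echo "uv saved $(percent_saved "$pip_secs" "$uv_secs")% of build time"`. Pulling the base image first (`docker pull python:3.8-slim-bookworm`) keeps the download cost out of whichever build happens to run first.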
Can you please test it and give feedback, @yehoshuadimarsky?
To be honest, I'm not fully comfortable building the Docker image from the source Dockerfile in your PR; it's a bit long and overwhelming (a >1500-line file). I'm in no rush and more of an enthusiast wanting to try out cool new tools, so this is not a hard blocker for me right now. I'm OK waiting until you finalize the workflow and have a finished image built.
To be honest, there are good instructions for running the build. It's exactly one Dockerfile to copy and a single command to build it :) Without people like you testing it before merge, I can only hope that it will solve your problem, never be certain. Testing and trying things before merging is the best way to find out. If your tests confirm that it works, we can merge it with confidence; if they do not, I have a chance to fix it first. It's of course up to you, but if it does not fit your needs after we release it in 2.9, remember that you have the opportunity to prevent that now. You can make use of that opportunity or not, but if it does not work, you will only have yourself to blame for not testing it.
But of course, it's up to you whether to help test it or not.
(And if it does not work, we will find out about a month from now, when 2.9 is released, BTW.)
Just tried building the image on Windows following these steps:

```powershell
$env:AIRFLOW_VERSION="2.8.2"
$env:DOCKER_BUILDKIT=1
docker build . --build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" --build-arg AIRFLOW_VERSION="${AIRFLOW_VERSION}" --tag "my-tag:0.0.1"
```

But I got this error:
<details>

```
[airflow-build-image 3/16] RUN bash /scripts/docker/install_os_dependencies.sh dev: : invalid option name/install_os_dependencies.sh: line 2: set: pipefail
[main 3/19] RUN bash /scripts/docker/install_os_dependencies.sh runtime: : invalid option name/install_os_dependencies.sh: line 2: set: pipefail
Dockerfile:1338
1336 |
1337 | COPY --from=scripts install_os_dependencies.sh /scripts/docker/
1338 | >>> RUN bash /scripts/docker/install_os_dependencies.sh dev
1339 |
1340 | ARG INSTALL_MYSQL_CLIENT="true"
ERROR: failed to solve: process "/bin/bash -o pipefail -o errexit -o nounset -o nolog -c bash /scripts/docker/install_os_dependencies.sh dev" did not complete successfully: exit code: 2
View build details: docker-desktop://dashboard/build/default/default/dpgt50krcuoyudg0qo6mv8mte
```

</details>
Is this because I'm on Windows rather than Unix?
**Edit:** So I tried using GitHub Codespaces and got this:
```
root@53324cd94090:/opt/airflow# which docker
/usr/bin/docker
root@53324cd94090:/opt/airflow# export AIRFLOW_VERSION=2.8.2
export DOCKER_BUILDKIT=1
root@53324cd94090:/opt/airflow# docker build . \
--build-arg PYTHON_BASE_IMAGE="python:3.8-slim-bookworm" \
--build-arg AIRFLOW_VERSION="${AIRFLOW_VERSION}" \
--tag "my-tag:0.0.1"
ERROR: BuildKit is enabled but the buildx component is missing or broken.
       Install the buildx component to build images with BuildKit:
       https://docs.docker.com/go/buildx/
root@53324cd94090:/opt/airflow#
```
**Edit #3:** Followed the instructions at https://docs.docker.com/engine/install/debian/ to set up Docker with BuildKit on GH Codespaces; now I can build the image. It ran for a while, but then failed with this output:
Regarding 1: `git config --global core.autocrlf` should fix Windows.
Regarding 2 and 3: I updated the default in the new version. It seems `uv` was set as the default for building the image, but it still has some teething problems. The update should switch image building to `pip`, but `uv` should work now.
All right, I did more testing and I think it should work better now :)
Just tried again, this time on my M1 MacBook Pro (2020); it doesn't work:
```
~/Documents/GitHub/ git clone git@github.com:apache/airflow.git airflow-upstream
Cloning into 'airflow-upstream'...
remote: Enumerating objects: 400720, done.
remote: Counting objects: 100% (662/662), done.
remote: Compressing objects: 100% (410/410), done.
remote: Total 400720 (delta 282), reused 573 (delta 244), pack-reused 400058
Receiving objects: 100% (400720/400720), 258.46 MiB | 10.83 MiB/s, done.
Resolving deltas: 100% (311879/311879), done.
~/Documents/GitHub/ cd airflow-upstream
~/Documents/GitHub/airflow-upstream/ [main] git checkout use-uv-for-prod-image
branch 'use-uv-for-prod-image' set up to track 'origin/use-uv-for-prod-image'.
Switched to a new branch 'use-uv-for-prod-image'
```
Here's the command and the error output:
I also tried it now in my GH Codespaces environment, and it appears to be the same error.
Setup:
```
~/Documents/GitHub/airflow-upstream/ [use-uv-for-prod-image] gh codespace ssh
? Choose codespace: apache/airflow (use-uv-for-prod-image): fictional orbit
Linux 53324cd94090 6.2.0-1019-azure #19~22.04.1-Ubuntu SMP Wed Jan 10 22:57:03 UTC 2024 x86_64
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
root@53324cd94090:/workspaces/airflow# git status
On branch use-uv-for-prod-image
Your branch is up to date with 'origin/use-uv-for-prod-image'.
nothing to commit, working tree clean
root@53324cd94090:/workspaces/airflow# git pull
remote: Enumerating objects: 184, done.
remote: Counting objects: 100% (182/182), done.
remote: Compressing objects: 100% (126/126), done.
remote: Total 184 (delta 69), reused 153 (delta 51), pack-reused 2
Receiving objects: 100% (184/184), 340.68 KiB | 9.21 MiB/s, done.
Resolving deltas: 100% (69/69), completed with 15 local objects.
From https://github.com/apache/airflow
 + 8773cb8d8c...e54302f882 use-uv-for-prod-image -> origin/use-uv-for-prod-image (forced update)
   32bf0817f2..b921d4cd6f constraints-main -> origin/constraints-main
   a7691b787b..73a632a5a0 main -> origin/main
 * [new branch] update-helm-check-instructions -> origin/update-helm-check-instructions
 * [new tag] helm-chart/1.13.0rc1 -> helm-chart/1.13.0rc1
hint: You have divergent branches and need to specify how to reconcile them.
hint: You can do so by running one of the following commands sometime before
hint: your next pull:
hint:
hint: git config pull.rebase false # merge
hint: git config pull.rebase true # rebase
hint: git config pull.ff only # fast-forward only
hint:
hint: You can replace "git config" with "git config --global" to set a default
hint: preference for all repositories. You can also pass --rebase, --no-rebase,
hint: or --ff-only on the command line to override the configured default per
hint: invocation.
fatal: Need to specify how to reconcile divergent branches.
root@53324cd94090:/workspaces/airflow# git pull --rebase
Successfully rebased and updated refs/heads/use-uv-for-prod-image.
root@53324cd94090:/workspaces/airflow# git pull
Already up to date.
root@53324cd94090:/workspaces/airflow# git status
On branch use-uv-for-prod-image
Your branch is up to date with 'origin/use-uv-for-prod-image'.
nothing to commit, working tree clean
root@53324cd94090:/workspaces/airflow# git remote -v
origin https://github.com/apache/airflow (fetch)
origin https://github.com/apache/airflow (push)
root@53324cd94090:/workspaces/airflow# export AIRFLOW_VERSION=2.8.2
root@53324cd94090:/workspaces/airflow# export DOCKER_BUILDKIT=1
```
Command and error:
Should be fine now. Missed a case there.
That works! Thanks
Thanks for helping with the tests. Also, `--build-arg AIRFLOW_USE_UV=true` should build the image itself using `uv` (not only let you use `uv` to install packages afterwards), which should be much faster than building the image with `pip`.
Not sure if this matters much, but building the Docker image on Windows still fails, even after I ran `git config --global core.autocrlf`.
You likely need to check the repository out from scratch for the `core.autocrlf` setting to take effect.
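The garbled `pipefail` lines in the Windows log look like what bash prints when a script has CRLF line endings: the option name it receives is `pipefail` followed by a carriage return, and that carriage return makes the terminal overwrite the start of the message. Because the Dockerfile in the PR appears to embed its helper scripts (note the `COPY --from=scripts` step in the log), a Dockerfile checked out with CRLF endings would be enough to trigger this. A minimal sketch for checking and repairing a file, assuming a bash-compatible shell such as Git Bash or WSL; `has_crlf` and `strip_crlf` are hypothetical helper names.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: succeed (exit 0) if any of the given files contains a
# carriage return, which is what CRLF line endings leave behind.
has_crlf() {
  grep -q $'\r' "$@"
}

# Hypothetical helper: remove every carriage return from each given file in
# place (not only the ones at line ends), via a temporary file.
strip_crlf() {
  local f
  for f in "$@"; do
    tr -d '\r' < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
  done
}
```

For example, `has_crlf Dockerfile && echo "Dockerfile has CRLF line endings"`. Stripping carriage returns by hand is only a stopgap; the more durable fix is the one suggested above: set `core.autocrlf` to a value that does not convert line endings on checkout (for example `false` or `input`; the thread does not say which value was intended) and clone the repository again.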
Description
I am trying to use the phenomenal new tool `uv` to add Python packages on top of the base Airflow image. I kept running into the issue of `uv` not supporting non-virtual environments, but this is now supported as of version 0.1.12.
Here is my sample Dockerfile. I cannot get it to work with either of the new flags, `--python` or `--system`. In both cases, I get this error:
I thought, OK, let me run it as `USER root`, but then I get an error about installing `pip` packages as root, as described here: https://airflow.apache.org/docs/docker-stack/build.html#adding-new-pypi-packages-individually
Any suggestions on using `uv` with the Airflow Docker image? Thanks
Related issues
I see that Airflow has already adopted `uv` internally for its CI (see https://github.com/apache/airflow/issues?q=uv), so I think this should be doable for the external Docker image as well.