apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Unable to run the docker-example_dags (Pypi install on airflow 2.0.1) #14189

Closed EricBoix closed 2 years ago

EricBoix commented 3 years ago

Apache Airflow version: 2.0.1

Kubernetes version: not used

Environment: docker engine 20.10.0 (provided by docker-desktop 3.0.1, installed with brew cask install docker and then accepting the docker-desktop GUI proposal to upgrade to 3.0.1).

What happened: After installing Airflow 2.0.1 from PyPI, I tried to run the docker examples (as provided by the Airflow GitHub repository) from the CLI. This failed (for me).

What you expected to happen: The docker example DAGs seem to fail because of an API mismatch between the version used in Airflow's docker examples and the "contemporary" version of the Docker Python wrappers.

In turn, could this be due (?) to:

How to reproduce it: Follow the Airflow quick PyPI installation, which boils down to:

$ virtualenv -p python3.8 venv     # 3.9 not supported yet
$ source venv/bin/activate
(venv) pip --version    # Yields 20.2.4 as required
(venv) export AIRFLOW_VERSION=2.0.1
(venv) export PYTHON_VERSION="$(python --version | cut -d " " -f 2 | cut -d "." -f 1-2)"
(venv) export CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
(venv) pip install "apache-airflow[docker,cncf.kubernetes]==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"
(venv) airflow version          ### Just to make sure
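For readers who prefer to compute the constraint URL outside the shell, the two export lines above can be sketched in Python. This is only an illustration of the same string construction: the function names constraint_url and current_python_minor are hypothetical helpers, not part of Airflow.

```python
import platform


def constraint_url(airflow_version: str, python_version: str) -> str:
    """Build the constraints file URL using the same pattern as CONSTRAINT_URL above."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def current_python_minor() -> str:
    """Return the running interpreter's version as MAJOR.MINOR, like the cut pipeline above."""
    major, minor, _patch = platform.python_version_tuple()
    return f"{major}.{minor}"


if __name__ == "__main__":
    # Prints the URL that would be passed to pip's --constraint option.
    print(constraint_url("2.0.1", current_python_minor()))
```

Printing the URL before installing makes it easy to confirm that the Python version picked up inside the virtualenv is the one expected (3.8 here).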

To run the docker examples, the Airflow sources are required:

$ git clone -b 2.0.1 https://github.com/apache/airflow.git
$ mv airflow airflow.git
$ export AIRFLOW_REPO=`pwd`/airflow.git
$ mkdir $AIRFLOW_HOME/dags
$ cd $AIRFLOW_HOME/dags
$ ln -s $AIRFLOW_REPO/airflow/providers/docker/example_dags ./docker-example_dags

Proceed with running the docker_sample

(venv) pip install docker                               # Dependency should already be satisfied
(venv) airflow db init                                     # If not already done
(venv) airflow webserver -D --port 8080    # Shouldn't hurt although probably not required here (?)
(venv) airflow scheduler -D 
(venv) airflow dags list | grep -i docker      # Assert the docker examples are visible
(venv) airflow dags test docker_sample now

where this last command will issue an error of the form

{taskinstance.py:1396} ERROR - API versions below 1.21 are no longer supported by this library.

It further suggests that airflow/venv/lib/python3.8/site-packages/airflow/providers/docker/operators/docker.py (line 314, in _get_cli) is using a deprecated API (with docker version 20.10.0 on OSX).
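To make the error message easier to reason about, here is a minimal sketch of the kind of client-side check that could produce it. It is not the docker library's actual code: InvalidVersion below is a stand-in class, check_api_version and parse_version are hypothetical names, and the 1.21 minimum is taken only from the error text. One hypothesis worth checking (not confirmed in this thread) is that the DAG passes an old value through DockerOperator's api_version parameter, which the operator documentation says can also be set to 'auto'.

```python
MINIMUM_API_VERSION = "1.21"  # assumption: taken from the error message, not from library source


class InvalidVersion(Exception):
    """Stand-in for docker.errors.InvalidVersion, defined here so the sketch is self-contained."""


def parse_version(version: str) -> tuple:
    """Turn '1.19' into (1, 19) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def check_api_version(requested: str) -> str:
    """Reject an explicitly requested API version that is below the minimum."""
    if requested == "auto":
        # 'auto' means the version is negotiated with the daemon instead of being fixed.
        return requested
    if parse_version(requested) < parse_version(MINIMUM_API_VERSION):
        raise InvalidVersion(
            f"API versions below {MINIMUM_API_VERSION} are no longer supported by this library."
        )
    return requested


if __name__ == "__main__":
    for candidate in ("auto", "1.41", "1.19"):
        try:
            print(candidate, "->", check_api_version(candidate))
        except InvalidVersion as exc:
            print(candidate, "-> rejected:", exc)
```

If a check of this shape is what fires, the error depends on the API version requested by the caller rather than on the docker daemon, so inspecting the api_version argument in the example DAG would be a reasonable first step.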

If we now manually downgrade the docker python wrapper package from 3.7.3 (the version pinned by the CONSTRAINT_URL and thus installed by pip, see above) down to 3.0.0 with e.g.

pip install docker==3.0.0

then one consistently gets the same API version error message.

But starting with version 2.7.0 of the docker python wrappers, connecting to the docker daemon no longer seems possible (it fails with a message of the form requests.exceptions.ConnectionError: HTTPConnectionPool [...]).
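An HTTPConnectionPool error indicates that the client tried to reach the daemon over HTTP on a TCP endpoint. As a rough diagnostic, the sketch below checks whether anything is listening on such an endpoint. The host and port in the example call are assumptions (2375 is the conventional unencrypted Docker daemon port), and can_connect is a hypothetical helper name; the actual endpoint is whatever docker_url the DAG passes to the operator.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False


if __name__ == "__main__":
    # Assumed endpoint for illustration only; replace with the docker_url actually in use.
    print(can_connect("localhost", 2375))
```

If this returns False for the endpoint the DAG targets, the connection failure is independent of the docker python wrapper version.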

Install minikube/kind

The short answer is here: I barely ended up learning docker and I overheard that Kubernetes deployment/usage/fiddling is heavier. The goal was/is thus to only use docker (as opposed to Kubernetes) that should suffice when debugging dags on a desktop.

Anything else we need to know: Nope. Seems about it.

boring-cyborg[bot] commented 3 years ago

Thanks for opening your first issue here! Be sure to follow the issue template!

kaxil commented 3 years ago

Try using the old version of the Docker provider and see if it works:

https://airflow.apache.org/docs/apache-airflow-providers-docker/stable/index.html#id1

pip install apache-airflow-providers-docker==1.0.0

EricBoix commented 3 years ago

Thanks @kaxil. I thus successfully installed apache-airflow-providers-docker-1.0.0 (which uninstalled version 1.0.1) while restoring docker version 3.7.3 (which is the version pinned by the CONSTRAINT_URL).

Alas, I still get the same error message:

docker.errors.InvalidVersion: API versions below 1.21 are no longer supported by this library.

EricBoix commented 3 years ago

While googling in despair, I discovered the existence of airflowdocker.io, which seems only loosely related to the official Apache development. Their approach seems to consider that "everything is a docker operator". If I really need to get some DockerOperator running, would you advise giving up on the "traditional/official" way described above and switching to airflowdocker.io?