Closed potiuk closed 3 years ago
This one duplicates #8548 a bit - but I want to leave it for a while as I wanted to split it into smaller functional pieces.
It would be nice to have this in "Quick Start Guide when using Docker Image" too. WDYT
Absolutely. It's already planned in #8542 :)
Added missing label :)
Here is another example of a Docker Compose that I've been working on. The Compose defines multiple services to run Airflow.
There is an init service, an ephemeral container that initializes the database and creates a user if necessary.
The init service command tries to run airflow list_users, and if that fails it initializes the database and creates a user. Different approaches were considered, but this one is simple enough and only involves airflow commands (no database-specific commands).
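The "check, then initialize" logic of the init service can be sketched in Python as follows. This is only an illustration of the control flow of `airflow list_users || (airflow initdb && airflow create_user ...)`; the names `ensure_initialized` and `runner` are hypothetical and not part of Airflow.

```python
import subprocess
from typing import Callable, List, Sequence

# Airflow 1.10 CLI commands taken from the init service described above.
CHECK_CMD = ["airflow", "list_users"]
INIT_CMDS = [
    ["airflow", "initdb"],
    ["airflow", "create_user", "--role", "Admin", "--username", "airflow",
     "--password", "airflow", "-e", "airflow@airflow.com",
     "-f", "airflow", "-l", "airflow"],
]


def run(cmd: Sequence[str]) -> int:
    """Run a command and return its exit code."""
    return subprocess.run(list(cmd)).returncode


def ensure_initialized(
    runner: Callable[[Sequence[str]], int] = run,  # hypothetical hook, eases testing
) -> List[List[str]]:
    """Mirror `check || (init && create_user)`; return the commands that ran."""
    executed = [CHECK_CMD]
    if runner(CHECK_CMD) == 0:
        return executed  # the check succeeded, so nothing else is run
    for cmd in INIT_CMDS:
        executed.append(cmd)
        if runner(cmd) != 0:
            break  # `&&` semantics: stop at the first failing command
    return executed
```

Because the check is repeated on every start, the container can be restarted safely: once the database and user exist, only `airflow list_users` runs.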
Extension fields are used for airflow environment variables to reduce code duplication.
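As a rough mental model, merging an extension field into a service with the YAML merge key behaves like a dictionary merge in which the service's own keys win. The sketch below uses plain Python dicts rather than a YAML parser; `service_environment` is a hypothetical helper, and the variable values are taken from this thread.

```python
from typing import Dict

# Shared variables, playing the role of the x-airflow-environment extension field.
AIRFLOW_ENVIRONMENT: Dict[str, str] = {
    "AIRFLOW__CORE__EXECUTOR": "CeleryExecutor",
    "AIRFLOW__CORE__LOAD_EXAMPLES": "False",
}


def service_environment(overrides: Dict[str, str]) -> Dict[str, str]:
    """Merge shared variables with per-service ones; per-service keys win."""
    return {**AIRFLOW_ENVIRONMENT, **overrides}


# A service that only adds a variable to the shared set.
webserver_env = service_environment({"AIRFLOW__WEBSERVER__RBAC": "True"})
# A service that overrides one of the shared variables.
local_env = service_environment({"AIRFLOW__CORE__EXECUTOR": "LocalExecutor"})
```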
I added a Makefile alongside the docker-compose.yml in my repo, so all you have to do to start the Compose setup is run make run.
version: "3.7"
x-airflow-environment: &airflow-environment
  AIRFLOW__CORE__EXECUTOR: CeleryExecutor
  AIRFLOW__WEBSERVER__RBAC: "True"
  AIRFLOW__CORE__LOAD_EXAMPLES: "False"
  AIRFLOW__CELERY__BROKER_URL: "redis://:@redis:6379/0"
  AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
services:
  postgres:
    image: postgres:11.5
    environment:
      POSTGRES_USER: airflow
      POSTGRES_DB: airflow
      POSTGRES_PASSWORD: airflow
  redis:
    image: redis:5
    environment:
      REDIS_HOST: redis
      REDIS_PORT: 6379
    ports:
      - 6379:6379
  init:
    image: apache/airflow:1.10.10
    environment:
      <<: *airflow-environment
    depends_on:
      - redis
      - postgres
    volumes:
      - ./dags:/opt/airflow/dags
    entrypoint: /bin/bash
    command: >
      -c "airflow list_users || (airflow initdb
      && airflow create_user --role Admin --username airflow --password airflow -e airflow@airflow.com -f airflow -l airflow)"
    restart: on-failure
  webserver:
    image: apache/airflow:1.10.10
    ports:
      - 8080:8080
    environment:
      <<: *airflow-environment
    depends_on:
      - init
    volumes:
      - ./dags:/opt/airflow/dags
    command: "webserver"
    restart: always
  flower:
    image: apache/airflow:1.10.10
    ports:
      - 5555:5555
    environment:
      <<: *airflow-environment
    depends_on:
      - redis
    command: flower
    restart: always
  scheduler:
    image: apache/airflow:1.10.10
    environment:
      <<: *airflow-environment
    depends_on:
      - webserver
    volumes:
      - ./dags:/opt/airflow/dags
    command: scheduler
    restart: always
  worker:
    image: apache/airflow:1.10.10
    environment:
      <<: *airflow-environment
    depends_on:
      - scheduler
    volumes:
      - ./dags:/opt/airflow/dags
    command: worker
    restart: always
Here's my docker-compose config using LocalExecutor...
# docker-compose.airflow.yml (base "airflow" service, referenced via "extends")
version: '2.1'
services:
  airflow:
    # image: apache/airflow:1.10.10
    build:
      context: .
      args:
        - DOCKER_UID=${DOCKER_UID-1000}
      dockerfile: Dockerfile
    restart: always
    environment:
      - AIRFLOW__CORE__SQL_ALCHEMY_CONN=postgres://airflow:${POSTGRES_PW-airflow}@postgres:5432/airflow
      - AIRFLOW__CORE__FERNET_KEY=${AF_FERNET_KEY-GUYoGcG5xdn5K3ysGG3LQzOt3cc0UBOEibEPxugDwas=}
      - AIRFLOW__CORE__EXECUTOR=LocalExecutor
      - AIRFLOW__CORE__AIRFLOW_HOME=/opt/airflow/
      - AIRFLOW__CORE__LOAD_EXAMPLES=False
      - AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS=False
      - AIRFLOW__CORE__LOGGING_LEVEL=${AF_LOGGING_LEVEL-info}
    volumes:
      - ../airflow/dags:/opt/airflow/dags:z
      - ../airflow/plugins:/opt/airflow/plugins:z
      - ./volumes/airflow_data_dump:/opt/airflow/data_dump:z
      - ./volumes/airflow_logs:/opt/airflow/logs:z
    healthcheck:
      test: ["CMD-SHELL", "[ -f /opt/airflow/airflow-webserver.pid ]"]
      interval: 30s
      timeout: 30s
      retries: 3
# Main Compose file; webserver and scheduler extend the "airflow" service from docker-compose.airflow.yml
version: '2.1'
services:
  postgres:
    image: postgres:9.6
    container_name: af_postgres
    environment:
      - POSTGRES_USER=airflow
      - POSTGRES_PASSWORD=${POSTGRES_PW-airflow}
      - POSTGRES_DB=airflow
      - PGDATA=/var/lib/postgresql/data/pgdata
    volumes:
      - ./volumes/postgres_data:/var/lib/postgresql/data/pgdata:Z
    ports:
      - 127.0.0.1:5432:5432
  webserver:
    extends:
      file: docker-compose.airflow.yml
      service: airflow
    container_name: af_webserver
    command: webserver
    depends_on:
      - postgres
    ports:
      - ${DOCKER_PORTS-8080}
    networks:
      - proxy
      - default
    environment:
      # Web Server Config
      - AIRFLOW__WEBSERVER__DAG_DEFAULT_VIEW=graph
      - AIRFLOW__WEBSERVER__HIDE_PAUSED_DAGS_BY_DEFAULT=true
      - AIRFLOW__WEBSERVER__RBAC=true
      # Web Server Performance tweaks
      # 2 * NUM_CPU_CORES + 1
      - AIRFLOW__WEBSERVER__WORKERS=${AF_WORKERS-2}
      # Restart workers every 30min instead of 30seconds
      - AIRFLOW__WEBSERVER__WORKER_REFRESH_INTERVAL=1800
    labels:
      - "traefik.enable=true"
      - "traefik.http.routers.airflow.rule=Host(`af.example.com`)"
      - "traefik.http.routers.airflow.middlewares=admin-auth@file"
  scheduler:
    extends:
      file: docker-compose.airflow.yml
      service: airflow
    container_name: af_scheduler
    command: scheduler
    depends_on:
      - postgres
    environment:
      # Performance Tweaks
      # Reduce how often DAGs are reloaded to dramatically reduce CPU use
      - AIRFLOW__SCHEDULER__MIN_FILE_PROCESS_INTERVAL=${AF_MIN_FILE_PROCESS_INTERVAL-60}
      - AIRFLOW__SCHEDULER__MAX_THREADS=${AF_THREADS-1}
networks:
  proxy:
    external: true
# Custom Dockerfile
FROM apache/airflow:1.10.10
# Install mssql support & dag dependencies
USER root
RUN apt-get update -yqq \
&& apt-get install -y gcc freetds-dev \
&& apt-get install -y git procps \
&& apt-get install -y vim
RUN pip install apache-airflow[mssql,mssql,ssh,s3,slack]
RUN pip install azure-storage-blob sshtunnel google-api-python-client oauth2client \
&& pip install git+https://github.com/infusionsoft/Official-API-Python-Library.git \
&& pip install rocketchat_API
# This fixes permission issues on linux.
# The airflow user should have the same UID as the user running docker on the host system.
# make build adjusts this value automatically
ARG DOCKER_UID
RUN \
: "${DOCKER_UID:?Build argument DOCKER_UID needs to be set and non-empty. Use 'make build' to set it automatically.}" \
&& usermod -u ${DOCKER_UID} airflow \
&& find / -path /proc -prune -o -user 50000 -exec chown -h airflow {} \; \
&& echo "Set airflow's uid to ${DOCKER_UID}"
USER airflow
And here's my Makefile to control the containers, e.g. with make run:
SERVICE = "scheduler"
TITLE = "airflow containers"
ACCESS = "http://af.example.com"
.PHONY: run
build:
	docker-compose build
run:
	@echo "Starting $(TITLE)"
	docker-compose up -d
	@echo "$(TITLE) running on $(ACCESS)"
runf:
	@echo "Starting $(TITLE)"
	docker-compose up
stop:
	@echo "Stopping $(TITLE)"
	docker-compose down
restart: stop print-newline run
tty:
	docker-compose run --rm --entrypoint='' $(SERVICE) bash
ttyr:
	docker-compose run --rm --entrypoint='' -u root $(SERVICE) bash
attach:
	docker-compose exec $(SERVICE) bash
attachr:
	docker-compose exec -u root $(SERVICE) bash
logs:
	docker-compose logs --tail 50 --follow $(SERVICE)
conf:
	docker-compose config
initdb:
	docker-compose run --rm $(SERVICE) initdb
upgradedb:
	docker-compose run --rm $(SERVICE) upgradedb
print-newline:
	@echo ""
	@echo ""
@potiuk Is this the preferred way to add dependencies (airflow-mssql)?
I think the preferred way will be to set the AIRFLOW_EXTRAS variable properly and pass it as --build-arg.
The extras are defined like this in the Dockerfile:
ARG AIRFLOW_EXTRAS="async,aws,azure,celery,dask,elasticsearch,gcp,kubernetes,mysql,postgres,redis,slack,ssh,statsd,virtualenv"
and when building the dockerfile you can set them as --build-arg AIRFLOW_EXTRAS="...."
I think that maybe it's worth having "additional extras" and appending them, though.
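The "append additional extras" idea can be sketched as a small Python helper that assembles the docker build command. This is an illustration only: it assumes a Dockerfile in the current directory that declares ARG AIRFLOW_EXTRAS as quoted above, and the helper names and the my-airflow tag are hypothetical.

```python
import shlex
from typing import Iterable, List

# Default extras copied from the ARG line quoted in this thread.
DEFAULT_EXTRAS = ("async,aws,azure,celery,dask,elasticsearch,gcp,kubernetes,"
                  "mysql,postgres,redis,slack,ssh,statsd,virtualenv")


def merged_extras(additional: Iterable[str], defaults: str = DEFAULT_EXTRAS) -> str:
    """Append extras to the defaults, dropping duplicates and keeping order."""
    seen: List[str] = []
    for name in defaults.split(",") + list(additional):
        name = name.strip()
        if name and name not in seen:
            seen.append(name)
    return ",".join(seen)


def build_command(additional: Iterable[str], tag: str = "my-airflow") -> List[str]:
    """Return the argv for a docker build that overrides AIRFLOW_EXTRAS."""
    return ["docker", "build", ".",
            "--build-arg", f"AIRFLOW_EXTRAS={merged_extras(additional)}",
            "--tag", tag]


# Print the command instead of running it, so it can be inspected first.
print(shlex.join(build_command(["mssql", "ssh"])))
```

Keeping the defaults and appending to them avoids accidentally dropping an extra that the base image relies on.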
Oh, that's super cool. But for that you have to rebuild the entire airflow image? Can you just add the build arg in the docker-compose and it will propagate through to the published airflow image?
You should also be able to build a new image using ON_BUILD feature - for building images depending on the base one. Added a separate issue here: #8872
The same applies to additional Python packages. https://github.com/puckel/docker-airflow/blob/master/Dockerfile#L64
if [ -n "${PYTHON_DEPS}" ]; then pip install ${PYTHON_DEPS}; fi
My Apache Airflow docker-compose file for running LocalExecutor with postgres using official production Dockerfile
Moved to gist
my two cents: https://github.com/xnuinside/airflow_in_docker_compose/blob/master/docker-compose-with-celery-executor.yml and .env file for it https://github.com/xnuinside/airflow_in_docker_compose/blob/master/.env
Ready to up and run. For prod you need to turn on RBAC, though.
Hello. I made kind of a mix of the examples here to make my own set of docker files. I ended up with docker-compose, Dockerfile and Makefile. Using the docker-compose and Makefile from this post as a starting point, I have already solved some of the problems we encountered as we adapted it to our needs, but as a Docker and Airflow noob, I would have liked if these needs had already been addressed by an agreed-upon best-practice solution, so I'll mention them just in case you can include them in the future file (or mention how to address them in a tutorial or something):
Regarding the docker-compose, I would like to see an explanation of why the webserver and the scheduler run as separate containers and how that works. For instance, I don't understand whether in some cases a command could be added to just one of the containers.
The code I have currently is: Dockerfile
FROM apache/airflow:1.10.10
COPY plugins/aws_secrets_manager_backend.py /home/airflow/.local/lib/python3.6/site-packages/airflow/contrib/secrets/aws_secrets_manager.py
COPY plugins/aws_secrets_manager_hook.py /home/airflow/.local/lib/python3.6/site-packages/airflow/hooks/aws_secrets_manager_hook.py
COPY hooks_init.py /home/airflow/.local/lib/python3.6/site-packages/airflow/hooks/__init__.py
COPY aws_config /home/airflow/.aws/config
COPY aws_credentials /home/airflow/.aws/credentials
COPY requirements.txt requirements.txt
RUN pip3 install -r requirements.txt --user
docker-compose
version: "3.7"
x-airflow-environment: &airflow-environment
  AIRFLOW__CORE__EXECUTOR: LocalExecutor
  AIRFLOW__CORE__LOAD_EXAMPLES: "False"
  AIRFLOW__CORE__SQL_ALCHEMY_CONN: postgresql+psycopg2://airflow:airflow@postgres:5432/airflow
  AIRFLOW__CORE__FERNET_KEY: FB0o_zt4e3Ziq3LdUUO7F2Z95cvFFx16hU8jTeR1ASM=
  AIRFLOW__WEBSERVER__DAG_DEFAULT_VIEW: graph
  AIRFLOW__SECRETS__BACKEND: airflow.contrib.secrets.aws_secrets_manager.SecretsManagerBackend
  AIRFLOW__OPERATORS__DEFAULT_RAM: 2048
services:
  postgres:
    image: postgres:11.5
    environment:
      POSTGRES_USER: airflow
      POSTGRES_DB: airflow
      POSTGRES_PASSWORD: airflow
  init:
    build: .
    environment:
      <<: *airflow-environment
    depends_on:
      - postgres
    volumes:
      - ./dags:/opt/airflow/dags
      - ./plugins:/opt/airflow/plugins
      - ./logs:/opt/airflow/logs
    entrypoint: /bin/bash
    command: >
      -c "airflow list_users || (airflow initdb
      && airflow create_user --role Admin --username airflow --password airflow -e airflow@airflow.com -f airflow -l airflow)"
    restart: on-failure
  webserver:
    build: .
    ports:
      - 8080:8080
    environment:
      <<: *airflow-environment
    depends_on:
      - init
    volumes:
      - ./dags:/opt/airflow/dags
      - ./plugins:/opt/airflow/plugins
      - ./variables_secret.json:/opt/airflow/variables_secret.json
      - ./logs:/opt/airflow/logs
      - ./utilities:/opt/airflow/utilities
    entrypoint: /bin/bash
    command: -c "airflow variables -i /opt/airflow/variables_secret.json && airflow webserver"
    restart: always
  scheduler:
    build: .
    environment:
      <<: *airflow-environment
    depends_on:
      - webserver
    volumes:
      - ./dags:/opt/airflow/dags
      - ./plugins:/opt/airflow/plugins
      - ./variables_secret.json:/opt/airflow/variables_secret.json
      - ./logs:/opt/airflow/logs
      - ./utilities:/opt/airflow/utilities
    entrypoint: /bin/bash
    command: -c "airflow variables -i /opt/airflow/variables_secret.json && airflow scheduler"
    restart: always
Makefile
.PHONY: run stop rm
run:
	docker-compose -f docker-compose.yml up -d --remove-orphans --build --force-recreate
	@echo "Airflow running on http://localhost:8080"
stop:
	docker-compose -f docker-compose.yml stop
rm: stop
	docker-compose -f docker-compose.yml rm
Kind regards
Hello. I have encountered another issue. I want to use Sphinx's make html command within the container, but the make command is not found there. So I have added the following lines to the Dockerfile:
USER root
RUN apt-get update && apt-get install -y build-essential
USER airflow
This may not be the best approach, and it might be a good idea to address this somewhere else. Kind regards
This is already addressed - see https://github.com/apache/airflow/blob/master/IMAGES.rst#production-images where examples show how to build images manually.
The production image is highly optimized for size, so it is a multi-segmented (multi-stage) one: the first segment is used to add "build dependencies" (build-essential is there), but then only compiled libraries and Python code are copied to the "main" image, which makes it 200MB instead of at least 400MB.
Your best bet, in this case, is to add commands to the Dockerfile in the "build" segment and copy whatever results from them via COPY --from if you are using production node.
Also, you can watch my talk about it from the Airflow Summit - where I talk about the image https://s.apache.org/airflow-prod-image
Instead of modifying the existing image, you can also build from the finished image and add your own stuff.
Here's my image for example:
FROM apache/airflow:1.10.12
USER root
# This fixes permission issues on linux.
# The airflow user should have the same UID as the user running docker on the host system.
# make build adjusts this value automatically
ARG DOCKER_UID
RUN \
: "${DOCKER_UID:?Build argument DOCKER_UID needs to be set and non-empty. Use 'make build' to set it automatically.}" \
&& usermod -u ${DOCKER_UID} airflow \
&& groupmod -g ${DOCKER_UID} airflow \
&& chown -Rhc --from=50000 ${DOCKER_UID} / || true \
&& chown -Rhc --from=:50000 :${DOCKER_UID} / || true \
&& echo "Set airflow's uid and gid to ${DOCKER_UID}"
# Install cmd utils
RUN apt-get update -yqq \
&& apt-get install -y git \
procps \
vim
# Install MS SQL Support (ODBC Driver)
RUN apt-get update \
&& apt-get install -y gnupg curl \
&& curl https://packages.microsoft.com/keys/microsoft.asc | apt-key add --no-tty - \
&& curl https://packages.microsoft.com/config/debian/10/prod.list > /etc/apt/sources.list.d/mssql-release.list \
&& apt-get update \
&& ACCEPT_EULA=Y apt-get install -y msodbcsql17 unixodbc-dev g++
USER airflow
# Install Apache 2.0 backports for mssql
RUN pip install --user apache-airflow-backport-providers-odbc \
apache-airflow-backport-providers-microsoft-mssql
# Install airflow packages
RUN pip install --user apache-airflow[slack]
# Install plugin dependencies
RUN pip install --user azure-storage-blob \
sshtunnel \
google-api-python-client \
oauth2client \
beautifulsoup4 \
dateparser \
rocketchat_API \
typeform
Instead of modifying the existing image, you can also build from the finished image and add your own stuff.
That's true - you can do that. The drawback of this solution, though, is that the image will be much bigger: in this case it will contain unixodbc-dev and g++, which on their own drag in a number of dependencies (most of build-essential) and likely add several hundred MB of stuff that is not needed in the final image.
I am actually thinking about how to make it even easier to accommodate such cases, and I will try to address that in #10856.
I just watched your presentation on docker and it's really amazing how much thought and effort you put in the image and the optimization.
Building the custom size-optimized image from source is a great option for many people, especially if they are working in corporate and need the security review.
But for many others, especially smaller businesses, ease of getting started, setup and maintenance can be more important.
So after reviewing both options, I will stick with the extension option.
I am also looking forward to seeing official docker-compose files. I think right now getting started with airflow in docker is a bit difficult.
It used to be very easy with puckel's image, but it's outdated now. And if someone wants to run airflow in docker, they have to come up with their own compose files and hunt down examples all over the place.
It can be difficult without an official example, especially considering you have to run multiple containers for the scheduler and webserver.
One more thought:
I think there are multiple purposes for a docker image. The most important is of course running in production, which justifies a more complicated process.
But I think many people also use docker to quickly test-drive software.
I know that from my own experience. Whenever I consider a new open source software, the first thing I check is whether they have a docker image and ideally a docker-compose example.
This way I can get a decent example setup in just a few minutes to evaluate the software.
So even if using docker-compose is not recommended in production, I think creating official examples would be great for the future adoption of Airflow.
Agree!!
Those are all valid points and I was kind of waiting for someone to come up with that. When I looked at puckel, it was not really "production ready" and it did pretty much "everything but the kitchen sink" ;). So I figured the best option would be to start from something well engineered that does only one thing well (being optimised for production), listen to people complaining about what they miss from Puckel, and then implement it "well" without breaking the optimisations.
Now - since I already heard that several times, seems like this is a super-valid use case that people want to use the image for and that's a lot of great information that might help me to design it well :).
I will take a look at that shortly!
Totally valid to wait for someone else to come up with that.
If it meets the AF quality standard, I would be happy to create a PR if that makes things easier for you.
Since docker-compose is just used for DEV, please go for it @KimchaC . We can iterate over it if needed
absolutely!
@KimchaC FYI. I have just submitted #11176 PR that should make it possible to build even such complex images as you explained in https://github.com/apache/airflow/issues/8605#issuecomment-690065621 via passing appropriate build args. In fact I even made an example on how to build such image based on your example.
There are a few more changes coming (we've implemented quite a few extensions to the build process while working on a customer project and I am just contributing them back). The final version of the Dockerfile/Breeze/Docker build process we come up with will produce super-optimized (for size) images, with very high customizability of all the components of the image building, in such a way that you can even use it to build the Airflow image on an air-gapped system.
The nice thing about it is that the customer can fully rely on the Airflow Dockerfile process, keep up with future changes, add their own customisations as needed, and have full control over the build process.
This is all the result of our project with a really big customer who was very concerned about the security of the images, binaries and the whole build process they had. In the end we are going to have (I hope) a super-flexible image that we can develop further and that will be equally easy to use in a CI/OSS environment and a stricter corporate environment. A few more PRs are coming (we already have them in the customer's fork, but we are bringing them in one by one). Everything is under the #11171 umbrella.
It would be great, @KimchaC, if you tried it out with your setup, configuration, and maybe other customizations, so that we could implement anything that we missed. Looking forward to it!
@KimchaC ^ PR merged. You could try to build your image now using the command line parameters and compare the size vs. your previous image. I bet it will be quite a bit smaller.
Hi @potiuk ,
At this link (https://hub.docker.com/r/apache/airflow/dockerfile) there is no production-level Dockerfile. Yesterday I saw the production version.
I am having difficulty understanding how the production-level image is created (mainly how to customize the image).
I am following the two links below for reference:
https://airflow.readthedocs.io/en/latest/production-deployment.html#production-image-build-arguments
https://github.com/apache/airflow/blob/master/IMAGES.rst#production-images
This builds the production image in version 3.7 with additional airflow extras from 1.10.10 Pypi package and additional apt dev and runtime dependencies (as per production-deployment.html):
docker build . \
  --build-arg PYTHON_BASE_IMAGE="python:3.7-slim-buster" \
  --build-arg PYTHON_MAJOR_MINOR_VERSION=3.7 \
  --build-arg AIRFLOW_INSTALL_SOURCES="apache-airflow" \
  --build-arg AIRFLOW_INSTALL_VERSION="==1.10.12" \
  --build-arg AIRFLOW_CONSTRAINTS_REFERENCE="constraints-1-10" \
  --build-arg AIRFLOW_SOURCES_FROM="empty" \
  --build-arg AIRFLOW_SOURCES_TO="/empty" \
  --build-arg ADDITIONAL_AIRFLOW_EXTRAS="jdbc" \
  --build-arg ADDITIONAL_PYTHON_DEPS="pandas" \
  --build-arg ADDITIONAL_DEV_APT_DEPS="gcc g++" \
  --build-arg ADDITIONAL_RUNTIME_APT_DEPS="default-jre-headless" \
  --tag my-image
Do we need to pass the Dockerfile with the -f flag? I tried the above command and observed that it looks for a Dockerfile.
If AIRFLOW_INSTALL_SOURCES=".", it points the installation to local sources (as per the documentation). How does that work?
When I use the above command with -f Dockerfile, the build fails at the step COPY scripts/docker scripts/docker with "COPY failed: stat /var/lib/docker/tmp/docker-builder125807076/scripts/docker: no such file or directory". Do I have to clone the git repo and then use the docker build command?
At this link (https://hub.docker.com/r/apache/airflow/dockerfile) there is no production-level Dockerfile. Yesterday I saw the production version.
I see it there (even in incognito mode). It must have been a temporary glitch of DockerHub.
- Do we need to pass the Dockerfile with the -f flag? I tried the above command and observed that it looks for a Dockerfile.
As mentioned in the docs above, if you want to customize the image you need to check out the Airflow sources and run the docker command inside the Airflow sources. As is the case with most Dockerfiles, they need a context ("." in the command) and some extra files (for example entrypoint scripts) that have to be available in this context, and the easiest way is to check out the Airflow sources in the right version and customize the image from there.
You can find a nice description here: https://airflow.readthedocs.io/en/latest/production-deployment.html - we moved the documentation to "docs" and it has not yet been released (it will be in 1.10.13), but you can use the "latest" version. It contains a detailed description of customizing vs. extending and even a nice table showing the differences; one point there is that you need to use the Airflow sources to customize the image.
- If AIRFLOW_INSTALL_SOURCES=".", it points the installation to local sources (as per the documentation). How does that work?
See above - you need to run it inside the checked-out sources of Airflow.
- When I use the above command with -f Dockerfile, the build fails at the step COPY scripts/docker scripts/docker with "COPY failed: stat /var/lib/docker/tmp/docker-builder125807076/scripts/docker: no such file or directory". Do I have to clone the git repo and then use the docker build command?
Yes. That's the whole point - customisation only works if you have sources of Airflow checked out.
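A quick pre-flight check along these lines can catch a wrong build context before docker build fails on a COPY step. This is a hedged sketch: the list of required paths contains only the two that this thread mentions (the Dockerfile and scripts/docker), the real build needs more files from the sources, and `missing_from_context` is a hypothetical helper name.

```python
from pathlib import Path
from typing import List

# Only the paths mentioned in this thread; the real build needs more files.
REQUIRED_PATHS = ("Dockerfile", "scripts/docker")


def missing_from_context(context_dir: str) -> List[str]:
    """Return the required paths that are absent from the build context."""
    root = Path(context_dir)
    return [path for path in REQUIRED_PATHS if not (root / path).exists()]
```

Calling `missing_from_context(".")` from a directory that is not an Airflow checkout would report `scripts/docker` as missing, which corresponds to the COPY failure quoted above.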
I think we should get this one in sooner before 2.0.0rc1, is someone willing to work on this one??
Also, I don't think docker-compose files need to be production-ready. It should just be meant for local-development or to quickly start / work on Airflow locally with different executors
Agree. Starting small is good.
@potiuk should we move milestone to 2.1 for this?
Yep. Just did :).
My docker compose:
version: '3'
x-airflow-common:
  &airflow-common
  image: apache/airflow:1.10.12
  environment:
    - AIRFLOW__CORE__EXECUTOR=CeleryExecutor
    - AIRFLOW__CORE__SQL_ALCHEMY_CONN=mysql://root@mysql/airflow?charset=utf8mb4
    - AIRFLOW__CORE__SQL_ENGINE_COLLATION_FOR_IDS=utf8mb3_general_ci
    - AIRFLOW__CELERY__BROKER_URL=redis://:@redis:6379/0
    - AIRFLOW__CELERY__RESULT_BACKEND=redis://:@redis:6379/0
    - AIRFLOW__CORE__FERNET_KEY=FB0o_zt4e3Ziq3LdUUO7F2Z95cvFFx16hU8jTeR1ASM=
    - AIRFLOW__CORE__LOAD_EXAMPLES=False
    - AIRFLOW__CORE__LOGGING_LEVEL=Debug
    - AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION=False
    - AIRFLOW__WEBSERVER__RBAC=True
    - AIRFLOW__CORE__STORE_SERIALIZED_DAGS=True
    - AIRFLOW__CORE__STORE_DAG_CODE=True
  volumes:
    - ./dags:/opt/airflow/dags
    - ./airflow-data/logs:/opt/airflow/logs
    - ./airflow-data/plugins:/opt/airflow/plugins
  depends_on:
    - redis
    - mysql
services:
  mysql:
    image: mysql:5.7
    environment:
      - MYSQL_ALLOW_EMPTY_PASSWORD=true
      - MYSQL_ROOT_HOST=%
      - MYSQL_DATABASE=airflow
    volumes:
      - ./mysql/conf.d:/etc/mysql/conf.d:ro
      - /dev/urandom:/dev/random # Required to get non-blocking entropy source
      - ./airflow-data/mysql-db-volume:/var/lib/mysql
    ports:
      - "3306:3306"
    command:
      - mysqld
      - --character-set-server=utf8mb4
      - --collation-server=utf8mb4_unicode_ci
  redis:
    image: redis:latest
    ports:
      - 6379:6379
  flower:
    << : *airflow-common
    command: flower
    ports:
      - 5555:5555
  airflow-init:
    << : *airflow-common
    container_name: airflow_init
    entrypoint: /bin/bash
    command:
      - -c
      - airflow list_users || (
        airflow initdb &&
        airflow create_user
        --role Admin
        --username airflow
        --password airflow
        --email airflow@airflow.com
        --firstname airflow
        --lastname airflow
        )
    restart: on-failure
  airflow-webserver:
    << : *airflow-common
    command: webserver
    ports:
      - 8080:8080
    restart: always
  airflow-scheduler:
    << : *airflow-common
    container_name: airflow_scheduler
    command:
      - scheduler
      - --run-duration
      - '30'
    restart: always
  airflow-worker:
    << : *airflow-common
    container_name: airflow_worker1
    command: worker
    restart: always
@BasPH shared on Slack: one-line command to start Airflow in docker:
In case you’ve ever wondered how to get the Airflow image to work in a one-liner (for demo purposes), here’s how:
docker run -ti -p 8080:8080 -v yourdag.py:/opt/airflow/dags/yourdag.py --entrypoint=/bin/bash apache/airflow:2.0.0b3-python3.8 -c '(airflow db init && airflow users create --username admin --password admin --firstname Anonymous --lastname Admin --role Admin --email admin@example.org); airflow webserver & airflow scheduler'
Creates a user admin/admin and runs a SQLite metastore in the container
https://apache-airflow.slack.com/archives/CQAMHKWSJ/p1608152276070500
I have prepared some Dockerfiles with some common configuration.
I added health checks where it was simple. Does anyone have an idea for health checks for airflow-scheduler/airflow-worker? This will improve stability.
Besides, I am planning to prepare a tool that is used to generate docker-compose files using a simple wizard. I am thinking of something similar to the Pytorch project. https://pytorch.org/get-started/locally/
Very good idea! ❤️
Has anyone successfully gotten turbodbc installed using pip? I have had to install miniconda and use conda-forge to get turbodbc + pyarrow working correctly. This adds a little complication to my Dockerfile, although I do kind of like the conda-env.yml file approach.
@mik-laj wow, I knew I could use common environment variables but I had no idea you could also do the volumes and images, that is super clean. Any reason why you have the scheduler restart every 30 seconds like that?
Thank you all for the docker-compose files :)
I'm sharing mine as it addresses some aspects that I couldn't find in this thread and that took me some time to get working. These are:
- git-sync (this one is optional but quite convenient).
@mik-laj I also have a working healthcheck on the scheduler. Not the most expressive, but it works.
This configuration relies on an existing and initialized database.
External database - LocalExecutor - Airflow 2.0.0 - Traefik - Dags mostly based on DockerOperator.
version: "3.7"
x-airflow-environment: &airflow-environment
  AIRFLOW__CORE__EXECUTOR: LocalExecutor
  AIRFLOW__CORE__LOAD_EXAMPLES: "False"
  AIRFLOW__CORE__LOAD_DEFAULT_CONNECTIONS: "False"
  AIRFLOW__CORE__SQL_ALCHEMY_CONN: ${DB_CONNECTION_STRING}
  AIRFLOW__CORE__FERNET_KEY: ${ENCRYPTION_KEY}
  AIRFLOW__CORE__DAGS_FOLDER: /opt/airflow/sync/git/dags
  AIRFLOW__CORE__ENABLE_XCOM_PICKLING: "True" # because of https://github.com/apache/airflow/issues/13487
  AIRFLOW__WEBSERVER__BASE_URL: https://airflow.example.com
  AIRFLOW__WEBSERVER__ENABLE_PROXY_FIX: "True"
  AIRFLOW__WEBSERVER__RBAC: "True"
services:
  traefik:
    image: traefik:v2.4
    container_name: traefik
    command:
      - --ping=true
      - --providers.docker=true
      - --providers.docker.exposedbydefault=false
      - --entrypoints.web.address=:80
      - --entrypoints.websecure.address=:443
      # HTTP -> HTTPS redirect
      - --entrypoints.web.http.redirections.entrypoint.to=websecure
      - --entrypoints.web.http.redirections.entrypoint.scheme=https
      # TLS config
      - --certificatesresolvers.myresolver.acme.dnschallenge=true
      - --certificatesresolvers.myresolver.acme.storage=/letsencrypt/acme.json
      ## Comment following line for a production deployment
      - --certificatesresolvers.myresolver.acme.caserver=https://acme-staging-v02.api.letsencrypt.org/directory
      ## See https://doc.traefik.io/traefik/https/acme/#providers for other providers
      - --certificatesresolvers.myresolver.acme.dnschallenge.provider=digitalocean
      - --certificatesresolvers.myresolver.acme.email=user@example.com
    ports:
      - 80:80
      - 443:443
    environment:
      # See https://doc.traefik.io/traefik/https/acme/#providers for other providers
      DO_AUTH_TOKEN:
    restart: always
    healthcheck:
      test: ["CMD", "traefik", "healthcheck", "--ping"]
      interval: 10s
      timeout: 10s
      retries: 5
    volumes:
      - certs:/letsencrypt
      - /var/run/docker.sock:/var/run/docker.sock:ro
  # Required because of DockerOperator. For secure access and handling permissions.
  docker-socket-proxy:
    image: tecnativa/docker-socket-proxy:0.1.1
    environment:
      CONTAINERS: 1
      IMAGES: 1
      AUTH: 1
      POST: 1
    privileged: true
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock:ro
    restart: always
  # Allows to deploy Dags on pushes to master
  git-sync:
    image: k8s.gcr.io/git-sync/git-sync:v3.2.2
    container_name: dags-sync
    environment:
      GIT_SYNC_USERNAME:
      GIT_SYNC_PASSWORD:
      GIT_SYNC_REPO: https://example.com/my/repo.git
      GIT_SYNC_DEST: dags
      GIT_SYNC_BRANCH: master
      GIT_SYNC_WAIT: 60
    volumes:
      - dags:/tmp:rw
    restart: always
  webserver:
    image: apache/airflow:2.0.0
    container_name: airflow_webserver
    environment:
      <<: *airflow-environment
    command: webserver
    healthcheck:
      test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
      interval: 10s
      timeout: 10s
      retries: 5
    restart: always
    volumes:
      - dags:/opt/airflow/sync
      - logs:/opt/airflow/logs
    depends_on:
      - git-sync
      - traefik
    labels:
      - traefik.enable=true
      - traefik.http.routers.webserver.rule=Host(`airflow.example.com`)
      - traefik.http.routers.webserver.entrypoints=websecure
      - traefik.http.routers.webserver.tls.certresolver=myresolver
      - traefik.http.services.webserver.loadbalancer.server.port=8080
  scheduler:
    image: apache/airflow:2.0.0
    container_name: airflow_scheduler
    environment:
      <<: *airflow-environment
    command: scheduler
    restart: always
    healthcheck:
      test: ["CMD-SHELL", 'curl --silent http://airflow_webserver:8080/health | grep -A 1 scheduler | grep \"healthy\"']
      interval: 10s
      timeout: 10s
      retries: 5
    volumes:
      - dags:/opt/airflow/sync
      - logs:/opt/airflow/logs
    depends_on:
      - git-sync
      - webserver
volumes:
  dags:
  logs:
  certs:
I have an extra container (not shown) to handle rotating logs that are output directly to files. It is based on logrotate. I'm not sharing it here because it is a custom image and beyond the scope of the thread, but if anybody is interested, message me.
Hope it helps!
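The scheduler healthcheck in this Compose file greps the text of the webserver's /health response. A more explicit variant could parse the JSON instead; the sketch below assumes the response has the shape {"metadatabase": {"status": ...}, "scheduler": {"status": ...}} and reuses the airflow_webserver host name from the Compose file. The function names are hypothetical.

```python
import json
import urllib.request


def scheduler_is_healthy(payload: str) -> bool:
    """Return True if the /health JSON reports a healthy scheduler."""
    try:
        data = json.loads(payload)
    except ValueError:
        return False  # not JSON at all, treat as unhealthy
    if not isinstance(data, dict):
        return False
    scheduler = data.get("scheduler")
    return isinstance(scheduler, dict) and scheduler.get("status") == "healthy"


def check(url: str = "http://airflow_webserver:8080/health") -> int:
    """Return 0 (healthy) or 1 (unhealthy), suitable as a process exit code."""
    try:
        with urllib.request.urlopen(url, timeout=5) as response:
            body = response.read().decode("utf-8")
    except OSError:
        return 1  # connection errors and timeouts count as unhealthy
    return 0 if scheduler_is_healthy(body) else 1
```

A script that ends with `sys.exit(check())` could then replace the curl and grep pipeline in the healthcheck test, provided Python is available in the container (it is in the Airflow images).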
I added some improvements to the docker-compose file to make it more stable. https://github.com/apache/airflow/pull/14519 https://github.com/apache/airflow/pull/14522 Now we have health-checks for all components.
@mik-laj Can we close this one since we already added the docker-compose files?
@kaxil -> I believe so. I do not think a "production-ready" docker-compose is even a thing :)
Description
In order to use the production image we are already working on a helm chart, but we might want to add a production-ready docker compose that will be able to run airflow installation.
Use case / motivation
For local tests/small deployments - being able to have such docker-compose environment would be really nice.
We seem to get to consensus that we need to have several docker-compose "sets" of files:
They should come in variants, and it should be possible to specify a number of parameters:
Depending on the setup, those Docker compose file should do proper DB initialisation.
Example Docker Compose (From https://apache-airflow.slack.com/archives/CQAMHKWSJ/p1587748008106000) that we might use as a base and #8548 . This is just example so this issue will not implement all of it and we will likely split those docker-compose into separate postgres/sqlite/mysql similarly as we do in CI script, so I wanted to keep it as separate issue - we will deal with user creation in #8606
Another example from https://apache-airflow.slack.com/archives/CQAMHKWSJ/p1587679356095400:
Related issues
The initial user creation: #8606, #8548
Quick start documentation planned in #8542