apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Allow disabling core components in helm chart #34495

Closed tokoko closed 1 day ago

tokoko commented 11 months ago

Description

Right now, only optional components (triggerer, dag-processor) can be disabled from values.yaml in the Helm chart. I'd like to have the same functionality for core components as well (webserver, scheduler, and workers, even if Celery is configured).
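For illustration, a minimal values.yaml fragment showing the kind of toggle that already exists for the optional components. This is a sketch rather than a complete configuration, and key names can differ between chart versions, so check them against the values.yaml of the chart version in use:

```yaml
# values.yaml (fragment): toggles for the optional components
triggerer:
  enabled: false   # do not deploy the triggerer
dagProcessor:
  enabled: false   # do not deploy the standalone dag processor
```

The request here is for an equivalent switch on the webserver, scheduler, and workers.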

Use case/motivation

I am trying to set up an Airflow deployment for multiple teams. The way I hope to achieve it is to have multiple Helm releases for the overall environment instead of just one.

This way, teams will have the flexibility to configure their Python environments freely and also have an isolated release cycle. The primary blocker for achieving this setup with the official Helm chart is the fact that most of the components can't be easily disabled from values.yaml.

Related issues

No response

hussein-awala commented 11 months ago

For the dag processor, we have already discussed this multiple times in the past, and it looks like there is a preference for a single dag processor (with a single GitSync if you use it), but maybe we can re-discuss it if needed.

For multiple Celery worker groups, you can check #34219; it's WIP. IMHO it will be much better than creating and managing multiple releases.

IMHO some users may need to deactivate the webserver if they want to manage Airflow completely with the CLI, and adding an option to deactivate the scheduler/workers should not be a problem.

@jedcunningham WDYT?

tokoko commented 11 months ago

@hussein-awala thanks for the reply.

The setup that I was thinking of consists of various team-specific dockerfiles (for workers and dag-processors) that might be scattered across multiple git repos. I understand that managing multiple releases is less than ideal, but my thought process was that, in the case of a single release, the CD process for a single repository, after building its own docker image, will have to somehow reach out to some central helm deployer pipeline that is aware of other teams' deployments as well. Additionally, these deployments will have to be somehow queued so that they don't coincide with one another. I suppose that's also doable, but it seemed more awkward to me than an alternative where each repository's pipeline has clear ownership of its own deployments and might not be aware of the others' existence at all.

hussein-awala commented 11 months ago

> various team-specific dockerfiles (for workers and dag-processors)

For workers, the issue I mentioned should allow that.

> that might be scattered across multiple git repos

Currently, the recommended way is to create a single repo with multiple git submodules, but yeah, it's debatable.

Officially, Airflow is single-tenant, so I wonder if we should wait for AIP-54 before implementing the features you suggest. This AIP focuses on access management, but it will make Airflow a multi-tenant platform.

tokoko commented 1 day ago

Looks like the option to disable webserver/scheduler has already been added to the chart at some point. Closing this.
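For anyone landing here later, a hedged sketch of what a release without the webserver and scheduler might look like. The key names below are an assumption that these components follow the same `<component>.enabled` pattern the chart uses for the triggerer and dag processor; verify them against the values.yaml of the chart version you deploy:

```yaml
# values.yaml (fragment): assumed key names, verify against your chart version
webserver:
  enabled: false   # skip the webserver in this release
scheduler:
  enabled: false   # skip the scheduler in this release
```

A team-specific release could combine such a fragment with its own image settings, leaving the shared components to a separate central release.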