apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Split providers into "regular" python sub-projects - each with own pyproject.toml #43304

Open potiuk opened 1 month ago

potiuk commented 1 month ago

Since #42505, all providers live in a single "providers" project. This has gone through several teething problems (mainly connected to bugs in uv and to integrating the development environment with IDEs such as PyCharm and VSCode), but those problems seem largely solved now, so we can possibly move to the next step: each provider gets its own pyproject.toml with its own dependencies, and the workspace setting of uv lets us resolve all those dependencies together while keeping our setup with constraints, the CI image used for CI workflows and Breeze, and local development of providers with Breeze.
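To make the workspace idea concrete, here is a minimal sketch (not the actual Airflow tooling) that scans a providers directory and renders a `[tool.uv.workspace]` table listing every provider that already has its own pyproject.toml. The function name `render_workspace_table` and the `providers/<name>/` layout are assumptions made for illustration only.

```python
from pathlib import Path


def render_workspace_table(providers_root: Path) -> str:
    """Render a [tool.uv.workspace] table for a hypothetical layout.

    Every direct sub-directory of ``providers_root`` that contains a
    pyproject.toml is listed as a workspace member, with its path given
    relative to the parent of ``providers_root`` (the assumed repo root).
    """
    repo_root = providers_root.parent
    members = sorted(
        pyproject.parent.relative_to(repo_root).as_posix()
        for pyproject in providers_root.glob("*/pyproject.toml")
    )
    lines = ["[tool.uv.workspace]", "members = ["]
    lines += [f'    "{member}",' for member in members]
    lines.append("]")
    return "\n".join(lines)
```

The generated table would go into the top-level pyproject.toml; directories without a pyproject.toml are skipped, so providers that have not been converted yet do not become workspace members.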

This requires a number of changes. Ideally each provider will have its own complete "directory" where things are kept together:

Package building is currently done dynamically via breeze commands, where the code is extracted and the package is prepared; dependency information (including devel dependencies) is kept in provider.yaml. Ideally, all the information needed to generate dependencies and build packages should be moved to pyproject.toml, and our breeze/CI automation should retrieve it from there rather than from provider.yaml.

This can be done in stages:

1) We could move only code and tests first, with no docs or other files.
2) We could do it provider by provider, if we temporarily implement incremental changes in our tooling to support both cases.
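Supporting both cases during a provider-by-provider migration could start with a check like the one sketched below, which tells converted providers apart from unconverted ones. The helper name `provider_layout` and the labels it returns are hypothetical; the only assumption about the layouts is that a converted provider has its own pyproject.toml while an unconverted one has only provider.yaml.

```python
from pathlib import Path


def provider_layout(provider_dir: Path) -> str:
    """Classify a provider directory during a staged migration.

    Returns "standalone" when the provider has its own pyproject.toml,
    "legacy" when only provider.yaml is present, and "unknown" otherwise.
    """
    if (provider_dir / "pyproject.toml").is_file():
        return "standalone"
    if (provider_dir / "provider.yaml").is_file():
        return "legacy"
    return "unknown"
```

Tooling could branch on the result and keep the existing code path for "legacy" providers until every provider has been converted.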

Or it could be done via an automated script that converts all providers at once. An earlier POC implemented this approach (nowhere near complete, just testing the viability of such an approach) and could be used as a base for the new solution:

Script: https://github.com/apache/airflow/pull/28291

Result of running the script: https://github.com/apache/airflow/pull/28292

Bowrna commented 4 weeks ago

Let me check this one @potiuk. I will check the work involved in migrating a single provider by checking it against your PR and then see how far it can be automated.

potiuk commented 4 weeks ago

> Let me check this one @potiuk. I will check the work involved in migrating a single provider by checking it against your PR and then see how far it can be automated.

This is a LOT of work - just be warned.

Likely many scripts, CI workflows, documentation, contributing docs, and breeze itself will have to change. It can be staged as mentioned above, but that also means the tooling will temporarily become more complex (to support both approaches and intermixing them) until it becomes simpler again (and way simpler in places).

So that one is not for the faint of heart :D.

hardeybisey commented 6 days ago

@Bowrna is it okay if I work with you on this?

Bowrna commented 5 days ago

Sure @hardeybisey, let me share my work in a PR and we can do this together.

hardeybisey commented 5 days ago

Thanks @Bowrna, I will wait for you to share the PR.