Open Lee-W opened 3 months ago
The Rules
now is an example of how these changes can be recorded. I will check the existing breaking changes and update the rules. It would be great if folks could help update this list if you know there are breaking changes.
I pinned the issue - this way it will show up at the top of "Issues" list in the repo
Thanks!
We can just go over all the significant newsfragments and create a rule for them or keep some reasoning why it doesn't require one
We should add something for the public API change too. API v1 won't work anymore. Those endpoints are being changed as part of AIP-84 to a new FastAPI-based app. GitHub project for it: https://github.com/orgs/apache/projects/414
Issue here to regroup Rest API breaking changes https://github.com/apache/airflow/issues/43378
I have started prototyping a small package based on LibCST to build a Python 2to3-like tool for Airflow ("Airflow 2to3") that does simple and straightforward replacements. My main motivation was that a lot of the users of our Airflow instance use `schedule_interval`, which was deprecated in Airflow 2 and renamed to `schedule` in Airflow 3. Updating thousands of dags manually would be required, and some automation could help. This could also help with changed import statements, e.g. with the Task SDK, `from airflow import DAG` needs to be updated to `from airflow.sdk import DAG`. Something like this could eventually become part of the Airflow CLI so that users can run `airflow migrate /airflow/dags` for migration, or it could serve as a starting point for migration. It can update the file in place or show a diff. Currently it makes the following changes:
Dags
Operators
Sample file
import datetime

from airflow import DAG
from airflow.decorators import dag, task
from airflow.operators.empty import EmptyOperator
from airflow.timetables.events import EventsTimetable


with DAG(
    dag_id="my_dag_name",
    default_view="tree",
    start_date=datetime.datetime(2021, 1, 1),
    schedule_interval="@daily",
    concurrency=2,
):
    op = EmptyOperator(
        task_id="task", task_concurrency=1, trigger_rule="none_failed_or_skipped"
    )


@dag(
    default_view="graph",
    start_date=datetime.datetime(2021, 1, 1),
    schedule_interval=EventsTimetable(event_dates=[datetime.datetime(2022, 4, 5)]),
    max_active_tasks=2,
    full_filepath="/tmp/test_dag.py"
)
def my_decorated_dag():
    op = EmptyOperator(task_id="task")


my_decorated_dag()
Sample usage
python -m libcst.tool codemod dag_fixer.DagFixerCommand -u 1 tests/test_dag.py
Calculating full-repo metadata...
Executing codemod...
reformatted -
All done! ✨ 🍰 ✨
1 file reformatted.
--- /home/karthikeyan/stuff/python/libcst-tut/tests/test_dag.py
+++ /home/karthikeyan/stuff/python/libcst-tut/tests/test_dag.py
@@ -10,6 +10,6 @@
dag_id="my_dag_name",
- default_view="tree",
+ default_view="grid",
start_date=datetime.datetime(2021, 1, 1),
- schedule_interval="@daily",
- concurrency=2,
+ schedule="@daily",
+ max_active_tasks=2,
):
@@ -23,5 +23,4 @@
start_date=datetime.datetime(2021, 1, 1),
- schedule_interval=EventsTimetable(event_dates=[datetime.datetime(2022, 4, 5)]),
+ schedule=EventsTimetable(event_dates=[datetime.datetime(2022, 4, 5)]),
max_active_tasks=2,
- full_filepath="/tmp/test_dag.py"
)
Finished codemodding 1 files!
- Transformed 1 files successfully.
- Skipped 0 files.
- Failed to codemod 0 files.
- 0 warnings were generated.
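To make the kind of rewrite shown in the diff above concrete, here is a minimal sketch of the keyword renames. It is not the `dag_fixer` codemod from this comment: it assumes the standard-library `ast` module (Python 3.9+) instead of LibCST, so it does not preserve comments or formatting, and it matches `DAG`/`dag` by bare name only. The names `DagKwargFixer` and `fix_source` are hypothetical.

```python
import ast

# Old keyword -> new keyword, as seen in the diff above.
RENAMED_KWARGS = {"schedule_interval": "schedule", "concurrency": "max_active_tasks"}
# Keyword arguments that the diff above drops entirely.
REMOVED_KWARGS = {"full_filepath"}


class DagKwargFixer(ast.NodeTransformer):
    """Rename or drop keyword arguments on DAG(...) calls and @dag(...) decorators."""

    def visit_Call(self, node: ast.Call) -> ast.Call:
        self.generic_visit(node)  # handle nested calls first
        # Assumption: anything called `DAG` or `dag` is the Airflow object.
        if isinstance(node.func, ast.Name) and node.func.id in {"DAG", "dag"}:
            kept = []
            for kw in node.keywords:
                if kw.arg in REMOVED_KWARGS:
                    continue  # drop removed arguments
                if kw.arg in RENAMED_KWARGS:
                    kw.arg = RENAMED_KWARGS[kw.arg]
                kept.append(kw)
            node.keywords = kept
        return node


def fix_source(source: str) -> str:
    """Return the source with DAG keyword arguments migrated (formatting is lost)."""
    tree = DagKwargFixer().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))
```

Value changes such as `default_view="tree"` to `"grid"` are left out of this sketch. A real tool would use LibCST, as in the comment above, precisely because it keeps the user's formatting intact.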
NICE! @tirkarthi -> you should start a thread about it on the devlist and propose adding it to the repo. The sooner we start working on it and let people test it, the better it will be. And we can already start adding not only the newsfragments but also rules to the migration tools (cc: @vikramkoka @kaxil) - we can even think about keeping a database of old-way dags, running such a migration tool on them, and letting the Airflow 3 scheduler process them (and maybe even execute them) as part of our CI. Making the tool a part of our CI pipeline would tremendously help with maintaining and updating it.
BTW. I like a lot how simple it is with libCST - we previously used a quite a bit more complex tool from Facebook that allowed refactoring at scale in parallel (https://github.com/facebookincubator/Bowler), but it was rather brittle to develop rules for, and it had some weird problems and missing features. One thing that was very useful is that it had nice "parallelism" features, which allowed refactoring 1000s of files in seconds (but also made it difficult to debug).
I think if we get it working with libCST - it will be way more generic and maintainable, also we can easily add parallelism later on when/if we see it is slow.
One small watchout though - such a tool should have a way to isolate rules - so that they are not in a single big method - some abstraction that will allow us to easily develop and selectively apply (or skip) different rules - see https://github.com/apache/airflow/tree/v1-10-test/airflow/upgrade where we have documentation and information about the upgrade check we've done in Airflow 1 -> 2 migration.
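One possible shape for such rule isolation, as a rough sketch only: each rule is a small registered function, and the runner can select or skip rules by code. The rule codes (`R001`, `R002`), the `rule` decorator, and `run_rules` are hypothetical names, and the rules use naive text replacement purely for illustration rather than LibCST.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Optional


@dataclass(frozen=True)
class Rule:
    code: str
    description: str
    apply: Callable[[str], str]  # takes source text, returns rewritten source


RULES: Dict[str, Rule] = {}


def rule(code: str, description: str):
    """Register a function as an independent migration rule."""
    def register(func: Callable[[str], str]) -> Callable[[str], str]:
        RULES[code] = Rule(code, description, func)
        return func
    return register


@rule("R001", "rename schedule_interval to schedule")
def rename_schedule_interval(source: str) -> str:
    return source.replace("schedule_interval=", "schedule=")


@rule("R002", "replace the tree default view with grid")
def replace_tree_view(source: str) -> str:
    return source.replace('default_view="tree"', 'default_view="grid"')


def run_rules(
    source: str,
    select: Optional[Iterable[str]] = None,
    skip: Iterable[str] = (),
) -> str:
    """Apply the selected rules in order, leaving out any skipped codes."""
    skipped = set(skip)
    codes = list(RULES) if select is None else list(select)
    for code in codes:
        if code not in skipped:
            source = RULES[code].apply(source)
    return source
```

With this layout, `run_rules(src, select=["R001"])` applies a single rule and `run_rules(src, skip=["R001"])` applies everything else, so each rule can be developed and tested on its own.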
Also we have to discuss whether it should be a separate repo or whether it should be in Airflow's monorepo. Both have pros and cons - in 1.10 we chose to keep it in the 1.10 branch of Airflow, because it imported some of the Airflow code and it was easier, but we could likely create a new repo for it, add CI there and keep it there.
We even have this archived repo https://github.com/apache/airflow-upgrade-check which we never used and archived, we could re-open it. We also have https://pypi.org/project/apache-airflow-upgrade-check/ - package in PyPI - and we could release new upgrade check versions (2.* ?) with "apache-airflow>=2.11.0" as dependency.
All that should likely be discussed at devlist :)
Thanks @potiuk for the details. I will start a discussion on this at the devlist and continue there. Bowler looks interesting. Using `libcst.tool` from the CLI parallelizes the process. Right now this needs `python -m libcst.tool` to execute it as a codemod. Initially I had designed them as a standalone Transformer for each category (dag, operator), where the updated AST from one transformer can be passed to another. The codemod looked like the recommended abstraction for running it, so I changed it that way, only to find later that the CLI accepts only one codemod at a time. I need to check how composable they are.
python -m libcst.tool codemod --help | grep -i -A 1 'jobs JOBS'
-j JOBS, --jobs JOBS Number of jobs to use when processing files. Defaults to number of cores
time python -m libcst.tool codemod dag_fixer.DagFixerCommand -u 1 ~/airflow/dags > /dev/null 2>&1
python -m libcst.tool codemod dag_fixer.DagFixerCommand -u 1 ~/airflow/dags >
6.95s user 0.61s system 410% cpu 1.843 total
# Single core
time python -m libcst.tool codemod dag_fixer.DagFixerCommand -u 1 -j 1 ~/airflow/dags > /dev/null 2>&1
python -m libcst.tool codemod dag_fixer.DagFixerCommand -u 1 -j 1 >
/dev/nul 4.66s user 0.38s system 99% cpu 5.035 total
# 4 core
python -m libcst.tool codemod dag_fixer.DagFixerCommand -u 1 -j 4 ~/airflow/dags > /dev/null 2>&1
python -m libcst.tool codemod dag_fixer.DagFixerCommand -u 1 -j 4 >
/dev/nul 5.45s user 0.54s system 253% cpu 2.358 total
Bowler looks interesting.
Don't be deceived by it :).
It was helpful for the Providers' migration at some point in time, but it had many rough edges - debugging a problem was a nightmare until we learned how to do it properly, and it had some annoying limitations - you had to learn completely new, non-standard abstractions (an SQLAlchemy-like DSL to perform modifications), which did not cover all the refactorings we wanted to do. We had to really dig deep into the code and find workarounds for things we wanted to do when the authors of Bowler had not thought about them. And sometimes those were nasty workarounds.
query = (
Query(<paths to modify>)
.select_function("old_name")
.rename("new_name")
.diff(interactive=True)
)
The example above that I remember is that we could not easily rename some of the object types because it was not "foreseen" (can't remember exactly) - we had a few surprises there.
Also, Bowler seems not to have been maintained for > 3 years, which means that it's unlikely to handle some constructs even in 3.9+ Airflow.
What I like about libcst is that it is a really "low-level" interface that you have to program in Python rather than in an abstract DSL - similar to "ast". You write actual Python code to perform what you want to perform rather than rely on incomplete abstractions, even if you have to copy&paste rename code between different "rules" (for example), which you can then abstract away as a "common" util if you need, so no big deal.
BTW. Codemod .... has also not been maintained for 5 years. Not that this is a disqualification - but they list `python2` as their dependency ... so .....
I tried to use libcst in Airflow as a tiny POC of this issue here https://github.com/apache/airflow/blob/5b7977a149492168688e6f013a7dcd4fe3561a49/scripts/ci/pre_commit/check_deferrable_default.py#L34. It mostly works great except for its speed. I was also thinking about whether to add this migration logic into the Ruff Airflow linter, but I have not yet explored much on the Rust/Ruff side.
:eyes: :eyes: `rust` project :) ...
Me :heart: it (but I doubt we want to invest in it as it might be difficult to maintain, unless we find quite a few committers who are somewhat Ruff-proficient, enough to at least be able to review the code). But it's tempting, I must admit.
But to be honest - while I'd love to finally get a serious Rust project, I think it's not worth it. We are talking about a one-time migration: even for 10,000 dags it will take single minutes at most, and with Rust we could maybe bring it under one minute - so not a big gain for a lot of pain :). Or at least this is what my intuition tells me.
I think parallelism will do the job nicely. My intuition tells me (but this is just intuition and an understanding of some limits and the speed of certain operations) that we will get from multiple 10s of minutes (when running such a migration sequentially) to single minutes when we allow the migration to run in parallel using multiple processors and processes - even with Python and libcst. This task is really suitable for such parallelisation because each file is a complete, independent task that can be run in complete isolation from all other tasks - so spawning multiple parallel interpreters, ideally forking them right after all the imports and common code are loaded so that they use shared memory for those, should do the job nicely (at least intuitively).
Using Rust for that might be classic premature optimisation - we likely won't need it :). But it would be worth making some calculations and getting some "numbers" for a big installation - i.e. how many dags of what size are out there, and how long it will take to parse them all with libcst and write them back (even unmodified or with a simple modification). I presume that parsing and writing back will be the bulk of the job - and modifications will add very little overhead as they will be mostly operating on in memory data structures.
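A rough way to get such numbers, sketched under stated assumptions: the helper names (`roundtrip_file`, `migrate_all`, `timed_run`) are hypothetical, and the standard-library `ast.parse` stands in for the LibCST parse step, so absolute timings would differ from a real LibCST run. Each file is an independent task, so the files are simply mapped over a pool of worker processes.

```python
import ast
import time
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path
from typing import Callable, List, Optional, Sequence


def roundtrip_file(path: str) -> str:
    """Parse one file and write it back unchanged; returns the path."""
    file = Path(path)
    source = file.read_text(encoding="utf-8")
    ast.parse(source)  # stand-in for the LibCST parse (and any rewrite)
    file.write_text(source, encoding="utf-8")
    return path


def migrate_all(
    paths: Sequence[str],
    worker: Callable[[str], str] = roundtrip_file,
    jobs: Optional[int] = None,
    executor_cls=ProcessPoolExecutor,
) -> List[str]:
    """Run `worker` over every path in parallel; `jobs=None` uses all cores."""
    with executor_cls(max_workers=jobs) as pool:
        return list(pool.map(worker, paths))


def timed_run(paths: Sequence[str], **kwargs) -> float:
    """Return the wall-clock seconds needed to process all paths."""
    start = time.perf_counter()
    migrate_all(paths, **kwargs)
    return time.perf_counter() - start
```

Comparing `timed_run(paths, jobs=1)` with `timed_run(paths)` on a large dags folder would show how much of the sequential cost parallelism recovers. When using the default process pool, call these from under an `if __name__ == "__main__":` guard so that worker processes can import the module safely.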
Yep, totally agree. I just want to raise this idea which might be interesting. 👀
I presume that parsing and writing back will be the bulk of the job - and modifications will add very little overhead as they will be mostly operating on in memory data structures.
Yep, I think you're right. My previous default deferrable script took around 10 sec to process ~400 operators. Using ast for checking took around 1 sec
Mostly as curiosity: One option we might consider is https://github.com/alexpovel/srgn - I've heard about it recently, it's a "grep that understands code" with capabilities of running different actions. Written in rust, and allows to add extensions apparently where you can define your own "scopes" of search and modification.
But I am not too convinced - this is mostly a command line tool so we would have to have a sequence of "script commands" to run - seems that plugging in our own rules and AST parsing should also be more flexible, even if slower.
Yep, not that convinced either. but it is always good to have an alternative we could consider 🤔
My best idea right now is to split this into two tools. We don’t really want to invest too much time into building a very rich CLI tool to show users what need to be changed—we’ll effectively be rebuilding the error reporting interface in ruff (or flake8). Those squiggle lines, colors, error codes, and code context things are not easy to build.
It is probably easiest to tack the linter part on Ruff—it is Rust, but the code to implement a lint rule isn’t that hard if you know Python AST and just a general idea about C-like languages. The rewrite part is a lot more difficult, so it’s probably better to implement this as a different tool in Python with libcst. I’m thinking something like
$ ruff check --select AIR
This spits out lint errors with codes like AIR005 AIR123...
$ airflow2to3 --select AIR005 -- path/to/dag/file.py
This fixes the given error(s) in given file(s) in-place with a minimal CLI...
I plan to start experimenting with some rules in Ruff to see how easy the first part actually is. We should be able to save a lot of effort if it is viable.
I tried to change the format a bit and list the rules in the following format.
* [ ] link to the pr with breaking change
* [ ] things to do
Once the `things to do` have been listed, we can check the root PR. After implementing the rule, we can mark the `things to do` as done.
I also updated the format for #41366, #41367, #41368, #41391, #41393
If anyone has anything to add but does not have permission to update the description, please just tag me and I'll take a look.
It is probably easiest to tack the linter part on Ruff—it is Rust, but the code to implement a lint rule isn’t that hard if you know Python AST and just a general idea about C-like languages. The rewrite part is a lot more difficult, so it’s probably better to implement this as a different tool in Python with libcst. I’m thinking something like
Actually I am convinced too - I quite like this one after a bit of thought. This is not something that will be maintained by a lot of people and contributors, and even for them, this is so far from the main "airflow code" - it's really a "one-time" tool - that it might be worth treating it as our first "Rust experiment". And I quite agree that the AST code on its own is not really that "pythonic", and if you know what you want and have existing examples, adding a new rule in Rust should not be difficult even if you do not know the language (and AI-driven development here might even be a pretty cool exercise). I'd be happy to add a few rules myself at some point and maybe even take part in implementing the Rust tooling for our CI environment.
The things we'll need to migrate for 41348
airflow.datasets → airflow.sdk.definitions.asset
DatasetAlias → AssetAlias
DatasetAll → AssetAll
DatasetAny → AssetAny
expand_alias_to_datasets → expand_alias_to_assets
DatasetAliasEvent → AssetAliasEvent
dest_dataset_uri → BaseAsset
BaseDataset → BaseAsset
iter_datasets → iter_assets
iter_dataset_aliases → iter_asset_aliases
Dataset → Asset
iter_datasets → iter_assets
iter_dataset_aliases → iter_asset_aliases
_DatasetBooleanCondition → _AssetBooleanCondition
iter_datasets → iter_assets
iter_dataset_aliases → iter_asset_aliases
airflow.datasets.manager → airflow.assets.manager
dataset_manager → asset_manager
resolve_dataset_manager → resolve_asset_manager
DatasetManager → AssetManager
register_dataset_change → register_asset_change
create_datasets → create_assets
register_dataset_change → notify_asset_created
notify_dataset_changed → notify_asset_changed
notify_dataset_alias_created → notify_asset_alias_created
airflow.listeners.spec.dataset → airflow.listeners.spec.asset
on_dataset_created → on_asset_created
on_dataset_changed → on_asset_changed
airflow.timetables.datasets → airflow.timetables.assets
DatasetOrTimeSchedule → AssetOrTimeSchedule
airflow.datasets.metadata → airflow.sdk.definitions.asset.metadata
airflow.listeners.spec.dataset → airflow.listeners.spec.asset
on_dataset_created → on_asset_created
on_dataset_changed → on_asset_changed
airflow.timetables.datasets.DatasetOrTimeSchedule → airflow.timetables.assets.AssetOrTimeSchedule
airflow.api_connexion.security.requires_access_dataset → airflow.api_connexion.security.requires_access_dataset.requires_access_asset
airflow.auth.managers.models.resource_details.DatasetDetails → airflow.auth.managers.models.resource_details.AssetDetails
airflow.auth.managers.base_auth_manager.is_authorized_dataset → airflow.auth.managers.base_auth_manager.is_authorized_asset
airflow.timetables.simple.DatasetTriggeredTimetable → airflow.timetables.simple.AssetTriggeredTimetable
airflow.providers_manager.ProvidersManager
initialize_providers_dataset_uri_resources → initialize_providers_asset_uri_resources
dataset_factories → asset_factories
dataset_uri_handlers → asset_uri_handlers
dataset_to_openlineage_converters → asset_to_openlineage_converters
airflow.security.permissions.RESOURCE_DATASET → airflow.security.permissions.RESOURCE_ASSET
airflow.www.auth.has_access_dataset → airflow.www.auth.has_access_dataset.has_access_asset
airflow.lineage.hook.DatasetLineageInfo → airflow.lineage.hook.AssetLineageInfo
dataset → asset
airflow.lineage.hook.HookLineageCollector
create_dataset → create_asset
add_input_dataset → add_input_asset
add_output_dataset → add_output_asset
collected_datasets → collected_assets
triggering_dataset_events → triggering_asset_events
airflow.providers.amazon.aws.datasets → airflow.providers.amazon.aws.assets
s3
create_dataset → create_asset
convert_dataset_to_openlineage → convert_asset_to_openlineage
airflow.providers.amazon.auth_manager.avp.entities.AvpEntities.DATASET → airflow.providers.amazon.auth_manager.avp.entities.AvpEntities.ASSET
airflow.providers.amazon.auth_manager.aws_auth_manager.AwsAuthManager.is_authorized_dataset → airflow.providers.amazon.auth_manager.aws_auth_manager.AwsAuthManager.is_authorized_asset
dataset-uris → asset-uris
airflow.providers.common.io.datasets → airflow.providers.common.io.assets
file
create_dataset → create_asset
convert_dataset_to_openlineage → convert_asset_to_openlineage
dataset-uris → asset-uris
airflow.providers.fab.auth_manager.fab_auth_manager.is_authorized_dataset → airflow.providers.fab.auth_manager.fab_auth_manager.is_authorized_asset
airflow.providers.openlineage.utils.utils
DatasetInfo → AssetInfo
translate_airflow_dataset → translate_airflow_asset
airflow.providers.postgres.datasets → airflow.providers.postgres.assets
dataset-uris → asset-uris
airflow.providers.mysql.datasets → airflow.providers.mysql.assets
dataset-uris → asset-uris
airflow.providers.trino.datasets → airflow.providers.trino.assets
dataset-uris → asset-uris
airflow.api_connexion.schemas.dataset_schema
airflow.api_ui.views.datasets
airflow.serialization.pydantic.dataset
airflow.serialization.pydantic.taskinstance
airflow.serialization.enums.DagAttributeTypes
airflow.serialization.serialized_objects
airflow.utils.context
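A list like the one above lends itself to a table-driven rule. The sketch below copies a handful of the module and class renames from the list into two dictionaries and looks up an import against them. The function name `migrate_import` is hypothetical, and keeping class names in one global table (rather than per module) is a simplifying assumption of this sketch.

```python
from typing import Tuple

# A few module renames copied from the list above (exact-match lookup).
MODULE_RENAMES = {
    "airflow.datasets": "airflow.sdk.definitions.asset",
    "airflow.datasets.manager": "airflow.assets.manager",
    "airflow.datasets.metadata": "airflow.sdk.definitions.asset.metadata",
    "airflow.listeners.spec.dataset": "airflow.listeners.spec.asset",
    "airflow.timetables.datasets": "airflow.timetables.assets",
}

# A few class renames copied from the list above.
NAME_RENAMES = {
    "Dataset": "Asset",
    "DatasetAlias": "AssetAlias",
    "DatasetAll": "AssetAll",
    "DatasetAny": "AssetAny",
    "DatasetManager": "AssetManager",
    "DatasetOrTimeSchedule": "AssetOrTimeSchedule",
}


def migrate_import(module: str, name: str) -> Tuple[str, str]:
    """Map an old `from module import name` pair to its new equivalent.

    Unknown modules and names are returned unchanged.
    """
    return MODULE_RENAMES.get(module, module), NAME_RENAMES.get(name, name)
```

For example, `migrate_import("airflow.datasets", "Dataset")` yields the pair used to emit `from airflow.sdk.definitions.asset import Asset`. Method and attribute renames (such as `iter_datasets`) need type information and are out of scope for this lookup.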
Hi all, I'm trying to read through the significant news fragment and compile a list of rules we should migrate. It would be nice if you could take a look and check if I missed anything.
@kaxil @ashb would also like to confirm whether we're still allowing users to use models in airflow 3.0? If not, should we just skip all the changes related to models. Thanks
I tried my hand at implementing a rule in Ruff. This one checks if a DAG uses the `schedule` argument explicitly, and errors if there’s no such argument (i.e. the user is relying on the implicit default, which changes in 3.0), or if a deprecated argument is used.
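The logic of that check can be approximated in Python for readers who do not know Rust. This is only a sketch and not the Ruff implementation: `check_dag_schedule` is a hypothetical name, `schedule_interval` is the only deprecated argument assumed here, and calls are matched by the bare name `DAG` without resolving whether it was imported from Airflow.

```python
import ast
from typing import List, Tuple

# Assumption: the only deprecated spelling handled in this sketch.
DEPRECATED_KWARGS = {"schedule_interval"}


def check_dag_schedule(source: str) -> List[Tuple[int, str]]:
    """Return (line, message) pairs for DAG(...) calls with schedule problems."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        is_dag_call = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "DAG"
        )
        if not is_dag_call:
            continue
        names = {kw.arg for kw in node.keywords}  # None marks a **kwargs splat
        deprecated = sorted(name for name in names if name in DEPRECATED_KWARGS)
        for old in deprecated:
            problems.append((node.lineno, f"deprecated argument {old!r}"))
        # With **kwargs we cannot tell statically, so stay silent.
        if "schedule" not in names and not deprecated and None not in names:
            problems.append(
                (node.lineno, "DAG() relies on the implicit default schedule")
            )
    return problems
```

A call such as `DAG(dag_id="a")` is reported for the implicit default, while `DAG(dag_id="a", schedule=None)` passes because the argument is spelled out.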
Does this look reasonable enough for people to build on? I’ll produce a more detailed writeup of what to do if we feel this is the way to go.
@kaxil @ashb would also like to confirm whether we're still allowing users to use models in airflow 3.0? If not, should we just skip all the changes related to models. Thanks
Which models? But no, the plan is to not have/"allow" users to import anything from airflow.models at all. Exact details and new names are to be determined yet though
@kaxil https://github.com/apache/airflow/pull/41390 https://github.com/apache/airflow/pull/41393 https://github.com/apache/airflow/pull/41390
Duplicate entries for SubDAGs
oops, just fixed!
@kaxil @ashb would also like to confirm whether we're still allowing users to use models in airflow 3.0? If not, should we just skip all the changes related to models. Thanks
Which models? But no, the plan is to not have/"allow" users to import anything from airflow.models at all. Exact details and new names are to be determined yet though
Pretty much every model 👀 Sounds good. I just want to confirm I'm not misunderstanding anything. I'll mark these as model changes and not migrate them for now, until something has been decided.
@Lee-W There are few rules that we should add for Airflow configs too since we changed /removed them:
* [Standardize timer metrics to milliseconds and remove config #43975](https://github.com/apache/airflow/pull/43975)
* [Remove XCom pickling #43905](https://github.com/apache/airflow/pull/43905)
and some imports which won't work:
* [Remove deprecations from `airflow.executors` & `airflow.utils` #41395](https://github.com/apache/airflow/pull/41395)
* [Remove deprecated `ExternalTaskSensorLink` #41391](https://github.com/apache/airflow/pull/41391)
* [Remove the ability to import executors from plugins #43289](https://github.com/apache/airflow/pull/43289)
* [Remove redundant functions in `airflow.utils.dates` #43533](https://github.com/apache/airflow/pull/43533)
* [Remove deprecated Python Version identifiers #43562](https://github.com/apache/airflow/pull/43562)
* [Remove deprecated functions from `airflow/configuration.py` #43530](https://github.com/apache/airflow/pull/43530)
Thanks for reminding me! I'm still halfway to completing reading all the PRs. Will continue work on updating the list
Awesome @Lee-W , I hadn't seen this issue, so great to see the progress here. Following up on the action item from the last dev call, I created this page on Confluence as a draft https://cwiki.apache.org/confluence/display/AIRFLOW/Airflow+3+breaking+changes
Looks great! I'll try to update both places
I also played around the fixer implementation a bit: https://github.com/uranusjr/airflow2to3
Still a lot of room for improvement, but I think it is a good starting point.
FWIW I also tried to create a Flake8 plugin for comparison, but got stuck resolving imports (we need to check whether a thing is actually from Airflow first to decide whether to emit errors for it). Flake8 does not seem to provide this out of the box (Ruff and LibCST each has an easy solution). I’m pretty sure there must be a solution, but the effort looking into this is probably not worthwhile unless we feel Ruff is too big a hurdle.
So what do people think about using Ruff for linting? This is somewhat important since we want to encourage community help to implement rules. If people don’t have strong opinions, I’ll start a lazy consensus thread in the mailing list to get a resolution.
Finally finished checking all the significant newsfragments before #44080 (except for the one @sunank200 's not working on).
I tried my best to list the migrations as completely as possible, but it would be nice if folks could take another look to see if I missed anything. Thanks!
Thanks for preparing this Lee-W!
For https://github.com/apache/airflow/pull/43102, we can only detect the changes with `Ruff` if users send requests to `PATCH` API endpoints from the Airflow code.
@pierrejeambrun What do you think?
But one of the questions is how we will detect it 🤔 there are plenty of ways to send a request to an API.
There is no straightforward way to do it. Indeed, it could even be a standalone application communicating with the Airflow API without other interactions. It could even be written in a different programming language such as `go`, or be scheduled/management `bash` scripts in CI.
I am not sure about #42042. I moved one property out of `BaseUser`, but I don't think users are using this class directly.
There is no straightforward way to do it. Indeed, it can be even a standalone application communicating with the Airflow API without other interactions. It could be even in different programming languages such as go or scheduled/management bash scripts in CI.
I think we can only do it reasonably well if we assume the user uses our Python client. Then we should be able to tell users that they can run their custom code through Ruff with the new Python client installed to detect wrong parameters. Not sure if we need custom Ruff rules for those changes, or maybe it's a "mypy" kind of check for types? I know Astral works on a `mypy` replacement as well, so there is a chance that we will get mypy-like checks from Astral before we publish the tool (or we could use mypy for now if needed). A quick check on the new/old client with some test code for that might be useful.
For the rest of the clients, I think what we could also do - potentially - is to have a custom error code in FastAPI, where we will handle errors generated when "known old-style requests" are issued to the new API and return a pretty descriptive error: "You are using old-style API ble, ble, ble ... you need to change your parameters to ...." - maybe we can generalize it in our API so that "typical" mistakes from multiple APIs are handled by the same error handler?
I agree, we should make this assumption and limit the check to a reasonable scope. I like the idea of restricting it so that only our Python Client will be affected. Otherwise, it could turn into a project of its own. :)
I was considering a similar approach, returning an error response if the old request is provided but I wasn’t entirely sure about the scope. If the goal is to catch these issues before upgrading the version, I am unsure how we can easily provide that. Simply reading the API changes documentation seems easier than creating a transition API and asking users to route calls through it to catch errors. Otherwise, such responses would indicate that their processes have already failed with an error. If the goal is also to warn users after upgrading the version, then this is the way to go for me too. I am just trying to better understand the scope of when and how we want to warn users.
I was considering a similar approach, returning an error response if the old request is provided but I wasn’t entirely sure about the scope.
Yeah. Maybe we can do something in Airflow 2.11? Since our goal is that 2.11 should be the "bridge" release, maybe we could do a variation of my original proposal: see if the incoming message is in the "old" format, raise a deprecation warning, and also manually implement the "new" format there (that would likely require some manual modification of the OpenAPI specification and some conditional code in 2.11, if possible).
That could follow our pattern of "Make sure 2.11 raises no warnings and then migration to 3 should be smooth".
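As a rough illustration of that bridge idea, the sketch below accepts both the old and the new request format, warning on the old one. Everything here is an assumption made for illustration: the field names `old_field`/`new_field` are placeholders rather than real Airflow API parameters, and `normalize_payload` is a hypothetical helper, not part of any existing API layer.

```python
import warnings
from typing import Any, Dict, Mapping

# Placeholder mapping; real old/new parameter names would come from the
# REST API breaking-change list.
OLD_TO_NEW_FIELDS = {"old_field": "new_field"}


def normalize_payload(payload: Mapping[str, Any]) -> Dict[str, Any]:
    """Accept old- and new-style request fields, warning on the old ones."""
    normalized: Dict[str, Any] = {}
    for key, value in payload.items():
        if key in OLD_TO_NEW_FIELDS:
            new_key = OLD_TO_NEW_FIELDS[key]
            warnings.warn(
                f"Request field {key!r} is deprecated; use {new_key!r} instead.",
                DeprecationWarning,
                stacklevel=2,
            )
            key = new_key
        normalized[key] = value
    return normalized
```

A client that only sends new-style fields triggers no warning, which is the signal that its requests are ready for the new API.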
Indeed, upgrading the client will automatically highlight type errors in users' code.
For people bypassing the Python client and making direct requests to the API (or any other non-Python system), we can indeed catch errors caused by such breaking changes and return a clear message that this is not accepted anymore, and maybe even better, tell them the new way to achieve this. It's just more work, but possible.
Otherwise, reading the significant newsfragment for the `RestAPI` would be a good start when migrating their API code.
Just to repeat the above - yes. I think it's more work, and I think it might require some nasty workarounds in FastAPI that we will have to keep forever, but maybe we can do a 2.11-only change that will raise warnings if the old way is used instead (and allow the new way to be used)? Not sure if it is possible and how many such breaking changes we will have, but it would really be nice to tell the users "if you have no warnings on 2.11, you are good to go".
I am not sure about #42042. I moved one property out of `BaseUser`, but I don't think users are using this class directly.
After reading it again, I don't think users are using it either. 🤔 Then I'll just remove it. Thanks for checking it!
Description
Why
As we're introducing breaking changes to the main branch, it would be better to begin recording the things we could use migration tools to help our users migrate from Airflow 2 to 3.
The breaking changes can be found at https://github.com/apache/airflow/pulls?q=is%3Apr+label%3Aairflow3.0%3Abreaking.
What
List of significant news fragments and rules (by Nov 27)
allow_raw_html_descriptions
scheduler.processor_poll_interval
→scheduler.scheduler_idle_sleep_time
airflow.datasets
→airflow.sdk.definitions.asset
DatasetAlias
→AssetAlias
DatasetAll
→AssetAll
DatasetAny
→AssetAny
expand_alias_to_datasets
→expand_alias_to_assets
DatasetAliasEvent
→AssetAliasEvent
dest_dataset_uri
→dest_asset_uri
BaseDataset
→BaseAsset
iter_datasets
→iter_assets
iter_dataset_aliases
→iter_asset_aliases
Dataset
→Asset
iter_datasets
→iter_assets
iter_dataset_aliases
→iter_asset_aliases
_DatasetBooleanCondition
→_AssetBooleanCondition
iter_datasets
→iter_assets
iter_dataset_aliases
→iter_asset_aliases
airflow.datasets.manager
→airflow.assets.manager
dataset_manager
→asset_manager
resolve_dataset_manager
→resolve_asset_manager
DatasetManager
→AssetManager
register_dataset_change
→register_asset_change
create_datasets
→create_assets
notify_dataset_created
→notify_asset_created
notify_dataset_changed
→notify_asset_changed
notify_dataset_alias_created
→notify_asset_alias_created
airflow.listeners.spec.dataset
→airflow.listeners.spec.asset
on_dataset_created
→on_asset_created
on_dataset_changed
→on_asset_changed
airflow.timetables.datasets
→airflow.timetables.assets
DatasetOrTimeSchedule
→AssetOrTimeSchedule
airflow.datasets.metadata
→airflow.sdk.definitions.asset.metadata
airflow.listeners.spec.dataset
→airflow.listeners.spec.asset
on_dataset_created
→on_asset_created
on_dataset_changed
→on_asset_changed
airflow.timetables.datasets.DatasetOrTimeSchedule
→airflow.timetables.assets.AssetOrTimeSchedule
airflow.api_connexion.security.requires_access_dataset
→airflow.api_connexion.security.requires_access_asset
airflow.auth.managers.models.resource_details.DatasetDetails
→airflow.auth.managers.models.resource_details.AssetDetails
airflow.auth.managers.base_auth_manager.is_authorized_dataset
→airflow.auth.managers.base_auth_manager.is_authorized_asset
airflow.timetables.simple.DatasetTriggeredTimetable
→airflow.timetables.simple.AssetTriggeredTimetable
airflow.providers_manager.ProvidersManager
initialize_providers_dataset_uri_resources
→initialize_providers_asset_uri_resources
dataset_factories
→asset_factories
dataset_uri_handlers
→asset_uri_handlers
dataset_to_openlineage_converters
→asset_to_openlineage_converters
airflow.security.permissions.RESOURCE_DATASET
→airflow.security.permissions.RESOURCE_ASSET
airflow.www.auth.has_access_dataset
→airflow.www.auth.has_access_asset
airflow.lineage.hook.DatasetLineageInfo
→airflow.lineage.hook.AssetLineageInfo
dataset
→asset
airflow.lineage.hook.HookLineageCollector
create_dataset
→create_asset
add_input_dataset
→add_input_asset
add_output_dataset
→add_output_asset
collected_datasets
→collected_assets
triggering_dataset_events
→triggering_asset_events
dataset-uris
→asset-uris (for providers amazon, common.io, mysql, fab, postgres, trino)
airflow.providers.amazon.aws.datasets
→airflow.providers.amazon.aws.assets
s3
create_dataset
→create_asset
convert_dataset_to_openlineage
→convert_asset_to_openlineage
airflow.providers.amazon.auth_manager.avp.entities.AvpEntities.DATASET
→airflow.providers.amazon.auth_manager.avp.entities.AvpEntities.ASSET
airflow.providers.amazon.auth_manager.aws_auth_manager.AwsAuthManager.is_authorized_dataset
→airflow.providers.amazon.auth_manager.aws_auth_manager.AwsAuthManager.is_authorized_asset
airflow.providers.common.io.datasets
→airflow.providers.common.io.assets
file
create_dataset
→create_asset
convert_dataset_to_openlineage
→convert_asset_to_openlineage
airflow.providers.fab.auth_manager.fab_auth_manager.is_authorized_dataset
→airflow.providers.fab.auth_manager.fab_auth_manager.is_authorized_asset
airflow.providers.openlineage.utils.utils
DatasetInfo
→AssetInfo
translate_airflow_dataset
→translate_airflow_asset
airflow.providers.postgres.datasets
→airflow.providers.postgres.assets
airflow.providers.mysql.datasets
→airflow.providers.mysql.assets
airflow.providers.trino.datasets
→airflow.providers.trino.assets
airflow.api_connexion.schemas.dataset_schema
airflow.api_ui.views.datasets
airflow.serialization.pydantic.dataset
airflow.serialization.pydantic.taskinstance
airflow.serialization.enums.DagAttributeTypes
airflow.serialization.serialized_objects
airflow.utils.context
airflow.models.ImportError
→airflow.models.errors.ParseImportError
airflow.executors.*
airflow.hooks.*
airflow.macros.*
airflow.operators.*
airflow.sensors.*
airflow.operators.subdag
airflow.sensors.external_task.ExternalTaskSensorLink
→airflow.sensors.external_task.ExternalDagLink
DayOfWeekSensor
use_task_execution_day
→use_task_logical_date
airflow.models.taskmixin.TaskMixin
→airflow.models.taskmixin.DependencyMixin
airflow.executors.executor_loader.UNPICKLEABLE_EXECUTORS
airflow.utils.dag_cycle_tester.test_cycle
airflow.utils.file.TemporaryDirectory
airflow.utils.file.mkdirs
airflow.utils.state.SHUTDOWN
airflow.utils.state.terminating_states
DAG
schedule_interval
timetable
airflow.utils.dates.date_range
airflow.utils.dates.days_ago
→ ❓ do we need to change it to pendulum.today('UTC').add(days=-N, ...)
airflow.utils.helpers.chain
→airflow.models.baseoperator.chain
airflow.utils.helpers.cross_downstream
→airflow.models.baseoperator.cross_downstream
airflow.secrets.local_filesystem.load_connections
→airflow.secrets.local_filesystem.load_connections_dict
airflow.secrets.local_filesystem.get_connection
→airflow.secrets.local_filesystem.load_connections_dict
smtp.smtp_user
smtp.smtp_password
webserver.session_lifetime_days
→ use webserver.session_lifetime_minutes
webserver.force_log_out_after
→ use webserver.session_lifetime_minutes
policy
→task_policy
airflow.utils.log.file_task_handler.FileTaskHandler
filename_template
airflow.utils.decorators.apply_defaults
scheduler.dependency_detector
airflow.secrets.base_secrets.BaseSecretsBackend.get_conn_uri
→airflow.secrets.base_secrets.BaseSecretsBackend.get_conn_value
airflow.secrets.base_secrets.BaseSecretsBackend.get_connections
→airflow.secrets.base_secrets.BaseSecretsBackend.get_connection
airflow.api.auth.backend.basic_auth
→airflow.providers.fab.auth_manager.api.auth.backend.basic_auth
airflow.api.auth.backend.kerberos_auth
→airflow.providers.fab.auth_manager.api.auth.backend.kerberos_auth
airflow.auth.managers.fab.api.auth.backend.kerberos_auth
→airflow.providers.fab.auth_manager.api.auth.backend.kerberos_auth
airflow.auth.managers.fab.fab_auth_manager
→airflow.providers.fab.auth_manager.security_manager.override
airflow.auth.managers.fab.security_manager.override
→airflow.providers.fab.auth_manager.security_manager.override
airflow.hooks.base.BaseHook.get_connections (❓ related to 41642)
airflow.kubernetes
airflow.operators.datetime.BranchDateTimeOperator
use_task_execution_day
→use_task_logical_date
airflow.operators.trigger_dagrun.TriggerDagRunOperator
parameter
→logical_date
airflow.operators.weekday.BranchDayOfWeekOperator
use_task_execution_day
→use_task_logical_date
airflow.triggers.external_task.TaskStateTrigger
airflow.hooks.dbapi
→airflow.providers.common.sql.hooks.sql
airflow.www.auth.has_access
→ use airflow.www.auth.has_access_*
airflow.www.security
→airflow.providers.fab.auth_manager.security_manager.override.FabAirflowSecurityManagerOverride
airflow.www.utils.get_sensitive_variables_fields
→airflow.utils.log.secrets_masker.get_sensitive_variables_fields
airflow.www.utils.should_hide_value_for_key
→airflow.utils.log.secrets_masker.should_hide_value_for_key
BaseOperator
task_concurrency
→max_active_tis_per_dag
https://github.com/astral-sh/ruff/pull/14616
dummy
none_failed_or_skipped
operators.ALLOW_ILLEGAL_ARGUMENTS
airflow.models.baseoperator.BaseOperatorLink
→airflow.models.baseoperatorlink.BaseOperatorLink
airflow.models.connection.parse_netloc_to_hostname
airflow.models.connection.Connection.parse_from_uri
airflow.models.connection.Connection.log_info
airflow.models.connection.Connection.debug_info
airflow.api_connexion.security.requires_access
→ use requires_access_*
metrics.metrics_use_pattern_match
airflow.metrics.validators.AllowListValidator
→ suggest using airflow.metrics.validators.PatternAllowListValidator (not direct mapping)
airflow.metrics.validators.BlockListValidator
→ suggest using airflow.metrics.validators.PatternBlockListValidator (not direct mapping)
[ ] Remove property airflow.auth.managers.models.base_user.is_active ❌ Users are not likely to use it
celery.stalled_task_timeout
kubernetes_executor.worker_pods_pending_timeout
→scheduler.task_queued_timeout
metrics.statsd_allow_list
→metrics.metrics_allow_list
metrics.statsd_block_list
→metrics.metrics_block_list
scheduler.statsd_on
→metrics.statsd_on
scheduler.statsd_host
→metrics.statsd_host
scheduler.statsd_port
→metrics.statsd_port
scheduler.statsd_prefix
→metrics.statsd_prefix
scheduler.statsd_allow_list
→metrics.statsd_allow_list
scheduler.stat_name_handler
→metrics.stat_name_handler
scheduler.statsd_datadog_enabled
→metrics.statsd_datadog_enabled
scheduler.statsd_datadog_tags
→metrics.statsd_datadog_tags
scheduler.statsd_datadog_metrics_tags
→metrics.statsd_datadog_metrics_tags
scheduler.statsd_custom_client_path
→metrics.statsd_custom_client_path
core.interleave_timestamp_parser
→logging.interleave_timestamp_parser
core.base_log_folder
→logging.base_log_folder
core.remote_logging
→logging.remote_logging
core.remote_log_conn_id
→logging.remote_log_conn_id
core.remote_base_log_folder
→logging.remote_base_log_folder
core.encrypt_s3_logs
→logging.encrypt_s3_logs
core.logging_level
→logging.logging_level
core.fab_logging_level
→logging.fab_logging_level
core.logging_config_class
→logging.logging_config_class
core.colored_console_log
→logging.colored_console_log
core.colored_log_format
→logging.colored_log_format
core.colored_formatter_class
→logging.colored_formatter_class
core.log_format
→logging.log_format
core.simple_log_format
→logging.simple_log_format
core.task_log_prefix_template
→logging.task_log_prefix_template
core.log_filename_template
→logging.log_filename_template
core.log_processor_filename_template
→logging.log_processor_filename_template
core.dag_processor_manager_log_location
→logging.dag_processor_manager_log_location
core.task_log_reader
→logging.task_log_reader
core.sql_alchemy_conn
→database.sql_alchemy_conn
core.sql_engine_encoding
→database.sql_engine_encoding
core.sql_engine_collation_for_ids
→database.sql_engine_collation_for_ids
core.sql_alchemy_pool_enabled
→database.sql_alchemy_pool_enabled
core.sql_alchemy_pool_size
→database.sql_alchemy_pool_size
core.sql_alchemy_max_overflow
→database.sql_alchemy_max_overflow
core.sql_alchemy_pool_recycle
→database.sql_alchemy_pool_recycle
core.sql_alchemy_pool_pre_ping
→database.sql_alchemy_pool_pre_ping
core.sql_alchemy_schema
→database.sql_alchemy_schema
core.sql_alchemy_connect_args
→database.sql_alchemy_connect_args
core.load_default_connections
→database.load_default_connections
core.max_db_retries
→database.max_db_retries
core.worker_precheck
→celery.worker_precheck
scheduler.max_threads
→scheduler.parsing_processes
celery.default_queue
→operators.default_queue
admin.hide_sensitive_variable_fields
→core.hide_sensitive_var_conn_fields
admin.sensitive_variable_fields
→core.sensitive_var_conn_names
core.non_pooled_task_slot_count
→core.default_pool_task_slot_count
core.dag_concurrency
→core.max_active_tasks_per_dag
api.access_control_allow_origin
→api.access_control_allow_origins
api.auth_backend
→api.auth_backends
scheduler.deactivate_stale_dags_interval
→scheduler.parsing_cleanup_interval
kubernetes_executor.worker_pods_pending_timeout_check_interval
→scheduler.task_queued_timeout_check_interval
webserver.update_fab_perms
→fab.update_fab_perms
webserver.auth_rate_limited
→fab.auth_rate_limited
webserver.auth_rate_limit
→fab.auth_rate_limit
kubernetes
→kubernetes_executor
core.check_slas
DAG
sla_miss_callback
BaseOperator
sla
dag_ignore_file_syntax has changed
DAG.max_active_runs behavior has been changed
airflow.api.auth.backend.default
→airflow.providers.fab.auth_manager.api.auth.backend.session
logging.enable_task_context_logger
airflow.executors.*
airflow.hooks.*
trigger_rule=TriggerRule.ALWAYS is blocked in a dynamic mapped task
airflow.configuration.get
→airflow.configuration.conf.get
airflow.configuration.getboolean
→airflow.configuration.conf.getboolean
airflow.configuration.getfloat
→airflow.configuration.conf.getfloat
airflow.configuration.getint
→airflow.configuration.conf.getint
airflow.configuration.has_option
→airflow.configuration.conf.has_option
airflow.configuration.remove_option
→airflow.configuration.conf.remove_option
airflow.configuration.as_dict
→airflow.configuration.conf.as_dict
airflow.configuration.set
→airflow.configuration.conf.set
airflow.configuration.
→airflow.configuration.conf.
airflow.utils.dates.parse_execution_date
airflow.utils.dates.round_time
airflow.utils.dates.scale_time_units
airflow.utils.dates.infer_time_unit
airflow.PY36
→if sys.version_info >= (3, 6)
airflow.PY37
→if sys.version_info >= (3, 7)
core.strict_dataset_uri_validation
traces.otel_task_log_event
metrics.timer_unit_consistency
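The rename rules listed above lend themselves to a table-driven check. The following is a minimal sketch using only the standard-library `ast` module; the tooling mentioned in this thread is based on LibCST and Ruff, and this is neither. The function name `find_renamed_imports` and the `IMPORT_RENAMES` table are assumptions for illustration, with a handful of entries copied from the list, and only `from X import Y` statements are handled.

```python
import ast

# A few (module, name) -> (module, name) rules copied from the list above.
IMPORT_RENAMES = {
    ("airflow.datasets", "Dataset"): ("airflow.sdk.definitions.asset", "Asset"),
    ("airflow.datasets", "DatasetAlias"): ("airflow.sdk.definitions.asset", "AssetAlias"),
    ("airflow.timetables.datasets", "DatasetOrTimeSchedule"): (
        "airflow.timetables.assets",
        "AssetOrTimeSchedule",
    ),
    ("airflow.models.baseoperator", "BaseOperatorLink"): (
        "airflow.models.baseoperatorlink",
        "BaseOperatorLink",
    ),
}


def find_renamed_imports(source: str) -> list[tuple[int, str, str]]:
    """Return (line, old, new) for each `from X import Y` that matches a rule."""
    findings = []
    for node in ast.walk(ast.parse(source)):
        # Skip everything except absolute `from ... import ...` statements.
        if not isinstance(node, ast.ImportFrom) or node.module is None:
            continue
        for alias in node.names:
            target = IMPORT_RENAMES.get((node.module, alias.name))
            if target is not None:
                old = f"{node.module}.{alias.name}"
                new = f"{target[0]}.{target[1]}"
                findings.append((node.lineno, old, new))
    return sorted(findings)


sample = "from airflow.datasets import Dataset\nfrom airflow import DAG\n"
for line, old, new in find_renamed_imports(sample):
    print(f"line {line}: {old} -> {new}")
# prints: line 1: airflow.datasets.Dataset -> airflow.sdk.definitions.asset.Asset
```

Removed names without a replacement and config option renames would need their own tables and checks; they are left out to keep the sketch short.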
Related issues
No response