apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
37.26k stars 14.33k forks source link

Restrict pydantic 2.10.0 #44249

Closed gopidesupavan closed 12 hours ago

gopidesupavan commented 22 hours ago

Recent pydantic 2.10.0 release causing failures in CI. It seems there is ticket already opened here. https://github.com/pydantic/pydantic/issues/10910

Failing tests in CI https://github.com/apache/airflow/actions/runs/11954326679/job/33324739597#step:7:12365


^ Add meaningful description above Read the Pull Request Guidelines for more information. In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed. In case of a new dependency, check compliance with the ASF 3rd Party License Policy. In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

sydney-runkle commented 22 hours ago

We should have a patch release out this afternoon with a fix for the issues you're encountering :)

ashb commented 21 hours ago

@sydney-runkle Do you plan on yanking the 2.10.0 release? (I don't mind either way, just need to know what we do)

If 2.10.0 gets yanked then we don't need to keep this/can revert it, otherwise we probably should keep the exclusion rule

sydney-runkle commented 20 hours ago

We're not going to yank v2.10.0, will just release a v2.10.1 patch soon!

sydney-runkle commented 20 hours ago

Perhaps you all could test against our main branch before we do our patch release to confirm that all is working as expected on your end?

potiuk commented 20 hours ago

Perhaps you all could test against our main branch before we do our patch release to confirm that all is working as expected on your end?

It would be easier if you have an rc or beta/alpha released in PyPI - because I am not sure if we can trigger the whole test suite against Github-installed version. Can it be done @sydney-runkle ?

sydney-runkle commented 20 hours ago

You can try with pydantic==git+https://github.com/pydantic/pydantic@main

potiuk commented 19 hours ago

You can try with pydantic==git+https://github.com/pydantic/pydantic@main

It might be that we expect version in our CI for that, so I am not sure if it's going to work . I can try, but I have a feeling it will fail in one of the steps.

gopidesupavan commented 19 hours ago

You can try with pydantic==git+https://github.com/pydantic/pydantic@main

It might be that we expect version in our CI for that, so I am not sure if it's going to work . I can try, but I have a feeling it will fail in one of the steps.

yeah i think it might fail at constraints ?

potiuk commented 19 hours ago

You can try with pydantic==git+https://github.com/pydantic/pydantic@main

It might be that we expect version in our CI for that, so I am not sure if it's going to work . I can try, but I have a feeling it will fail in one of the steps.

Trying it here @sydney-runkle https://github.com/apache/airflow/pull/44260

potiuk commented 19 hours ago

You can try with pydantic==git+https://github.com/pydantic/pydantic@main

It might be that we expect version in our CI for that, so I am not sure if it's going to work . I can try, but I have a feeling it will fail in one of the steps.

Trying it here @sydney-runkle #44260

Yeah... some tests are already failing here as I suspected. But those are auxiliary ones, maybe the main part of the tests that actually failed before will be ok:

  Preparing metadata (pyproject.toml): finished with status 'done'
ERROR: Requested apache-airflow==3.0.0.dev0 from file:///home/runner/work/airflow/airflow has invalid metadata: Expected end or semicolon (after name and no valid version specifier)
    pydantic==git+https://github.com/pydantic/pydantic@main
Viicos commented 18 hours ago

We're going to release 2.10.1 any minute now.

We got a report today of an issue happening on Python <3.10. The TaskInstancePydantic class is subclassing LoggingMixin. This class has an annotation for _log using the new 3.10+ union syntax. While Pydantic has always evaluated all type annotations of models including their bases, we were not using the correct globals and locals to so. Meaning with the following example:

a.py

from logging import Logger

class Base:
    _log: 'Logger | None'

b.py

from a import Base

from pydantic import BaseModel

class Model(BaseModel, Base):
    pass

Evalutating the annotation for _log would have resulted in a NameError, because Logger is imported in a.py, and we only used the globals of the b.py module until now. NameError's are ignored to allow for types to be defined later on. However, on 2.10, this now raises a TypeError because 'Logger | None' successfully evaluates (read: Logger can be resolved), but the union syntax is not supported on Python 3.9.

What's probably best is to make sure you're using the old syntax for every class that is going to be used in the context of a Pydantic model.

sydney-runkle commented 11 hours ago

Just released v2.10.1 with fixes for the issues you folks reported :)

potiuk commented 2 hours ago

What's probably best is to make sure you're using the old syntax for every class that is going to be used in the context of a Pydantic model.

As discussed in https://github.com/pydantic/pydantic/issues/10924#issuecomment-2493313194 - I think this change is breaking far too much of an existing code base and you are facing the reality that this will cause even more people to limit their Pydantic version to < 2 (it looks like the dbt team asked dbt-databricks maintainer to do so). While we might likely implement some fix to that (maybe limiting Pydantic to <2.10) in Airflow 2.10.4, this is not a good solution because Pydantic is so popular and used in many, many dependencies, and it's simply not reasonable to expect that existing released code will be updated. It might help if new versions of software are updated but I have a feeling even recent version of released software will start failing with similar issues. So I would seriously reconsider that "fix" and revert it or workaround it in 2.10.2 (at least that's what I'd do as a maintainer of such popular library).

Viicos commented 2 hours ago

I think this change is breaking far too much of an existing code base

Since the release of 2.10, we only had one report (https://github.com/pydantic/pydantic/issues/10924) related to evaluation of forward annotations.

Pydantic is so popular and used in many, many dependencies

We are aware of that, it is, in some cases, extremely challenging for us to make changes/cleanups, especially because in the field of forward annotations evaluation.

The standard library utilities are far from perfect, so we had to implement our own logic to support edge cases that unfortunately led to a messy implementation, accepting invalid annotations to evaluate (this can be dangerous as names in forward annotations could resolve to the wrong type) [^1].

it's simply not reasonable to expect that existing released code will be updated

Honestly, this requires a one line change on your end:

https://github.com/apache/airflow/blob/f33166a612218a10cee0e42cde36bc43fc184222/airflow/utils/log/logging_mixin.py#L68-L78

Your usage of type annotations is inconsistent here, as you are mixing the new union syntax with Optional. Note that if this class were to be defined in the same module as TaskInstancePydantic, the issue would have appeared already.

So I would seriously reconsider that "fix" and revert it or workaround it in 2.10.2

The "fix" is not actually a fix, but a complete refactor of our forward annotations evaluation. It's unfortunately not possible for us to "revert" it. The current implementation is vastly more accurate, and as I said above will avoid dangerous issues when forward annotations are silently not resolved to the correct type.

If we do get more reports, depending on their validity, we will then consider tweaking the current logic.

[^1]: Read more here.

Viicos commented 2 hours ago

For instance: https://github.com/langchain-ai/langchain-google/issues/610 is an example of a forward annotation evaluation that resolved to the wrong type in <2.10 (I'm in the process of writing up what's happening).

pierrejeambrun commented 1 hour ago

What's probably best is to make sure you're using the old syntax for every class that is going to be used in the context of a Pydantic model.

Indeed this is not great. We have all the codebase updated to the new style type annotation. (|, list, etc...) and having to use the old one everywhere pydantic is involved is not great. (maybe I didn't understand correctly what you are suggesting).

but the union syntax is not supported on Python 3.9.

We use from future import __annotation__ in conjunction with eval-type-backport and has been working nicely for us to solve pydantic new style annotation + forward evaluation in python 3.9. Is that expected or will we still face an issue with the new release ?

potiuk commented 1 hour ago

If we do get more reports, depending on their validity, we will then consider tweaking the current logic.

Sure. It is indeed up to you. We are pretty well protected - we have "constraint mechanism" and we can restric airlfow 2.10.4 - and we are rather happy to adjust future versions of airflow, I was just merely suggesting that this change might have more consequences for you - so It was more of a friendly advice than complaint.

But yes - I agree my assesment might be wrong.

Your usage of type annotations is inconsistent here, as you are mixing the new union syntax with Optional. Note that if this class were to be defined in the same module as TaskInstancePydantic, the issue would have appeared already.

Yes - but to be perfectly honest it's because of Pydantic. We have a rule in our pre-commit that enforces using new style type annotations. Because of inability of pydantic to handle it, we specifically converted our Pydantic classes to not use from future import __annotations__ and we even had to add exeption to our rule:

https://github.com/apache/airflow/blob/main/pyproject.toml#L326

So our "inconistent use" is exclusively because of Pydantic. And our use is perfectly fine according to current state of type annotations - both style of annotations can be inter-mixed and from __future__ import annoutattion as defined in https://peps.python.org/pep-0563/ . And while I know there are a lot of controversies around that PEP and the way how it works will likely change in the future, this is the current state of affairs, and we are following the accepted PEPs (while Pydantic does not really handle them properly and we had to introduce inconsistency in order to workaround it).

So technically speaking it's because of Pydantic we are inconsistent.

And I am not complaining and not trying to heat it up, I am just stating the fact - explaining that well, it's Pydantic who is the root cause of that inconsistency, not Airflow.

So really - @Viicos - please treat it as a friendly advice, that maybe it's a good idea to consider supporting that case. While we will handle it ourselves (luckily our constraint mechanism we introduce many years ago and educate our users to use them will prevent a lot of our users from being affected). It's really a maintainer-to-maintainer friendly suggestion that this might hit you back.