apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0
37.12k stars 14.3k forks source link

Improve Test OpenAPI client tests #44020

Open potiuk opened 1 day ago

potiuk commented 1 day ago

We should move test open API client tests to be run inside breeze.

The Open API client tests are pretty brittle. They run in "runner" directly and they are not using breeze image as the base for running them - which means that they need to install airflow in the runner directly. This is a bit problematic in general case - because sometimes changes are introduced in providers that require "main" airflow to use new providers (for example when FAB gets incompatible changes or async IO is tested.

Then you need to not only install airflow but also build and install providers - you need to build the providers locally from sources and install some dependencies which might or might not cleanly install on "bare" runner environment

This has been somewhat mitigated by having a list of providers to install - but this is brittle, might change and some people find it "hard" to folllow - because they do not understand why sometimes they need to build and install those providers. Also what does not help is yaml keeping the scripts has some very unobvious problems where indentation might introduce unexpected end of lines etc.

When you attempt to build and install all providers, it's not easy sometimes and it will change over time as well - and when you use uv it tries to install and resolve all dependencies (including optional) so while it is way faster than pip it also tries to install and build some of the dependencies (like kerberos) that migh need some system-level tools to get installed.

Attempt to do so has been done in https://github.com/apache/airflow/pull/44007 and naive "build and install everything" ends up with:

Caused by: Failed to download and build `gssapi==1.9.0`
  Caused by: Build backend failed to determine requirements with `build_wheel()` (exit status: 1)

[stderr]
/bin/sh: 1: krb5-config: not found
Traceback (most recent call last):
  File "<string>", line 14, in <module>
  File "/home/runner/.cache/uv/builds-v0/.tmp6LD5ic/lib/python3.9/site-packages/setuptools/build_meta.py", line 334, in get_requires_for_build_wheel
    return self._get_build_requires(config_settings, requirements=[])
  File "/home/runner/.cache/uv/builds-v0/.tmp6LD5ic/lib/python3.9/site-packages/setuptools/build_meta.py", line 304, in _get_build_requires
    self.run_setup()
  File "/home/runner/.cache/uv/builds-v0/.tmp6LD5ic/lib/python3.9/site-packages/setuptools/build_meta.py", line 320, in run_setup
    exec(code, locals())
  File "<string>", line 109, in <module>
  File "<string>", line 22, in get_output
  File "/opt/hostedtoolcache/Python/3.9.20/x64/lib/python3.9/subprocess.py", line 424, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/opt/hostedtoolcache/Python/3.9.20/x64/lib/python3.9/subprocess.py", line 528, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'krb5-config --libs gssapi' returned non-zero exit status [127](https://github.com/apache/airflow/actions/runs/11829519567/job/32961957929#step:14:128).

That shows why we have CI image - because the CI image has all the necessary dependencies and base OS to install airflow + all dependencies and we keep it updated as airflow evolves and new dependencies are added.

Therefore it would be best is if the Test Open API client would be run in breeze (there we have pre-installed airflow with editable install for airflow, task_sdk and all providers so we only need to build and install python client).

This would require few things:

1) building python client outside of breeze 2) entering breeze 3) running the test script inside breeze

Also the whole TestOpenAPI client shoudl likely be moved to "special tests" from "basic tests" - becasause it has to wait for the CI image to be ready and Basic tests are run without waiting for the image.

potiuk commented 1 day ago

cc: @dstandish @kaxil @pierrejeambrun - you seemed to have some hard time with that test recently, here is a way how we can make it easier to maintain in the future.

potiuk commented 1 day ago

The interesting thing here is that uv is trying to install and buid krb5 because krb5 has only "sdist" package + (wait for it) binary wheels for macos : https://pypi.org/project/krb5/#files

So in order to just SEE by uv which dependencies krb5 has, it needs to be downloaded and converted to .whl package :exploding_head: ....

This one of the edge cases that make package resolution for Python packages so complex and brittle. It's a bit of miracle it works as well as it does regardless.

potiuk commented 1 day ago

Now ... This is why CI image of Airflow is so useful, because it has everything needed to build all 700 packages if necessary :D