Open potiuk opened 1 month ago
@potiuk I am interested in this, should I take it on? It might take me a bit of time to work out, as this area is new to me, but it will help me understand more about the Airflow CI/CD. :)
It's going to be a little more complex, it seems - we will likely have to redesign a bit the way tests are executed in CI with different test "groups", so I will start a wider discussion on the Slack channel to discuss how to do it and involve others.
Absolutely! I just began analyzing this yesterday to see which components can be grouped or not. I noticed some existing options, like `skip-provider-tests`, and now that we have the Task SDK, we may need to support executing or skipping its tests as well. I completely agree that some redesign is necessary.
Hey here. I would like to start a discussion and propose some redesign of our CI, to address some of the issues mentioned here, and involve others so that we come up with a better design together and then get it implemented together.
First let me state the problems and what is wrong currently. I will write a separate comment with the first proposal to address them.
Also "system" tests are a separate type - but they are under "tests" in "airflow" and "provider' tests.
Those tests cannot be run together in a single pytest command, because all of them have a conftest.py file at the top folder and pytest refuses to run them together, as it does not know which conftest.py is the "top-level" one.
Currently we have "non-db-tests" that runs all the "non-db" tests together in a single xdist/parallel fashion and it currenlty runs together "tests" and "providers/tests" together - but this is really thanks to a workaround for it in "providers" (there is an __init__.py
in "providers" folder that makes the test belong to package "providers.tests" - but this is really a hack because "providers" is where the whole "providers" project is, it's not a python "package".
Both "airflow" and "providers" tests are currently run with breeze testing tests
command (which by default used to run ALL tests ("airflow + providers") and only when you specified particular packages or modules to run as additional arguments woudl run only those tests you specified. This was a bit "hacky" approach how it was done and one of the reasons why running system tests were broken after provider's separation (i temporarily fixed it in #43529 but this is really a band-aid).
Also, the same `breeze testing tests` command is used to run "helm" unit tests - and there is another "hack" implemented: we are also using the breeze container to run those tests, but "helm" tests do not need to be run in the breeze container at all - they only need the small local venv we already have from k8s where `helm` is installed - and we can run them there.
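A minimal sketch of what that could look like; the venv path is an assumption, `helm_tests` is the existing test folder mentioned later in this thread:

```bash
# Hypothetical: activate the small k8s venv (path assumed) where helm is already
# installed, instead of spinning up the full breeze CI container:
source ./.build/.k8s-env/bin/activate
pytest helm_tests -n auto   # -n auto assumes pytest-xdist is installed in that venv
```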
Actually, `airflow` and `task_sdk` tests technically (eventually, when we fully remove all provider references from airflow core and tests) will not have to use the breeze container either. With UV we could technically run the tests in a venv created dynamically in CI (with main constraints / uv lock for reproducibility) rather than using the CI image - because we do not need all the 700+ deps; they should be "reproducible enough" without our CI image.
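A minimal sketch of such a dynamically created venv, assuming a local constraints.txt (e.g. fetched from the constraints-main branch) and the `devel` extra; none of this is the final design:

```bash
# Create a throw-away venv with uv instead of building/pulling the CI image.
uv venv .venv-tests
source .venv-tests/bin/activate
# Pin installs via constraints (or an uv lock file) so the CI run stays reproducible.
uv pip install -e ".[devel]" --constraint constraints.txt
# Non-DB tests can then run fully in parallel (pytest-xdist assumed).
pytest tests -n auto
```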
We already have separate "TaskSDK tests" that run in the same environment as "Non-DB tests" - but only for "task_sdk" - and they have a separate `breeze testing task-sdk-tests` command. It looks like a good direction.
My proposal (high level) - it needs some detailed thinking and prototyping regarding the parallel execution of tests in CI - is to follow the `breeze testing task-sdk-tests` approach and introduce three more testing commands:

- `breeze testing provider-tests` command
- `breeze testing helm-tests` command
- `breeze testing system-tests` command

This means that we will lose some of the parallelism benefits when we run `provider` tests and `airflow` tests together (but instead we will increase the number of jobs we have in CI, which we should still be able to do without exceeding the limits that INFRA has on the number of parallel jobs we can run in CI).
We can also gradually move tests to local venvs where they do not need database containers from docker compose, nor the Python / system dependencies that we currently standardise in the CI image.
Also, System tests should have a "dry-run" mode, where we can run them as part of the CI so that we know they are not broken. We are not able to run system tests for all external systems, but at least we should have some "dry-run" system tests that we can run with all PRs.
So, at a high level, we have:
System tests are not currently executed.
And we can do it this way:
Likely we could attempt to make TaskSDK and Helm tests run in a venv rather than in the CI image even now - I think for the Task SDK (@kaxil @ashb ?) there are very few dependencies and we could easily use a local venv instead of the whole CI image, as there are no specific system or provider dependencies there.
Also, splitting out tests into "Airflow" and "Providers" without moving them out of the CI image will help us address the issues we have now with conftest.py and the actual "separation" of the different test groups.
When we remove all the remaining references to providers from Airflow, we might move the Airflow Non-DB tests to a local virtualenv.
We switch the provider tests to use the Task SDK - and at this stage all DB tests should be converted to non-DB tests.
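For reference, the DB / non-DB split is selected with a pytest marker today (the `db_test` marker name is from memory, so treat it as an assumption), which is what "converting" a test boils down to:

```bash
# Non-DB tests: no db_test marker, safe to run fully in parallel without databases.
pytest providers/tests -m "not db_test" -n auto
# DB tests: still need the database containers from docker-compose / the CI image.
pytest providers/tests -m "db_test"
```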
We will still need provider tests to run in the CI image and use the image to calculate constraints etc. - for quite some time (maybe forever), but I see a clear path and stages we could take to simplify the "Airflow" and Task SDK ones.
When we add the capability of testcontainers (see #43514), the DB tests might eventually be run in a local virtualenv as well.
I made a few refinements to the initial proposal
> I think Task SDK (@kaxil @ashb ?) there are very little dependencies there and we could easily make a local venv there instead of the whole CI image as there are no specific system or provider dependencies there.
Yes, there should be no deps on any providers, not even the new standard provider, as all of those need to depend on the SDK.
Given the ability of UV to create predictable venvs with the right versions installed, yeah, I agree. Also, if it's relevant, I think every* test in the Task SDK will be stateless and could be parallelized - i.e. they are all non-db style tests.
Yeah, and starting to have it in a freshly created venv without any DB or other deps for the Task SDK is actually even GOOD, because then we will not have even accidental dependencies that we missed.
Now we have three test folders for unit tests: "tests", "providers/tests", "task_sdk/tests". Do we want to include `helm_tests/` in that list to make it 4?

> Also the same breeze testing tests command is used to run "helm" unit tests

Also, system tests are run by the `breeze testing tests` command as of now, which caused some confusion and issues recently. I think that's worth having a bullet point for in your explanation comment.
> and introduce two more testing commands:
Again, should we make system tests a top-level command as well (`breeze testing system-tests`)? There wouldn't be much there now, but I think it's a nice future-proof solution. Right now the system test stuff works in a very messy way, and I think making it a top-level command would add much needed separation.
Otherwise I like the plan!
> Also system tests are run by the breeze testing tests command as of now which caused some confusion and issues recently. I think that's worth having a bullet point for in your explanation comment.
Agree. Will update the plan accordingly.
I updated it. I added something that was long overdue - "dry run system tests". While we will not be able to run all the system tests from Amazon/Google/Astronomer/Teradata/Microsoft, we should be able to have some "canonical", non-externally-bound system tests that we run to "mimic" what Amazon/Google/Astronomer/Teradata/Microsoft will be doing in the future, so that we know at least the "execution framework" works.
I updated it following the discussion on slack - adding the stage where we get rid of all db and only use task sdk in all provider tests (stage 3)
@potiuk Going to attempt stage 1 (Providers Non-DB tests) over this weekend. Would you have any suggestions or thoughts for this? At present I am thinking of just having a similar setup to task_sdk - as mentioned above, it is a good example direction?
> I updated it following the discussion on slack - adding the stage where we get rid of all db and only use task sdk in all provider tests (stage 3)
Yeah, that makes sense. :)
Sorry - only seeing it now (Hackathon). I thought about having a completely separate testing sub-command for each test type:
- `breeze testing core-tests-db`
- `breeze testing core-tests-non-db`
- `breeze testing providers-tests-db`
- `breeze testing providers-tests-non-db`
- `breeze testing providers-task-sdk`
- `breeze testing system`
Then each of the commands will have its own (specific to that command) set of test types it can use - core tests will have all the "airflow sub-tests", providers will be able to use `Providers`, `Providers[google]` or `Providers[-amazon,-google]` as they can today. Task SDK will not have any type. System tests will have a type based on which provider's system tests they are.
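To illustrate (the `--test-types` flag and the type names here are assumptions carried over from the current setup, combined with the proposed commands above):

```bash
# Hypothetical invocations of the proposed per-group commands:
breeze testing providers-tests-non-db --test-types "Providers[google]"
breeze testing providers-tests-db --test-types "Providers[-amazon,-google]"
breeze testing core-tests-non-db --test-types "Core"
```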
Doing it this way has the nice effect that you can easily parallelise them as needed following our parallel framework, but also, when you enter breeze with the `breeze` command, you should be able to run any test or group of tests with just `pytest tests/*` or `pytest providers/tests/*` (both DB and non-DB tests should also work when the DB is available). The test type would only be used when you want to run a predefined "set" of tests from outside.
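For example, inside the breeze shell (illustrative paths, pytest-xdist assumed for `-n`):

```bash
# Run any subset directly with pytest - no test type needed:
pytest tests/models -n auto
pytest providers/tests/google
```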
I actually have a draft version of this in my local branch already, and I wanted to make a PR today - so if you have not started on it, I might want to set it up for review for you :D
No worries :) Super, that's great if you have it already - please go ahead. I have not started yet; I was actually looking over the weekend at backporting and testcontainer things.
Testcontainers is incredibly helpful for reducing the need for numerous Docker files and configurations. It's straightforward to use (we can define some test interfaces to extend or implement for any testcontainer in our tests). I experimented with Kafka and Airflow, and while it's not yet a fully functional setup, I managed to get it to a point where it connects a Kafka Testcontainer and runs a DAG from my local setup. I think this is definitely worth using. I will try running this with GHA to see how it performs :)
cool!
Currently `non-db-tests` execute all test types specified via `--test-types` in a single process, using xdist (`-n <processors>`) parallelism in order to make efficient use of parallelisation of the tests. This means that both "core" and "providers" types of tests are run in a single `pytest` command. For example:

Where PARALLEL_TEST_TYPES is produced by the "selective check" to determine which tests should be run.
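A rough sketch of that single-process run (this is not the original example; the exact command and flag shape are assumptions based on the description above):

```bash
# All selected test types - core and providers alike - go to a single process,
# which parallelises internally with xdist (-n <processors>).
breeze testing non-db-tests --test-types "${PARALLEL_TEST_TYPES}"
```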
However, #42505 splits out providers to a separate directory and we cannot run providers and non-provider tests in the same process any longer, because pytest has a limitation where test module names should not overlap (and tests/conftest.py is different for providers and non-providers, so they cannot be run in the same process, as it will result in collection errors).
One solution to that (probably the best approach) is to introduce a required flag (`--providers`/`--no-providers`) for `non-db-tests`, filter out the core / provider test folders when the respective flag is used, and run two separate jobs in our CI - one for providers and one for core. That would require two separate jobs in `ci.yml` - "Non-db Core tests" and "Non-db Providers tests" - each of them passing the flag down to the `run-unit-tests.yml` composite workflow.
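A sketch of how the two jobs could pass the flag down (only the `--providers`/`--no-providers` flag comes from the proposal above; the variable names and wiring are assumptions):

```bash
# In the composite workflow, each of the two CI jobs selects its group:
if [[ "${TEST_GROUP}" == "providers" ]]; then
  breeze testing non-db-tests --providers --test-types "${PARALLEL_TEST_TYPES}"
else
  breeze testing non-db-tests --no-providers --test-types "${PARALLEL_TEST_TYPES}"
fi
```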