astronomer / astronomer-cosmos

Run your dbt Core projects as Apache Airflow DAGs and Task Groups with a few lines of code
https://astronomer.github.io/astronomer-cosmos/
Apache License 2.0

Fix broken CI #1180

Closed: tatiana closed this issue 1 month ago

tatiana commented 1 month ago

Recently our main branch checks stopped passing: https://github.com/astronomer/astronomer-cosmos/actions/runs/10528925346/job/29255782679

We managed to reproduce the issue locally by running:

hatch -v run tests.py3.9-2.7:type-check
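
Note that the failure occurs while hatch builds the environment (i.e. while pip installs its dependencies), before the type-check itself starts, so recreating the environment alone should reproduce it:

  # The error surfaces during environment creation, before any test
  # command runs, so rebuilding the env alone should reproduce it.
  hatch env remove tests.py3.9-2.7
  hatch env create tests.py3.9-2.7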

The stacktrace is:

Traceback (most recent call last):
  File "/Users/tati/Library/Application Support/hatch/env/virtual/astronomer-cosmos/4VBJdS-x/tests.py3.9-2.7/lib/python3.9/site-packages/pip/_internal/cli/base_command.py", line 180, in exc_logging_wrapper
    status = run_func(*args)
  File "/Users/tati/Library/Application Support/hatch/env/virtual/astronomer-cosmos/4VBJdS-x/tests.py3.9-2.7/lib/python3.9/site-packages/pip/_internal/cli/req_command.py", line 245, in wrapper
    return func(self, options, args)
  File "/Users/tati/Library/Application Support/hatch/env/virtual/astronomer-cosmos/4VBJdS-x/tests.py3.9-2.7/lib/python3.9/site-packages/pip/_internal/commands/install.py", line 377, in run
    requirement_set = resolver.resolve(
  File "/Users/tati/Library/Application Support/hatch/env/virtual/astronomer-cosmos/4VBJdS-x/tests.py3.9-2.7/lib/python3.9/site-packages/pip/_internal/resolution/resolvelib/resolver.py", line 95, in resolve
    result = self._result = resolver.resolve(
  File "/Users/tati/Library/Application Support/hatch/env/virtual/astronomer-cosmos/4VBJdS-x/tests.py3.9-2.7/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", line 546, in resolve
    state = resolution.resolve(requirements, max_rounds=max_rounds)
  File "/Users/tati/Library/Application Support/hatch/env/virtual/astronomer-cosmos/4VBJdS-x/tests.py3.9-2.7/lib/python3.9/site-packages/pip/_vendor/resolvelib/resolvers.py", line 457, in resolve
    raise ResolutionTooDeep(max_rounds)
pip._vendor.resolvelib.resolvers.ResolutionTooDeep: 200000
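
For context, ResolutionTooDeep is raised by pip's vendored resolvelib when dependency resolution exceeds its round limit (pip passes max_rounds=200000, as shown above), meaning the resolver backtracked through a huge number of candidate combinations without converging. As a sketch, the resolution step alone can be reproduced without touching an environment via pip's --dry-run (requires pip >= 22.2; the extra named here is illustrative):

  # Resolve only: --dry-run runs the full resolver without installing
  # anything (requires pip >= 22.2).
  pip install --dry-run --ignore-installed \
      "apache-airflow==2.7.3" "astronomer-cosmos[dbt-postgres]"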

The issue happens during dependency resolution. Locally, these are the logs that appeared immediately beforehand:

  Using cached apache_airflow_providers_amazon-8.24.0-py3-none-any.whl.metadata (10 kB)
  Using cached apache_airflow_providers_amazon-8.23.0-py3-none-any.whl.metadata (10 kB)
INFO: This is taking longer than usual. You might need to provide the dependency resolver with stricter constraints to reduce runtime. See https://pip.pypa.io/warnings/backtracking for guidance. If you want to abort this run, press Ctrl + C.
  Using cached apache_airflow_providers_amazon-8.22.0-py3-none-any.whl.metadata (10 kB)
INFO: pip is still looking at multiple versions of apache-airflow-providers-amazon[s3fs] to determine which version is compatible with other requirements. This could take a while.
  Using cached apache_airflow_providers_amazon-8.21.0-py3-none-any.whl.metadata (10 kB)
WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))': /simple/xmlsec/
Collecting xmlsec<1.3.14 (from apache-airflow-providers-amazon[s3fs]>=3.0.0)
  WARNING: Retrying (Retry(total=4, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ProtocolError('Connection aborted.', ConnectionResetError(54, 'Connection reset by peer'))': /packages/37/9f/342d4562eac99178d0d515c780285e107c6828cefad37d02f05b7b7d8751/xmlsec-1.3.13.tar.gz
  Downloading xmlsec-1.3.13.tar.gz (64 kB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 64.6/64.6 kB 416.4 kB/s eta 0:00:00
  Installing build dependencies: started
  Installing build dependencies: finished with status 'done'
  Getting requirements to build wheel: started
  Getting requirements to build wheel: finished with status 'done'
  Preparing metadata (pyproject.toml): started
  Preparing metadata (pyproject.toml): finished with status 'done'
Collecting apache-airflow-providers-amazon[s3fs]>=3.0.0
  Using cached apache_airflow_providers_amazon-8.20.0-py3-none-any.whl.metadata (10 kB)
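
The walk-back through successive apache-airflow-providers-amazon releases above is the resolver backtracking: each candidate pulls in requirements that conflict with something else, so pip keeps trying older versions. One blunt way to short-circuit this, shown only as a sketch (the upper bound is illustrative, not a recommendation), is to cap the package pip keeps stepping back through:

  # Cap the provider so the resolver has far fewer candidates to try
  # (the upper bound here is purely illustrative).
  pip install "apache-airflow-providers-amazon[s3fs]>=3.0.0,<8.21"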
tatiana commented 1 month ago

This may be a solution to the problem: https://github.com/astronomer/astronomer-cosmos/issues/967

@jbandoro did this in the past: https://github.com/astronomer/astronomer-cosmos/pull/812

But we need to understand why this was removed, and perhaps revert it.

tatiana commented 1 month ago

A suggestion from @pankajkoti is that we may be able to remove the dependency that is causing this, to unblock other work, since we have other critical things to do in this sprint.

pankajkoti commented 1 month ago

It appears that all CI jobs using Airflow 2.7 are encountering deep resolution issues and failing. Notably, recent releases of Amazon, Google, and Azure providers have specified a minimum Airflow version of 2.8. I’m currently exploring potential connections between these findings and testing various combinations in PR #1182.

Interestingly, while Airflow versions 2.4, 2.5, 2.6, 2.8, and 2.9 are functioning correctly, only version 2.7 seems to be causing deep resolution problems.
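
For anyone who wants to verify a provider release's minimum Airflow requirement without installing it, a sketch like this works (the version and paths are illustrative):

  # Download just the provider wheel and read its declared Airflow
  # requirement from the wheel's METADATA.
  pip download "apache-airflow-providers-amazon==8.24.0" --no-deps -d /tmp/probe
  unzip -p /tmp/probe/apache_airflow_providers_amazon-8.24.0-py3-none-any.whl \
      '*.dist-info/METADATA' | grep 'Requires-Dist: apache-airflow'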

pankajkoti commented 1 month ago

> But we need to understand why this was removed, and perhaps revert it.

Upon review, I found that the installation wasn't removed but rather moved into the script scripts/test/pre-install-airflow.sh, which is now included in the pre-install-commands. This change was introduced in PR https://github.com/astronomer/astronomer-cosmos/pull/771/. Therefore, it seems that we are still installing Airflow with constraints.
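
For readers following the thread: a pre-install step like this typically installs Airflow pinned against the official constraints file before the rest of the dependencies resolve. A minimal sketch of the idea (the versions are illustrative; the actual logic lives in scripts/test/pre-install-airflow.sh):

  # Install Airflow against the published constraints file first, so the
  # later, broader install has far less room to backtrack.
  AIRFLOW_VERSION=2.7.3
  PYTHON_VERSION=3.9
  CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-${AIRFLOW_VERSION}/constraints-${PYTHON_VERSION}.txt"
  pip install "apache-airflow==${AIRFLOW_VERSION}" --constraint "${CONSTRAINT_URL}"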

pankajkoti commented 1 month ago

My finding so far is that resolution errors out only for Airflow 2.7. I changed the Airflow version for the jobs using 2.7 to 2.8, and they finish much more quickly: https://github.com/astronomer/astronomer-cosmos/actions/runs/10593607825/job/29355497612?pr=1182

Previously, I tried removing the Amazon and Azure dependencies, which I guessed were taking the time, but that didn't help. I'm continuing to investigate what exactly is causing the Airflow 2.7 builds to take so long and fail.

pankajkoti commented 1 month ago

Not much luck here. Apparently, the Airflow providers conflict with one another as pip tries to find a common compatible set.

I have tried various combinations locally (pinning, adding upper bounds, and commenting out some providers) without complete success. The closest I have got so far is this run: https://github.com/astronomer/astronomer-cosmos/actions/runs/10605337919/job/29393925767?pr=1182, which pre-installs the providers with the Airflow 2.7 constraints. With this approach, only the Python 3.7 and 3.8 combinations with Airflow 2.7 fail; all the rest succeed.
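
Concretely, pre-installing the providers with the Airflow 2.7 constraints amounts to something like the following sketch (the provider list and versions are illustrative):

  # Pin the providers to the versions in the Airflow 2.7 constraints
  # file so the resolver never has to backtrack through incompatible
  # provider releases (provider list is illustrative).
  CONSTRAINT_URL="https://raw.githubusercontent.com/apache/airflow/constraints-2.7.3/constraints-3.9.txt"
  pip install --constraint "${CONSTRAINT_URL}" \
      apache-airflow-providers-amazon \
      apache-airflow-providers-google \
      apache-airflow-providers-microsoft-azure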

I spent many hours on this yesterday but am still unsure of the exact reason.

pankajkoti commented 1 month ago

We've merged PR #1182. Let's observe over the next few PRs how this solution holds up.

pankajkoti commented 1 month ago

Looks like CI has been green for recent PR runs, so I'm closing this ticket. We can re-open it if we see failures again.