apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

Conflicts with airflow constraints for airflow 2.2.0 python 3.7 #18932

Closed wpromatt closed 2 years ago

wpromatt commented 2 years ago

Apache Airflow version

2.2.0 (latest released)

Operating System

all

Versions of Apache Airflow Providers

No response

Deployment

Other

Deployment details

Python 3.7

What happened

The versions of flake8 and importlib-metadata specified in the constraints file are incompatible for python 3.7.

In the constraints file: we have importlib-metadata==4.8.1 and flake8==4.0.1.

flake8==4.0.1, however, requires <4.3: flake8 4.0.1 depends on importlib-metadata<4.3; python_version < "3.8"

The conflict is caused by:
    apache-airflow[amazon,async,celery,docker,google,grpc,hashicorp,http,postgres,redis,slack,ssh,statsd] 2.2.0 depends on importlib-metadata>=1.7; python_version < "3.9"
    flake8 4.0.1 depends on importlib-metadata<4.3; python_version < "3.8"
    The user requested (constraint) importlib-metadata==4.8.1
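
The conflict above can be checked by hand. The following is a minimal sketch, not pip's actual resolver: it compares plain numeric versions as tuples instead of implementing full PEP 440, and it ignores the `python_version` environment markers. The helper names `parse_version` and `satisfies` are hypothetical and exist only for this illustration.

```python
import re

# Comparison operators supported by this sketch (a small subset of PEP 440).
OPS = {
    "==": lambda a, b: a == b,
    ">=": lambda a, b: a >= b,
    "<=": lambda a, b: a <= b,
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
}


def parse_version(text):
    """Turn '4.8.1' into (4, 8, 1). Only plain numeric releases are handled."""
    return tuple(int(part) for part in text.split("."))


def satisfies(pinned, specifier):
    """Check a pinned version against a specifier such as '<4.3' or '<8.0.0,>=4.0.0'."""
    for clause in specifier.split(","):
        match = re.fullmatch(r"\s*(==|>=|<=|<|>)\s*([0-9.]+)\s*", clause)
        if match is None:
            raise ValueError(f"unsupported specifier clause: {clause!r}")
        op, bound = match.groups()
        left, right = parse_version(pinned), parse_version(bound)
        # Pad the shorter tuple with zeros so that 4.3 compares like 4.3.0.
        width = max(len(left), len(right))
        left += (0,) * (width - len(left))
        right += (0,) * (width - len(right))
        if not OPS[op](left, right):
            return False
    return True


# The pin from the constraints file against the two requirements pip reported.
pinned = "4.8.1"
print(satisfies(pinned, ">=1.7"))  # requirement from apache-airflow 2.2.0
print(satisfies(pinned, "<4.3"))   # requirement from flake8 4.0.1 on python < 3.8
```

The pin passes the apache-airflow requirement but fails the flake8 one, which is why no single importlib-metadata version can satisfy both the constraints file and flake8 4.0.1 on Python 3.7.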

What you expected to happen

Installing dependencies using --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.2.0/constraints-3.7.txt should be possible without conflicts.

How to reproduce

python3.7

pip install 'apache-airflow[async,amazon,celery,postgres,google,slack,http,redis,statsd,docker,grpc,hashicorp,ssh]==2.2.0' flake8 --constraint https://raw.githubusercontent.com/apache/airflow/constraints-2.2.0/constraints-3.7.txt

output:

ERROR: Cannot install apache-airflow[amazon,async,celery,docker,google,grpc,hashicorp,http,postgres,redis,slack,ssh,statsd]==2.2.0 and flake8==4.0.1 because these package versions have conflicting dependencies.

The conflict is caused by:
    apache-airflow[amazon,async,celery,docker,google,grpc,hashicorp,http,postgres,redis,slack,ssh,statsd] 2.2.0 depends on importlib-metadata>=1.7; python_version < "3.9"
    flake8 4.0.1 depends on importlib-metadata<4.3; python_version < "3.8"
    The user requested (constraint) importlib-metadata==4.8.1

To fix this you could try to:
1. loosen the range of package versions you've specified
2. remove package versions to allow pip attempt to solve the dependency conflict

Anything else

No response

potiuk commented 2 years ago

This is really weird. I noticed that importlib-metadata was downgraded in some of the constraints but not in others, and I do not know why - I will keep this open until I find out the reason. In the meantime I manually updated the constraints to get rid of the conflict - it should work now.

potiuk commented 2 years ago

Thanks for reporting it @wpromatt !

wpromatt commented 2 years ago

Thanks for handling the constraints so quickly!

bensta commented 2 years ago

Hi all, I get a similar issue when installing v. 2.1.4 with python 3.8:

pip install "apache-airflow[all]==2.1.4" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.1.4/constraints-3.8.txt"

Leads to

ERROR: Cannot install apache-airflow[all]==2.1.4 because these package versions have conflicting dependencies.
The conflict is caused by:
    apache-airflow[all] 2.1.4 depends on google-ads<8.0.0 and >=4.0.0; extra == "all"
    The user requested (constraint) google-ads==14.0.0

But this is not the only conflict between requirements.txt and the setup.py. Installing the "all_dbs" extras shows a different conflict:

Running pip install "apache-airflow[all_dbs]==2.1.4" --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.1.4/constraints-3.8.txt"

Yields:

The conflict is caused by:
    apache-airflow[all-dbs] 2.1.4 depends on mysql-connector-python<=8.0.22 and >=8.0.11; extra == "all_dbs"
    The user requested (constraint) mysql-connector-python==8.0.26

potiuk commented 2 years ago

Hmm. Indeed, something is wrong there. I will take a closer look.

bensta commented 2 years ago

Quick update, in case you did not notice already: Same issue arises with the latest version, 2.2.1:

pip install "apache-airflow[all]"==2.2.1 --constraint "https://raw.githubusercontent.com/apache/airflow/constraints-2.2.1/constraints-3.8.txt"

ERROR: Cannot install apache-airflow[all]==2.2.1 because these package versions have conflicting dependencies.

The conflict is caused by:
    apache-airflow[all] 2.2.1 depends on azure-cosmos<5 and >=4.0.0; extra == "all"
    The user requested (constraint) azure-cosmos==3.2.0

potiuk commented 2 years ago

Hmm. I thought about it and I think it's pretty much expected behaviour (though we might simply want to remove the bundle extras from released airflow as they make very little sense there).

You are not supposed to use the "all" and "all_dbs" extras when you are installing airflow from PyPI. Those are development-only extras which work a bit differently than the "provider" extras.

I think the problem is different - we should simply remove them from the "installable" version of airflow in PyPI because having them there is simply misleading. I will do it as a follow up of this, when I am back at home (travelling now).

Airflow has several different types of extras (https://airflow.apache.org/docs/apache-airflow/stable/extra-packages-ref.html)

The problem is that, unlike "provider" extras, the bundle extras contain "transitive" dependencies that were valid at the time of the package release. With "provider" extras the dependencies are transitive - they come from the actually installed providers. But the bundle dependencies are often different from those in constraints. The constraints we generate include the dependencies of providers that were RELEASED at the time of preparing the given version. In the meantime the dependencies could have changed in "main", and those changed dependencies are what go into "all" and "all_dbs". So in fact "all" and "all_dbs" are really only useful when you are installing airflow in "Development" mode from sources, not when you are installing airflow from PyPI.

You could check it yourself - if instead of [all_dbs] you specify [apache.cassandra,apache.drill,apache.druid,apache.hdfs,apache.hive,apache.pinot,cloudant,exasol,influxdb,microsoft.mssql,mongo,mysql,neo4j,postgres,presto,trino,vertica] - the installation should work just fine.
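
The suggestion above can be turned into a concrete command. This is a minimal sketch, assuming the constraints URL pattern used earlier in this thread; the helper names `constraints_url` and `pip_install_args` are hypothetical, and the Airflow and Python versions are just the ones from the earlier report.

```python
import shlex

# Individual provider extras listed in the comment above, used instead of [all_dbs].
DB_PROVIDER_EXTRAS = [
    "apache.cassandra", "apache.drill", "apache.druid", "apache.hdfs",
    "apache.hive", "apache.pinot", "cloudant", "exasol", "influxdb",
    "microsoft.mssql", "mongo", "mysql", "neo4j", "postgres", "presto",
    "trino", "vertica",
]


def constraints_url(airflow_version, python_version):
    """Build the constraints file URL following the pattern seen in this thread."""
    return (
        "https://raw.githubusercontent.com/apache/airflow/"
        f"constraints-{airflow_version}/constraints-{python_version}.txt"
    )


def pip_install_args(airflow_version, python_version, extras):
    """Return the pip argument list for installing airflow with the given extras."""
    # Extras are joined without spaces so the requirement stays a single token.
    requirement = f"apache-airflow[{','.join(extras)}]=={airflow_version}"
    return [
        "pip", "install", requirement,
        "--constraint", constraints_url(airflow_version, python_version),
    ]


if __name__ == "__main__":
    # Print a shell-quoted command line that can be pasted into a terminal.
    print(shlex.join(pip_install_args("2.1.4", "3.8", DB_PROVIDER_EXTRAS)))
```

Returning an argument list rather than a single string keeps the quoting problem in one place: `shlex.join` handles the brackets for display, and the same list could be passed to `subprocess.run` unchanged.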

I think I will simply make sure to document it and clarify the behaviour of bundle extras, and I will remove the bundle extras from the next release of Airflow. The "bundle" extras make very little sense for PyPI installation.

WDYT?

bensta commented 2 years ago

Yep, your explanation does make sense. And indeed, if I use the extras individually as you suggested, the install does indeed work. Thanks for the clarification!

I have two thoughts from a user perspective: a) The documentation does imply that the [all] and [all_dbs] bundle extras are indeed for production use, since the [all] bundle is described as "all user facing features". Together with the presence of a devel_all package, this does indeed imply that it is meant to be used by the end user.

b) Having bundles of extras is very convenient for end users, given the large number of extras there are. So a way to provide them to the end user without running into conflicts would be highly desirable.

potiuk commented 2 years ago

a) The documentation does imply that the [all] and [all_dbs] bundle extras are indeed for production use, since the [all] bundle is described as "all user facing features". Together with the presence of a devel_all package, this does indeed imply that it is meant to be used by the end user.

Yeah. Doc update will be necessary if we remove them.

b) Having bundles of extras is very convenient for end users, given the large number of extras there are. So a way to provide them to the end user without running into conflicts would be highly desirable.

Good point. I think it can be achieved with a little update to our setup python code. I will definitely look into that.

elliotdes commented 2 years ago

@potiuk was there ever a solution for this? We are upgrading our MWAA instance from 2.0.2 to 2.2.2 and we are having conflicts between flake8 and importlib-metadata.

Using this constraints file

potiuk commented 2 years ago

I am not sure what you install and what you have in the images (and how you are installing dependencies).

You likely have a completely different problem and you simply want to install conflicting versions of requirements manually.

The problem described in this issue occurs only when the "all" extra is used (which should never be used in production - it is a development-only setting). I am not even sure why you have problems with flake8, because it is a "devel" only dependency and you CERTAINLY should not install it in production.

If you need some help with this, please open a GitHub Discussion or a Slack conversation and describe it in detail. Also please make sure to describe the dependencies imposed by MWAA - this is not something that we know of, and it might be that your conflicting dependencies come from some conflicts there - but MWAA issues should be handled through MWAA support.

All the different scenarios in which you can install and upgrade airflow when you are not using a managed version are described here: https://airflow.apache.org/docs/apache-airflow/stable/installation/installing-from-pypi.html#installation-and-upgrade-scenarios - if you do it differently or MWAA imposes other limitations then you might generate some conflicts, but that's beyond the "generic airflow" domain.

potiuk commented 2 years ago

This has been documented in the docs in https://github.com/apache/airflow/pull/23697. Closing.