apache / airflow

Apache Airflow - A platform to programmatically author, schedule, and monitor workflows
https://airflow.apache.org/
Apache License 2.0

PIP 20.3 might break Airflow installation #12838

Closed potiuk closed 3 years ago

potiuk commented 3 years ago

UPDATE 15.12.2020 6pm CET:

With PIP 20.3.3 released today, we were able to make 2.0 compatible with the new PIP, and 1.10.14 almost works (the papermill extra is problematic when installing Airflow with the new PIP). We will try to address this if we release 1.10.15, but if you want to install the papermill extra, please downgrade pip or use the legacy resolver.

While Airflow 2.0 seems to install with the new PIP when you follow our recommended practice, if you see any installation problems, please report them as issues and downgrade to pip 20.2.4 as a workaround.
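Our recommended practice is to install with a constraints file. The sketch below builds such a command; the helper name build_install_command and the constraints-<version> ref naming are illustrative assumptions based on the constraints-master URL used later in this thread. The --use-deprecated=legacy-resolver option is pip 20.3's way of falling back to the old resolver, which is the workaround mentioned above for the papermill extra.

```python
import sys

# Assumed URL layout, modelled on the constraints-master example in this thread.
CONSTRAINTS_URL = (
    "https://raw.githubusercontent.com/apache/airflow/"
    "constraints-{ref}/constraints-{py}.txt"
)


def build_install_command(airflow_version, extras=(), python_version=None,
                          legacy_resolver=False):
    """Return a pip command line (as a list) installing Airflow with constraints.

    Hypothetical helper: the constraints ref naming is an assumption.
    """
    if python_version is None:
        python_version = f"{sys.version_info.major}.{sys.version_info.minor}"
    extras_part = f"[{','.join(extras)}]" if extras else ""
    requirement = f"apache-airflow{extras_part}=={airflow_version}"
    constraints = CONSTRAINTS_URL.format(ref=airflow_version, py=python_version)
    cmd = [sys.executable, "-m", "pip", "install", requirement,
           "--constraint", constraints]
    if legacy_resolver:
        # Opt back into the pre-20.3 resolver (pip 20.3.x option).
        cmd.append("--use-deprecated=legacy-resolver")
    return cmd


if __name__ == "__main__":
    print(" ".join(build_install_command("2.0.0", ["google"], "3.8")))
```

Passing legacy_resolver=True only makes sense on pip 20.3.x; older pip versions do not know the option.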

Thanks again to the pip team for the fast resolution (just in time for the 2.0 release).

We will leave the issue open for a while, but we have updated the description and lowered the priority. We will close it once we have observed our users' installations after 2.0 is released and confirmed that the problem is solved for them.


UPDATE 15.12.2020 11am CET:

It seems that with the latest 20.3.3 release and the fixed pyarrow dependency, we are back in business with 2.0.0rc3.

Once we confirm it and verify 1.10.14, we will be able to close this one!

Thanks to the pip team for solving it so quickly.


I am opening this issue to keep track of the ongoing problems with the new PIP 20.3, released on 30 November.

There are multiple issues with the new PIP that make it break with Airflow's dependency set.

The first blocking issues are https://github.com/pypa/pip/issues/9203 and https://github.com/pypa/pip/issues/9232.

The latest version of PIP @master is still not usable with Airflow:

Even when those are solved, we already know we are affected by a few other problems:

We've raised the issue with the PIP team, and they are struggling to fix a number of other teething problems.

We are keeping our fingers crossed that they will manage to fix the issues promptly and will not be overwhelmed by putting out the fire.

There is no resolution yet, so for the time being, downgrading PIP to version 20.2.4 is the best thing you can do:

pip install --upgrade pip==20.2.4
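To confirm which pip you are running before and after the downgrade, a small check like the following can help. It is a sketch that assumes, per this thread, that releases 20.3 through 20.3.2 are the ones that break the install (20.3.3 fixed 2.0); it only parses plain numeric versions (no pre-release tags) and needs Python 3.8+ for importlib.metadata.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumption from this thread: 20.3.0-20.3.2 broke Airflow; 20.3.3 fixed 2.0.
FIRST_BAD = (20, 3, 0)
FIRST_FIXED = (20, 3, 3)


def parse_release(ver):
    """Turn '20.3.1' into (20, 3, 1), padding short versions like '20.3'."""
    parts = tuple(int(p) for p in ver.split(".")[:3])
    return parts + (0,) * (3 - len(parts))


def is_affected(ver):
    """True for pip versions in the range this thread reports as broken."""
    return FIRST_BAD <= parse_release(ver) < FIRST_FIXED


if __name__ == "__main__":
    try:
        current = version("pip")
    except PackageNotFoundError:
        current = None
    if current and is_affected(current):
        print(f"pip {current} may break Airflow installs; consider pip==20.2.4")
```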

We raised https://github.com/pypa/pip/issues/9231, proposing a change that would add an exclusion list to PyPI, and we are waiting for their response.

UPDATE! We tested the current master version of PIP (announced yesterday as the candidate for 20.3.2), but it still does not solve the installation problems with Airflow:

Three new issues were created:

vikramkoka commented 3 years ago

Thanks @potiuk I am really glad you added this to the installation instructions too!

potiuk commented 3 years ago

FYI @vikramkoka (and @eladkal @paolaperaza) the "upgrade to newer dependencies" and "full tests needed" are special labels that can be added to PRs to change the scope of PR builds:

See: https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#step-4-prepare-pr

Maybe we need some special prefixes for those to distinguish them from "regular" labels. If we decide to do that, we will have to update our workflows to handle the new names.

vikramkoka commented 3 years ago

> FYI @vikramkoka (and @eladkal @paolaperaza) the "upgrade to newer dependencies" and "full tests needed" are special labels that can be added to PRs to change the scope of PR builds:
>
> See: https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#step-4-prepare-pr
>
>   • "upgrade to newer dependencies" causes an automated upgrade to the latest dependencies using the "eager" upgrade strategy:
>   • "full tests needed" causes the full "matrix" of tests to run rather than a single combination.
>
> Maybe we need some special prefixes for those to distinguish them from "regular" labels. If we decide to do that, we will have to update our workflows to handle the new names.

Sorry about that, @potiuk. I did not know that. I will avoid using the "upgrade to newer dependencies" label in the future.

potiuk commented 3 years ago

> Sorry about that, @potiuk. I did not know that. I will avoid using the "upgrade to newer dependencies" label in the future.

No problem :). That was mainly to explain what they are; we should probably add them as exclusions in the description of the triage process.

mik-laj commented 3 years ago

I confirmed that one problem was solved. Now it is possible to install .[google], but .[google, devel] still doesn't work. https://github.com/pypa/pip/pull/9241

mik-laj commented 3 years ago

I tried installing almost all extras with the above patch and it worked. My impression is that once a new version of pip is released, the problem will either disappear or be marginal.

Extra Status
amazon SUCCESS
apache.atlas SUCCESS
apache.beam SUCCESS
apache.cassandra SUCCESS
apache.druid SUCCESS
apache.hdfs SUCCESS
apache.kylin SUCCESS
apache.livy SUCCESS
apache.pig SUCCESS
apache.pinot SUCCESS
apache.spark SUCCESS
apache.sqoop SUCCESS
async SUCCESS
atlas SUCCESS
azure SUCCESS
cassandra SUCCESS
celery SUCCESS
cgroups SUCCESS
cloudant SUCCESS
cncf.kubernetes SUCCESS
crypto SUCCESS
dask SUCCESS
databricks SUCCESS
datadog SUCCESS
dingding SUCCESS
discord SUCCESS
doc SUCCESS
docker SUCCESS
druid SUCCESS
elasticsearch SUCCESS
exasol SUCCESS
facebook SUCCESS
ftp SUCCESS
gcp SUCCESS
github_enterprise SUCCESS
google SUCCESS
grpc SUCCESS
hashicorp SUCCESS
hdfs SUCCESS
http SUCCESS
imap SUCCESS
jdbc SUCCESS
jenkins SUCCESS
jira SUCCESS
kubernetes SUCCESS
ldap SUCCESS
microsoft.azure SUCCESS
microsoft.mssql SUCCESS
microsoft.winrm SUCCESS
mongo SUCCESS
mssql SUCCESS
openfaas SUCCESS
opsgenie SUCCESS
oracle SUCCESS
pagerduty SUCCESS
papermill SUCCESS
password SUCCESS
pinot SUCCESS
plexus SUCCESS
postgres SUCCESS
presto SUCCESS
qds SUCCESS
qubole SUCCESS
rabbitmq SUCCESS
redis SUCCESS
s3 SUCCESS
salesforce SUCCESS
samba SUCCESS
segment SUCCESS
sendgrid SUCCESS
sentry SUCCESS
sftp SUCCESS
singularity SUCCESS
slack SUCCESS
snowflake SUCCESS
spark SUCCESS
sqlite SUCCESS
ssh SUCCESS
statsd SUCCESS
tableau SUCCESS
vertica SUCCESS
virtualenv SUCCESS
winrm SUCCESS
yandex SUCCESS
zendesk SUCCESS
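A per-extra check like the one in the table above can be scripted. The sketch below is hypothetical (it is not the script used to produce the table): it installs each extra from a local source checkout, as in the .[google] runs above, through an injectable runner so the reporting logic can be tried without actually calling pip. In practice each extra should go into a fresh virtualenv, which this sketch does not set up.

```python
import subprocess
import sys


def pip_install(requirement):
    """Install one requirement with the current interpreter's pip; True on success."""
    result = subprocess.run(
        [sys.executable, "-m", "pip", "install", requirement],
        capture_output=True, text=True,
    )
    return result.returncode == 0


def check_extras(extras, runner=pip_install, source="."):
    """Return {extra: 'SUCCESS' | 'FAILURE'} for installs like '.[google]'."""
    report = {}
    for extra in extras:
        ok = runner(f"{source}[{extra}]")
        report[extra] = "SUCCESS" if ok else "FAILURE"
    return report
```

For example, check_extras(["amazon", "google"]) run from the root of an Airflow checkout would produce two rows in the same "extra STATUS" shape as the table.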

mik-laj commented 3 years ago

pip may have problems installing the master version because it references an unreleased package: apache-airflow-providers-telegram.

I haven't tested the extras below. They may or may not work.

    # all
    # all_dbs
    # aws
    # devel
    # devel_all
    # devel_ci
    # devel_hadoop
    # gcp_api
    # google_auth
    apache.hive
    apache.webhdfs
    gcp
    hive
    kerberos
    mysql
    odbc
    s3
    telegram
    webhdfs

potiuk commented 3 years ago

Cool. Good job, @mik-laj! I will take a look tomorrow as well and try to run all the automation we run on CI. Until this is released in 20.3.2, we will keep the warning in our docs, but this looks very promising!

mik-laj commented 3 years ago

I updated pip to the newest master and triggered the build on my CI. Fingers crossed. 🤞🏻 https://github.com/mik-laj/airflow/commit/4bb280b1e8218414f18b7a97442e1a86d9ea5ac6

mik-laj commented 3 years ago

I found the source of the problem. We have a conflicting constraints entry. https://github.com/apache/airflow/blob/fbd525ac1dfa06e0e3eb9ea6ce6013c08e4f0f1f/constraints-3.8.txt#L294

google-cloud-bigquery[bqstorage,pandas] 2.4.0 depends on pyarrow<3.0dev and >=1.0.0

Without this entry, the Airflow installation works. https://github.com/mik-laj/airflow/runs/1519148881?check_suite_focus=true https://github.com/mik-laj/airflow/commit/b573f2a493de82b083cc50e281f0b631a42b5c32
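The conflict can be checked by hand, assuming the pinned pyarrow is the 0.17.1 that pip freeze reports later in this thread: it falls outside google-cloud-bigquery 2.4.0's requirement of >=1.0.0,<3.0dev. The sketch compares plain numeric release tuples only and approximates <3.0dev as <3.0 for final releases; the packaging library handles full PEP 440 semantics properly.

```python
def release(ver):
    """'0.17.1' -> (0, 17, 1); pads to three components."""
    parts = tuple(int(p) for p in ver.split(".")[:3])
    return parts + (0,) * (3 - len(parts))


# google-cloud-bigquery[bqstorage,pandas] 2.4.0: pyarrow>=1.0.0,<3.0dev
# Approximation: treat "<3.0dev" as "< 3.0.0" for final releases.
LOWER = release("1.0.0")
UPPER = release("3.0")


def satisfies_bigquery_pyarrow(ver):
    """True if a final pyarrow release fits the bigquery 2.4.0 range."""
    return LOWER <= release(ver) < UPPER


pinned = "0.17.1"
print(pinned, "ok" if satisfies_bigquery_pyarrow(pinned) else "conflict")
# prints: 0.17.1 conflict
```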

mik-laj commented 3 years ago

According to this PR, this entry is not needed. https://github.com/apache/airflow/pull/12683

mik-laj commented 3 years ago

This piece of code looks interesting to me. Should we also add a similar check to our project? https://github.com/apache/beam/blob/545db7386b69eb3c61690172c575dc025d91cca7/sdks/python/setup.py#L99-L107

potiuk commented 3 years ago

Which version and which constraints are you comparing?

The constraints are automatically generated after installing the requirements using pip 20.2.4.

Are any errors reported by pip check?

And which installation combination are you testing (which extras, etc.)?

potiuk commented 3 years ago

> This piece of code looks interesting to me. Should we also add a similar check to our project? https://github.com/apache/beam/blob/545db7386b69eb3c61690172c575dc025d91cca7/sdks/python/setup.py#L99-L107

From what I know, PIP does not show those warnings without the --verbose flag (and then they are lost in a sea of other messages). But if you would like to test it, feel free. There are some hacks people use, like guessing the terminal and writing directly to it (similar to what I do with pre-commit), but this is rather dangerous.

potiuk commented 3 years ago

When I run (with pip 20.2.4):

pip install ".[devel_ci]" --constraint https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.6.txt

root@e57e08a8d4bf:/opt/airflow# pip check
No broken requirements found.
root@e57e08a8d4bf:/opt/airflow# pip freeze  | grep pyarrow
pyarrow==0.17.1

No conflicts reported by pip.
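The pyarrow lookup from that session can also be done programmatically, for example in a CI step that captures the output of pip freeze. A minimal sketch, assuming plain name==version lines; editable (-e) installs and direct references (name @ url) are skipped, and the helper name parse_freeze is hypothetical.

```python
def parse_freeze(text):
    """Map lower-cased package names to versions from 'name==version' lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if "==" not in line or line.startswith("#"):
            continue  # skip editables (-e ...), URL references and comments
        name, _, ver = line.partition("==")
        pins[name.strip().lower()] = ver.strip()
    return pins


sample = "pyarrow==0.17.1\npandas==1.1.4\n"
print(parse_freeze(sample).get("pyarrow"))
# prints: 0.17.1
```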

potiuk commented 3 years ago

If you are running those tests with 'apache-beam', then that is the reason for the conflict, but apache-beam is explicitly excluded in all the tests we run because we know it has unsolvable dependency conflicts.

potiuk commented 3 years ago

# Those packages are excluded because they break tests (downgrading mock) and they are
# not needed to run our test suite.
PACKAGES_EXCLUDED_FOR_CI = [
    'apache-beam',
]
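One way such a list can be applied is to drop the excluded projects from a set of requirement strings before installing. This is a hypothetical helper for illustration, not Airflow's actual setup.py logic: it extracts the project name with a simple regular expression and normalizes it per PEP 503, so apache_beam and Apache-Beam also match.

```python
import re

# Packages excluded from CI installs (copied from the snippet above).
PACKAGES_EXCLUDED_FOR_CI = [
    'apache-beam',
]

# Leading project name of a requirement string, e.g. "apache-beam[gcp]>=2.20".
_NAME = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)")


def canonical(name):
    """PEP 503 normalization: lower-case, runs of -_. become a single '-'."""
    return re.sub(r"[-_.]+", "-", name).lower()


def filter_excluded(requirements, excluded=PACKAGES_EXCLUDED_FOR_CI):
    """Return the requirements whose project name is not in the excluded list."""
    skip = {canonical(n) for n in excluded}
    kept = []
    for req in requirements:
        match = _NAME.match(req)
        if match and canonical(match.group(1)) in skip:
            continue
        kept.append(req)
    return kept
```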

mik-laj commented 3 years ago

> Which version and which constraints are you comparing?

I am building images on GitHub Actions.

Here is the report: https://github.com/mik-laj/airflow/runs/1519002616?check_suite_focus=true

> And which installation combination are you testing (which extras, etc.)?

All extras required by CI.

mik-laj commented 3 years ago

> If you are running those tests with 'apache-beam', then that is the reason for the conflict, but apache-beam is explicitly excluded in all the tests we run because we know it has unsolvable dependency conflicts.

Here is the error I'm dealing with: https://github.com/mik-laj/airflow/runs/1519504904

mik-laj commented 3 years ago

All my work is available on GitHub: https://github.com/apache/airflow/compare/master...mik-laj:master

If you are interested, you can view the history of changes and bugs.

potiuk commented 3 years ago

So my hopes are much lower now.

This looks like a problem with the new PIP (or a problem with the old PIP that the new one reveals).

The same command with PIP 20.2.4 works flawlessly and shows no conflict.

potiuk commented 3 years ago

Here you can see example run of the very same PIP install command: https://github.com/apache/airflow/runs/1519600008?check_suite_focus=true#step:9:1149

potiuk commented 3 years ago

I will take a much closer look later this week, once we release 2.0rc and once they hopefully release 20.3.2. But my hopes are much lower than they were even an hour ago.

potiuk commented 3 years ago

It looks like the problem is that the new resolver somehow adds extras to our transitive dependencies where we did not ask for them (or maybe they were dropped by the previous resolver). In 20.2.4 I do not see google-cloud-bigquery with any extras (it is a transitive dependency of pandas-gbq), but with the new PIP we somehow get google-cloud-bigquery[bqstorage,pandas].
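To spot this kind of difference between two resolver runs, it helps to separate a requirement's project name from its extras. A small sketch with hypothetical helpers and a simplified grammar (name[extra,...] followed by an optional version spec, nothing more); the packaging library's Requirement class covers the full PEP 508 syntax.

```python
import re

# Project name, then an optional "[extra1,extra2]" group.
_REQ = re.compile(r"^\s*([A-Za-z0-9][A-Za-z0-9._-]*)\s*(?:\[([^\]]*)\])?")


def name_and_extras(requirement):
    """'google-cloud-bigquery[bqstorage,pandas]' -> (name, {'bqstorage', 'pandas'})."""
    match = _REQ.match(requirement)
    if not match:
        raise ValueError(f"cannot parse requirement: {requirement!r}")
    extras = match.group(2) or ""
    return match.group(1), {e.strip() for e in extras.split(",") if e.strip()}


def added_extras(old_req, new_req):
    """Extras present in new_req but not in old_req for the same project."""
    old_name, old_extras = name_and_extras(old_req)
    new_name, new_extras = name_and_extras(new_req)
    if old_name.lower() != new_name.lower():
        raise ValueError("requirements refer to different projects")
    return new_extras - old_extras
```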

notatallshaw commented 3 years ago

> It looks like the problem is that the new resolver somehow adds extras to our transitive dependencies where we did not ask for them (or maybe they were dropped by the previous resolver). In 20.2.4 I do not see google-cloud-bigquery with any extras (it is a transitive dependency of pandas-gbq), but with the new PIP we somehow get google-cloud-bigquery[bqstorage,pandas].

Is this related to this comment here: https://github.com/pypa/pip/issues/8785#issuecomment-678885871 ?

Note that there are two different issues being discussed in the linked GitHub thread: the one described in the comment (not solved yet) and the workaround using "--no-deps" (solved recently on master).

potiuk commented 3 years ago

> Is this related to this comment here: pypa/pip#8785 (comment) ?

Indeed. It looks like this might be the same root cause.

pradyunsg commented 3 years ago

Ahoy! If you have been able to test against pip's current master branch, let me know if there's any outstanding issues. :)

potiuk commented 3 years ago

Sorry, I've been busy identifying and testing some issues with Airflow 2.0RC2 (which led to RC3, sent today). I will take a closer look tomorrow!

pradyunsg commented 3 years ago

Well, pip's master branch is now 20.3.2, so... test against that! :)

notatallshaw commented 3 years ago

@pradyunsg I'm not on the Airflow team and I don't have as deep an understanding as @potiuk, but I gave installing Airflow 1.10.14 with all dependencies using the new resolver with pip 20.3.2 a try.

I'm not sure how much is down to Airflow fixes and how much to 20.3.2's improvements, but I am able to successfully run pip install apache-airflow[all] with no errors 😄. Thanks to both teams!

potiuk commented 3 years ago

I also had a "dependency solving" session, just discussed it with the PIP team and experimented a bit, and it seems we managed to pinpoint the PIP 20.2.4 bug that generated the bad pyarrow dependency. I updated it manually, and it seems that we are able to make it work for 2.0 as well. 🤞 for a quick 20.3.3 release (20.3.2 was considered bad and removed in the meantime).

pradyunsg commented 3 years ago

20.3.3 is out. I think it solves all the issues that broke Airflow's installation mechanisms. If @potiuk can confirm that, I'm guessing we can go ahead and close this. ;)

potiuk commented 3 years ago

Yep. Confirmed it works for 2.0.

I need to do a few more checks and verify 1.10.14 as well, and then I will close this one.

Thanks A LOT, @pradyunsg, just in time for Airflow 2.0.0 ;).

kaustubhharapanahalli commented 3 years ago

Hello, does the installation fail with the latest version of pip 21.0.1?

potiuk commented 3 years ago

> Hello, does the installation fail with the latest version of pip 21.0.1?

Can you please try it and tell us?

We have not had time to check it yet; it is likely that the main problems have already been solved and improved upon.

kaustubhharapanahalli commented 3 years ago

> Can you please try it and tell us?
>
> We have not had time to check it yet; it is likely that the main problems have already been solved and improved upon.

Hi @potiuk, I ran pip install apache-airflow and it worked. My setup:

Is there any test that I can run to confirm proper installation?
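One quick sanity check, offered as a sketch rather than an official Airflow test: confirm that the apache-airflow distribution is installed and that pip check (which exits non-zero when it finds broken requirements) is clean. It uses importlib.metadata, so it needs Python 3.8+.

```python
import subprocess
import sys
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist):
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist)
    except PackageNotFoundError:
        return None


def pip_check_ok():
    """True when 'pip check' finds no broken requirements (exit code 0)."""
    result = subprocess.run([sys.executable, "-m", "pip", "check"],
                            capture_output=True, text=True)
    return result.returncode == 0


if __name__ == "__main__":
    print("apache-airflow:", installed_version("apache-airflow") or "not installed")
    print("pip check:", "ok" if pip_check_ok() else "problems found")
```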

pradyunsg commented 3 years ago

Closing based on earlier comments:

> I'm guessing we can go ahead and close this. ;)

pradyunsg commented 3 years ago

Oh wait, this isn’t pypa/pip. Nvm me. I blame the lack of a breakfast. :P

ashb commented 3 years ago

Closing as a new version of pip has been released.

uranusjr commented 3 years ago

FWIW I tried pip install apache-airflow[all] and it works correctly. Or rather, it correctly did not work due to #14994.