Closed potiuk closed 3 years ago
Thanks @potiuk I am really glad you added this to the installation instructions too!
FYI @vikramkoka (and @eladkal @paolaperaza) the "upgrade to newer dependencies" and "full tests needed" are special labels that can be added to PRs to change the scope of PR builds:
See: https://github.com/apache/airflow/blob/master/CONTRIBUTING.rst#step-4-prepare-pr
- "upgrade to newer dependencies" causes an automated upgrade to the latest dependencies using the "eager" upgrade strategy.
- "full tests needed" causes the full "matrix" of tests to run rather than a single combination.
Maybe we need some special prefixes for those to distinguish from "regular" labels. If we decide to do that, we will have to update our workflows to handle the new names.
Sorry about that @potiuk. I did not know that. I will avoid using the "upgrade to newer dependencies" label in the future.
No problem :). That was mainly to explain what they are, and probably to add them as exclusions in the description of the triage process.
I confirmed that one problem was solved. It is now possible to install `.[google]`, but `.[google, devel]` still doesn't work.
https://github.com/pypa/pip/pull/9241
I tried to install almost all extra packages with the above patch and it worked. I have the impression that when a new version of pip is released the problem will not occur or it will be marginal.
Extra | Status |
---|---|
amazon | SUCCESS |
apache.atlas | SUCCESS |
apache.beam | SUCCESS |
apache.cassandra | SUCCESS |
apache.druid | SUCCESS |
apache.hdfs | SUCCESS |
apache.kylin | SUCCESS |
apache.livy | SUCCESS |
apache.pig | SUCCESS |
apache.pinot | SUCCESS |
apache.spark | SUCCESS |
apache.sqoop | SUCCESS |
async | SUCCESS |
atlas | SUCCESS |
azure | SUCCESS |
cassandra | SUCCESS |
celery | SUCCESS |
cgroups | SUCCESS |
cloudant | SUCCESS |
cncf.kubernetes | SUCCESS |
crypto | SUCCESS |
dask | SUCCESS |
databricks | SUCCESS |
datadog | SUCCESS |
dingding | SUCCESS |
discord | SUCCESS |
doc | SUCCESS |
docker | SUCCESS |
druid | SUCCESS |
elasticsearch | SUCCESS |
exasol | SUCCESS |
ftp | SUCCESS |
gcp | SUCCESS |
github_enterprise | SUCCESS |
grpc | SUCCESS |
hashicorp | SUCCESS |
hdfs | SUCCESS |
http | SUCCESS |
imap | SUCCESS |
jdbc | SUCCESS |
jenkins | SUCCESS |
jira | SUCCESS |
kubernetes | SUCCESS |
ldap | SUCCESS |
microsoft.azure | SUCCESS |
microsoft.mssql | SUCCESS |
microsoft.winrm | SUCCESS |
mongo | SUCCESS |
mssql | SUCCESS |
openfaas | SUCCESS |
opsgenie | SUCCESS |
oracle | SUCCESS |
pagerduty | SUCCESS |
papermill | SUCCESS |
password | SUCCESS |
pinot | SUCCESS |
plexus | SUCCESS |
postgres | SUCCESS |
presto | SUCCESS |
qds | SUCCESS |
qubole | SUCCESS |
rabbitmq | SUCCESS |
redis | SUCCESS |
s3 | SUCCESS |
salesforce | SUCCESS |
samba | SUCCESS |
segment | SUCCESS |
sendgrid | SUCCESS |
sentry | SUCCESS |
sftp | SUCCESS |
singularity | SUCCESS |
slack | SUCCESS |
snowflake | SUCCESS |
spark | SUCCESS |
sqlite | SUCCESS |
ssh | SUCCESS |
statsd | SUCCESS |
tableau | SUCCESS |
vertica | SUCCESS |
virtualenv | SUCCESS |
winrm | SUCCESS |
yandex | SUCCESS |
zendesk | SUCCESS |
PyPI may have problems installing the master version because we have references to an unreleased package, `apache-airflow-providers-telegram`.
I haven't tested the extras below. They may or may not work.
# all
# all_dbs
# aws
# devel
# devel_all
# devel_ci
# devel_hadoop
# gcp_api
# google_auth
apache.hive
apache.webhdfs
gcp
hive
kerberos
mysql
odbc
s3
telegram
webhdfs
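The per-extra results above can be reproduced with a small loop. This is a sketch of the assumed workflow (the helper name is mine, not from the thread), building one isolated `pip install` command per extra:

```python
# Sketch of the assumed per-extra test workflow (helper name is
# hypothetical): build one isolated install command for each extra.
def install_command(extra: str) -> list[str]:
    """Command to install Airflow from the source tree with one extra."""
    return ["pip", "install", f".[{extra}]"]

for extra in ["amazon", "apache.beam", "google"]:
    print(install_command(extra))
# Each command would be run in a clean environment and its exit
# status recorded as SUCCESS/FAILURE in the table above.
```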
Cool, good job @mik-laj! I will take a look tomorrow as well and try to run all the automation we run on CI. Until this gets released in 20.3.2 we still keep the warning in our docs, but this looks very promising!
I updated the pip version to the newest master and triggered the build on my CI. Fingers crossed. 🤞🏻 https://github.com/mik-laj/airflow/commit/4bb280b1e8218414f18b7a97442e1a86d9ea5ac6
I found the source of the problem. We have a conflicting constraints entry: https://github.com/apache/airflow/blob/fbd525ac1dfa06e0e3eb9ea6ce6013c08e4f0f1f/constraints-3.8.txt#L294

```
google-cloud-bigquery[bqstorage,pandas] 2.4.0 depends on pyarrow<3.0dev and >=1.0.0
```

Without this entry, the Airflow installation works. https://github.com/mik-laj/airflow/runs/1519148881?check_suite_focus=true https://github.com/mik-laj/airflow/commit/b573f2a493de82b083cc50e281f0b631a42b5c32
According to this PR, this entry is not needed. https://github.com/apache/airflow/pull/12683
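The clash described above can be illustrated with the `packaging` library (a sketch I am adding, not part of the thread; the version range is approximated from the pip error message):

```python
# Illustrative sketch: the pyarrow version pinned in constraints-3.8.txt
# cannot satisfy the range google-cloud-bigquery[bqstorage,pandas] asks for.
from packaging.specifiers import SpecifierSet
from packaging.version import Version

pinned = Version("0.17.1")  # pyarrow pin from the constraints file
needed = SpecifierSet(">=1.0.0,<3.0.0")  # approximate range from the error

print(needed.contains(pinned))  # False: the pin falls outside the range
```

This is exactly the situation the new resolver rejects while the old one silently installed the pinned version.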
This piece of code looks interesting to me. Should we also add a similar check to our project? https://github.com/apache/beam/blob/545db7386b69eb3c61690172c575dc025d91cca7/sdks/python/setup.py#L99-L107
Which version and which constraints are you comparing?
The constraints are automatically generated after installing the requirements using pip 20.2.4.
Do you have any error reported by `pip check`?
And which installation combination are you testing (which extras etc.)?
> This piece of code looks interesting to me. Should we also add a similar check to our project? https://github.com/apache/beam/blob/545db7386b69eb3c61690172c575dc025d91cca7/sdks/python/setup.py#L99-L107
From what I know, pip does not show those warnings without `--verbose` (and then it is lost in a sea of other messages). But if you would like to test it, feel free. There are some hacks people use, like guessing the terminal and writing directly to it (similar to what I do with pre-commit), but this is rather dangerous.
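For reference, the Beam-style check linked above boils down to comparing an installed version against a minimum and emitting a warning. A minimal sketch, assuming a hypothetical helper name (this is not Airflow's or Beam's actual code):

```python
# Sketch of a Beam-style setup-time check (hypothetical helper):
# warn when an installed tool is older than a recommended minimum.
import warnings
from packaging.version import Version

def warn_if_older(name: str, installed: str, minimum: str) -> bool:
    """Return True (and emit a warning) when `installed` predates `minimum`."""
    if Version(installed) < Version(minimum):
        warnings.warn(
            f"You are using {name} {installed}; "
            f"version {minimum} or newer is recommended.",
            UserWarning,
        )
        return True
    return False

print(warn_if_older("pip", "20.2.4", "20.3.3"))  # True: 20.2.4 < 20.3.3
```

Whether such a warning would actually surface during `pip install` is exactly the concern raised above about output being hidden without `--verbose`.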
When I run (pip 20.2.4):

```shell
pip install ".[devel_ci]" --constraint https://raw.githubusercontent.com/apache/airflow/constraints-master/constraints-3.6.txt
```

```
root@e57e08a8d4bf:/opt/airflow# pip check
No broken requirements found.
root@e57e08a8d4bf:/opt/airflow# pip freeze | grep pyarrow
pyarrow==0.17.1
```
No conflicts reported by pip.
If you are running those tests with `apache-beam` then this is the reason we have a conflict, but `apache-beam` is explicitly excluded in all the tests we do because we know it has unsolvable dependency conflicts.
```python
# Those packages are excluded because they break tests (downgrading mock) and they are
# not needed to run our test suite.
PACKAGES_EXCLUDED_FOR_CI = [
    'apache-beam',
]
```
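A minimal sketch of how such an exclusion list can be applied when assembling CI requirements (illustrative only; the helper name is an assumption, not Airflow's actual code):

```python
# Sketch: drop requirement strings whose distribution name appears
# in the CI exclusion list (names are parsed, so version pins don't
# interfere with the comparison).
from packaging.requirements import Requirement

PACKAGES_EXCLUDED_FOR_CI = ["apache-beam"]

def filter_for_ci(requirements):
    """Return the requirements with CI-excluded packages removed."""
    return [
        r for r in requirements
        if Requirement(r).name not in PACKAGES_EXCLUDED_FOR_CI
    ]

print(filter_for_ci(["apache-beam>=2.20.0", "pandas-gbq", "pyarrow>=0.16"]))
# → ['pandas-gbq', 'pyarrow>=0.16']
```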
> Which version and which constraints are you comparing?
I am building images on Github Action.
Here is the report: https://github.com/mik-laj/airflow/runs/1519002616?check_suite_focus=true
> And which installation combination are you testing (which extras etc.)?
All extras required by CI.
> If you are running those tests with `apache-beam` then this is the reason we have a conflict, but `apache-beam` is explicitly excluded in all the tests we do because we know it has unsolvable dependency conflicts.
Here is the error I'm wrestling with: https://github.com/mik-laj/airflow/runs/1519504904
All my work is available on GitHub: https://github.com/apache/airflow/compare/master...mik-laj:master
If you are interested, you can view the history of changes and bugs.
So my hopes are much lower now.
This looks like a problem with the new pip (or a problem with the old pip that the new one reveals).
The same command with pip 20.2.4 works flawlessly and shows no conflict.
Here you can see example run of the very same PIP install command: https://github.com/apache/airflow/runs/1519600008?check_suite_focus=true#step:9:1149
I will take a much closer look later this week once we release 2.0rc, and once they hopefully release 20.3.2. But my hopes are much lower than even an hour ago.
Looks like the problem is that somehow the new resolver adds extras to our transitive dependencies where we did not ask for them (or maybe they were dropped by the previous resolver): in 20.2.4 I do not see that `google-cloud-bigquery` has any extras (it is a transitive dependency of `pandas-gbq`), but in the new pip somehow we have `google-cloud-bigquery[bqstorage,pandas]`.
> Looks like the problem is that somehow the new resolver adds extras to our transitive dependencies where we did not ask for them (or maybe they were dropped by the previous resolver): in 20.2.4 I do not see that `google-cloud-bigquery` has any extras (it is a transitive dependency of `pandas-gbq`), but in the new pip somehow we have `google-cloud-bigquery[bqstorage,pandas]`.
Is this related to this comment here: https://github.com/pypa/pip/issues/8785#issuecomment-678885871 ?
Note that there are 2 different issues being solved in the linked github thread, the one that is described in the comment (not solved yet) and the workaround using "--no-deps" (solved recently on master)
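The extras mismatch described above can be made visible with the `packaging` library (a sketch I am adding for illustration, not part of the thread):

```python
# Sketch: the old resolver effectively installed the bare distribution,
# the new resolver resolved it with extras; the difference is visible
# in the parsed requirement's `extras` set.
from packaging.requirements import Requirement

bare = Requirement("google-cloud-bigquery")
with_extras = Requirement("google-cloud-bigquery[bqstorage,pandas]")

print(sorted(bare.extras))        # []
print(sorted(with_extras.extras)) # ['bqstorage', 'pandas']
```

The extras matter because they pull in additional requirements (here, the pyarrow range) that the bare distribution does not declare.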
> Is this related to this comment here: pypa/pip#8785 (comment)?
Indeed. It looks like this might be the same root cause.
Ahoy! If you have been able to test against pip's current master branch, let me know if there's any outstanding issues. :)
Sorry - I've been busy identifying and testing some issues with Airflow 2.0RC2 (which led to RC3 sent today). I will take a closer look tomorrow!
Well, pip's master branch is now 20.3.2, so... test against that! :)
@pradyunsg I'm not on the Airflow team and I don't have as deep an understanding as @potiuk, but I gave installing Airflow 1.10.14 with all dependencies using the new resolver in pip 20.3.2 a try.
I'm not sure how much is Airflow fixes and how much is 20.3.2's improvements, but I am able to successfully run `pip install apache-airflow[all]` with no errors 😄. Thanks to both teams!
I also had the "dependency solving" session, and just discussed with the pip team and experimented a bit, and it seems we managed to pinpoint the pip 20.2.4 bug that generated the bad pyarrow dependency. I updated it manually and it seems we are able to make it work for 2.0 as well. :crossed_fingers: for a quick 20.3.3 release (20.3.2 was considered bad and removed in the meantime).
20.3.3 is out. I think it solves all the issues that broke Airflow's installation mechanisms. If @potiuk can confirm that, I'm guessing we can go ahead and close this. ;)
Yep. Confirmed it works for 2.0.
I need to do a few more checks and verify 1.10.14 as well, and then I will close this one.
Thanks A LOT @pradyunsg -> just in time for 2.0.0 of Airflow ;).
Hello, does the installation fail with the latest version of pip 21.0.1?
> Hello, does the installation fail with the latest version of pip 21.0.1?
Can you please try and tell us?
We did not have time to run checks with it - it is likely that the main problems have already been solved and improved upon.
> Can you please try and tell us? We did not have time to run checks with it - it is likely that the main problems have already been solved and improved upon.
Hi @potiuk I ran the command `pip install apache-airflow` and it worked. My setup:
Is there any test that I can run to confirm proper installation?
Closing based on earlier comments:
> I'm guessing we can go ahead and close this. ;)
Oh wait, this isn’t pypa/pip. Nvm me. I blame the lack of a breakfast. :P
Closing as a new version of pip has been released.
FWIW I tried `pip install apache-airflow[all]` and it works correctly. Or rather, it correctly did not work due to #14994.
UPDATE 15.12.2020 6pm CET:
After releasing PIP 20.3.3 today we were able to make 2.0 compatible with the new PIP and 1.10.14 almost works (papermill extra is problematic when installing airflow using the new PIP). We will try to address it in case we release 1.10.15 but if you want to install papermill extra, please downgrade pip or use legacy resolver.
While with 2.0 it seems that airflow can be installed with new PIP following our recommended practice, in case you see any installation problem please report them as issues and downgrade to pip 20.2.4 as a workaround.
Thanks again to the pip team for the fast resolution (just in time for the 2.0 release).
We leave the issue open for a while but we updated the description and lowered the priority. We will close it once we have observed installations from our users after 2.0 is released and confirm that the problem is solved for our users.
UPDATE 15.12.2020 11am CET:
Seems that with the latest 20.3.3 release and after fixing the pyarrow dependency we are back in business with 2.0.0rc3.
Once we confirm it and verify 1.10.14 we will be able to close that one!
Thanks to the pip team for solving it quickly.
I am adding this issue to keep track of the on-going problems with new PIP 20.3 released 30th of November.
There are multiple issues with the new pip that make it break with Airflow's dependency set.
The first blocking issue is https://github.com/pypa/pip/issues/9203 and https://github.com/pypa/pip/issues/9232.
The latest version of pip @master is still not usable with Airflow:
Even when those are solved, we already know we are affected by a few other problems:
We've raised the issue with the pip team and they are struggling to fix a number of other teething problems.
We keep our fingers crossed that they will manage to fix the issues promptly and will not be overwhelmed with putting out the fires.
There is no resolution yet, so for the time being downgrading pip to version `20.2.4` is the best thing you can do. We also raised https://github.com/pypa/pip/issues/9231 with a proposal to add an exclusion list to PyPI, and we are waiting for their response.
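The "downgrade to 20.2.4" workaround can be turned into a quick check. This is an illustrative sketch I am adding (the helper name and the exact affected range are assumptions based on the thread: 20.3.0 up to, but not including, the 20.3.3 fix):

```python
# Sketch: detect a pip version from the resolver releases that, per
# this thread, broke Airflow's installation (assumed range 20.3..20.3.3).
from packaging.version import Version

def needs_downgrade(pip_version: str) -> bool:
    """True for the 20.3.x releases preceding the 20.3.3 fix."""
    v = Version(pip_version)
    return Version("20.3") <= v < Version("20.3.3")

print(needs_downgrade("20.3.1"))  # True: affected resolver release
print(needs_downgrade("20.2.4"))  # False: the recommended fallback
```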
UPDATE! Tested the current master version of pip (which was announced yesterday as the candidate for 20.3.2), but it still does not solve the installation problems with Airflow:
Three new issues created: