astral-sh / uv

An extremely fast Python package and project manager, written in Rust.
https://docs.astral.sh/uv
Apache License 2.0

The "standard" installation uses `pyproject.toml` in uv rather than dynamic dependencies via build hooks (compared to pip) #2130

Closed potiuk closed 6 months ago

potiuk commented 6 months ago

When you install a package from a remote URL and specify extras, the --editable version of the extras is used rather than the dependencies used in the wheel, although I don't think it's very well specified which dependencies should be used. For example:

uv pip install 'apache-airflow[aiobotocore,amazon,async,celery,cncf-kubernetes,common-io, \
docker,elasticsearch,ftp,google,google-auth,graphviz,grpc,hashicorp,http,ldap,microsoft-azure, \
mysql,odbc,openlineage,pandas,postgres,redis,sendgrid,sftp,slack,snowflake, \
ssh,statsd,uv,virtualenv] @ https://github.com/apache/airflow/archive/main.tar.gz'
Here is the output of `uv pip install`:

> Resolved 359 packages in 2.94s
> Downloaded 154 packages in 2.44s
> Installed 257 packages in 700ms
> + adal==1.2.7
> + adlfs==2024.2.0
> + aiobotocore==2.12.0
> + aiofiles==23.2.1
> + aiohttp==3.9.3
> + aioitertools==0.11.0
> + aiosignal==1.3.1
> + amqp==5.2.0
> + annotated-types==0.6.0
> + apispec==6.5.0
> + asn1crypto==1.5.1
> + async-timeout==4.0.3
> + asyncssh==2.14.2
> + authlib==1.3.0
> + aws-sam-translator==1.85.0
> + aws-xray-sdk==2.12.1
> + azure-batch==14.1.0
> + azure-common==1.1.28
> + azure-core==1.30.1
> + azure-cosmos==4.5.1
> + azure-datalake-store==0.0.53
> + azure-identity==1.15.0
> + azure-keyvault-secrets==4.8.0
> + azure-kusto-data==4.3.1
> + azure-mgmt-containerinstance==10.1.0
> + azure-mgmt-containerregistry==10.3.0
> + azure-mgmt-core==1.4.0
> + azure-mgmt-cosmosdb==9.4.0
> + azure-mgmt-datafactory==5.0.0
> + azure-mgmt-datalake-nspkg==3.0.1
> + azure-mgmt-datalake-store==0.5.0
> + azure-mgmt-nspkg==3.0.2
> + azure-mgmt-resource==23.0.1
> + azure-mgmt-storage==21.1.0
> + azure-nspkg==3.0.2
> + azure-servicebus==7.11.4
> + azure-storage-blob==12.19.0
> + azure-storage-file-datalake==12.14.0
> + azure-storage-file-share==12.15.0
> + azure-synapse-artifacts==0.18.0
> + azure-synapse-spark==0.7.0
> + babel==2.14.0
> + bcrypt==4.1.2
> + beautifulsoup4==4.12.3
> + billiard==4.2.0
> + boto3==1.34.51
> + botocore==1.34.51
> + cachetools==5.3.3
> + cattrs==23.2.3
> + celery==5.3.6
> + cfn-lint==0.85.3
> + chardet==5.2.0
> + click-didyoumean==0.3.0
> + click-plugins==1.1.1
> + click-repl==0.3.0
> + colorama==0.4.6
> - cryptography==42.0.5
> + cryptography==41.0.7
> + db-dtypes==1.2.0
> + decorator==5.1.1
> + distlib==0.3.8
> + dnspython==2.6.1
> + docker==7.0.0
> + elastic-transport==8.12.0
> + elasticsearch==8.12.1
> + email-validator==1.3.1
> + eventlet==0.35.2
> + filelock==3.13.1
> + flask-appbuilder==4.3.11
> + flask-babel==2.0.0
> + flask-jwt-extended==4.6.0
> + flask-limiter==3.5.1
> + flask-login==0.6.3
> + flask-sqlalchemy==2.5.1
> + flower==2.0.1
> + frozenlist==1.4.1
> + gcloud-aio-auth==4.2.3
> + gcloud-aio-bigquery==7.1.0
> + gcloud-aio-storage==9.2.0
> + gcsfs==2024.2.0
> + gevent==24.2.1
> + google-ads==23.1.0
> + google-analytics-admin==0.22.6
> + google-api-core==2.17.1
> + google-api-python-client==2.120.0
> + google-auth==2.28.1
> + google-auth-httplib2==0.2.0
> + google-auth-oauthlib==1.2.0
> + google-cloud-aiplatform==1.43.0
> + google-cloud-appengine-logging==1.4.2
> + google-cloud-audit-log==0.2.5
> + google-cloud-automl==2.13.2
> + google-cloud-batch==0.17.12
> + google-cloud-bigquery==3.17.2
> + google-cloud-bigquery-datatransfer==3.15.0
> + google-cloud-bigquery-storage==2.24.0
> + google-cloud-bigtable==2.23.0
> + google-cloud-build==3.23.2
> + google-cloud-compute==1.17.0
> + google-cloud-container==2.41.0
> + google-cloud-core==2.4.1
> + google-cloud-datacatalog==3.18.2
> + google-cloud-dataflow-client==0.8.9
> + google-cloud-dataform==0.5.8
> + google-cloud-dataplex==1.12.2
> + google-cloud-dataproc==5.9.2
> + google-cloud-dataproc-metastore==1.15.2
> + google-cloud-dlp==3.15.2
> + google-cloud-kms==2.21.2
> + google-cloud-language==2.13.2
> + google-cloud-logging==3.9.0
> + google-cloud-memcache==1.9.2
> + google-cloud-monitoring==2.19.2
> + google-cloud-orchestration-airflow==1.12.0
> + google-cloud-os-login==2.14.2
> + google-cloud-pubsub==2.19.7
> + google-cloud-redis==2.15.2
> + google-cloud-resource-manager==1.12.2
> + google-cloud-run==0.10.4
> + google-cloud-secret-manager==2.18.2
> + google-cloud-spanner==3.42.0
> + google-cloud-speech==2.25.0
> + google-cloud-storage==2.14.0
> + google-cloud-storage-transfer==1.11.2
> + google-cloud-tasks==2.16.2
> + google-cloud-texttospeech==2.16.2
> + google-cloud-translate==3.15.2
> + google-cloud-videointelligence==2.13.2
> + google-cloud-vision==3.7.1
> + google-cloud-workflows==1.14.2
> + google-crc32c==1.5.0
> + google-resumable-media==2.7.0
> + graphql-core==3.2.3
> + graphviz==0.20.1
> + greenlet==3.0.3
> + grpc-google-iam-v1==0.13.0
> + grpc-interceptor==0.15.4
> + grpcio-gcp==0.2.2
> + grpcio-status==1.62.0
> + httplib2==0.22.0
> + humanize==4.9.0
> + hvac==2.1.0
> + ijson==3.2.3
> + isodate==0.6.1
> + jmespath==1.0.1
> + joserfc==0.9.0
> + jschema-to-python==1.2.3
> + json-merge-patch==0.2
> + jsondiff==2.0.0
> + jsonpatch==1.33
> + jsonpath-ng==1.6.1
> + jsonpickle==3.0.3
> + jsonpointer==2.4
> + jsonschema-path==0.3.2
> + junit-xml==1.9
> + kombu==5.3.5
> + kubernetes==29.0.0
> + kubernetes-asyncio==29.0.0
> + ldap3==2.9.1
> + limits==3.9.0
> + looker-sdk==24.2.0
> + lxml==5.1.0
> + marshmallow-sqlalchemy==0.26.1
> + more-itertools==10.2.0
> + moto==5.0.2
> + mpmath==1.3.0
> + msal==1.27.0
> + msal-extensions==1.1.0
> + msrest==0.7.1
> + msrestazure==0.6.4
> + multidict==6.0.5
> + mypy-boto3-appflow==1.34.0
> + mypy-boto3-rds==1.34.50
> + mypy-boto3-redshift-data==1.34.0
> + mypy-boto3-s3==1.34.14
> + mysql-connector-python==8.3.0
> + mysqlclient==2.2.4
> + networkx==3.1
> + numpy==1.24.4
> + oauthlib==3.2.2
> + openapi-schema-validator==0.6.2
> + openapi-spec-validator==0.7.1
> + openlineage-integration-common==1.9.1
> + openlineage-python==1.9.1
> + openlineage-sql==1.9.1
> + ordered-set==4.1.0
> + pandas==2.0.3
> + pandas-gbq==0.21.0
> + paramiko==3.4.0
> + pathable==0.4.3
> + pbr==6.0.0
> + platformdirs==3.11.0
> + ply==3.11
> + portalocker==2.8.2
> + prison==0.2.1
> + prometheus-client==0.20.0
> + prompt-toolkit==3.0.43
> + proto-plus==1.23.0
> + psycopg2-binary==2.9.9
> + py-partiql-parser==0.5.1
> + pyarrow==15.0.0
> + pyasn1==0.5.1
> + pyasn1-modules==0.3.0
> + pyathena==3.3.0
> + pydantic==2.6.3
> + pydantic-core==2.16.3
> + pydata-google-auth==1.8.2
> + pynacl==1.5.0
> + pyodbc==5.1.0
> + pyopenssl==24.0.0
> + pyparsing==3.1.1
> + pyspnego==0.10.2
> + python-dotenv==1.0.1
> + python-http-client==3.3.7
> + python-ldap==3.4.4
> + pywinrm==0.4.3
> + redis==4.6.0
> + redshift-connector==2.1.0
> - referencing==0.33.0
> + referencing==0.31.1
> + regex==2023.12.25
> + requests-ntlm==1.2.0
> + requests-oauthlib==1.3.1
> + requests-toolbelt==1.0.0
> + responses==0.25.0
> + rsa==4.9
> + s3fs==2024.2.0
> + s3transfer==0.10.0
> + sarif-om==1.0.4
> + scramp==1.4.4
> + sendgrid==6.11.0
> + shapely==2.0.3
> + slack-sdk==3.27.1
> + snowflake-connector-python==3.7.1
> + snowflake-sqlalchemy==1.5.1
> + sortedcontainers==2.4.0
> + soupsieve==2.5
> + sqlalchemy-bigquery==1.10.0
> + sqlalchemy-redshift==0.8.14
> + sqlalchemy-spanner==1.6.2
> + sqlalchemy-utils==0.41.1
> + sqlparse==0.4.4
> + sshtunnel==0.4.0
> + starkbank-ecdsa==2.2.0
> + statsd==4.0.1
> + sympy==1.12
> + tomlkit==0.12.4
> + tornado==6.4
> + uritemplate==4.1.1
> - urllib3==2.2.1
> + urllib3==1.26.18
> + vine==5.1.0
> + virtualenv==20.25.1
> + watchtower==3.0.1
> + wcwidth==0.2.13
> + websocket-client==1.7.0
> + xmltodict==0.13.0
> + yarl==1.9.4
> + zope-event==5.0
> + zope-interface==6.2

Compare it with the equivalent pip result:

pip install 'apache-airflow[aiobotocore,amazon,async,celery,cncf-kubernetes,common-io, \
docker,elasticsearch,ftp,google,google-auth,graphviz,grpc,hashicorp,http,ldap,microsoft-azure, \
mysql,odbc,openlineage,pandas,postgres,redis,sendgrid,sftp,slack,snowflake, \
ssh,statsd,uv,virtualenv] @ https://github.com/apache/airflow/archive/main.tar.gz'
Result of `pip install` > Installing collected packages: wcwidth, unicodecsv, text-unidecode, statsd, starkbank-ecdsa, sortedcontainers, pytz, ply, lockfile, json-merge-patch, ijson, distlib, cron-descriptor, colorlog, azure-nspkg, azure-common, asn1crypto, zope.interface, zope.event, zipp, wrapt, websocket-client, vine, urllib3, uritemplate, uc-micro-py, tzdata, typing-extensions, tornado, tomlkit, termcolor, tenacity, tabulate, sqlparse, sqlalchemy, soupsieve, sniffio, slack_sdk, six, setproctitle, scramp, rpds-py, PyYAML, python-slugify, python-http-client, python-dotenv, pyparsing, pyodbc, pyjwt, pygments, pycparser, pyasn1, psycopg2-binary, psutil, protobuf, prompt-toolkit, prometheus-client, portalocker, pluggy, platformdirs, pkgutil-resolve-name, pathspec, packaging, ordered-set, opentelemetry-semantic-conventions, openlineage-sql, oauthlib, numpy, mysqlclient, mysql-connector-python, multidict, more-itertools, mdurl, markupsafe, lxml, lazy-object-proxy, jsonpath_ng, jmespath, itsdangerous, inflection, idna, humanize, h11, grpcio, greenlet, graphviz, google-re2, google-crc32c, fsspec, frozenlist, filelock, exceptiongroup, docutils, dnspython, dill, decorator, configupdater, colorama, click, charset-normalizer, chardet, certifi, cachetools, cachelib, blinker, billiard, bcrypt, backports.zoneinfo, backoff, Babel, azure-mgmt-nspkg, attrs, async-timeout, argcomplete, aiofiles, yarl, wtforms, werkzeug, virtualenv, universal-pathlib, sqlalchemy-utils, sqlalchemy_redshift, sqlalchemy-jsonfield, shapely, sendgrid, rsa, rfc3339-validator, requests, referencing, redis, python-dateutil, python-daemon, pyasn1-modules, pyarrow, proto-plus, prison, opentelemetry-proto, marshmallow, markdown-it-py, Mako, linkify-it-py, ldap3, jinja2, isodate, importlib-resources, importlib-metadata, httplib2, httpcore, gunicorn, grpcio-gcp, grpc-interceptor, googleapis-common-protos, google-resumable-media, gevent, eventlet, email-validator, elastic-transport, deprecated, clickclick, 
click-repl, click-plugins, click-didyoumean, cffi, cattrs, beautifulsoup4, azure-mgmt-datalake-nspkg, asgiref, apispec, anyio, amqp, aiosignal, aioitertools, time-machine, rich, requests_toolbelt, requests-oauthlib, python-nvd3, python-ldap, pynacl, pandas, opentelemetry-exporter-otlp-proto-common, opentelemetry-api, openlineage-python, mdit-py-plugins, marshmallow-sqlalchemy, marshmallow-oneofschema, looker-sdk, limits, kombu, jsonschema-specifications, hvac, httpx, grpcio-status, google-cloud-audit-log, google-auth, flask, elasticsearch, docker, cryptography, croniter, botocore, azure-core, alembic, aiohttp, s3transfer, rich-argparse, PyOpenSSL, pendulum, paramiko, opentelemetry-sdk, openlineage-integration-common, msrest, kubernetes_asyncio, kubernetes, jsonschema, grpc-google-iam-v1, google-auth-oauthlib, google-auth-httplib2, google-api-core, gcloud-aio-auth, flask-wtf, Flask-SQLAlchemy, flask-session, flask-login, Flask-Limiter, Flask-JWT-Extended, flask-caching, Flask-Babel, db-dtypes, celery, azure-storage-file-share, azure-storage-blob, azure-servicebus, azure-mgmt-core, azure-keyvault-secrets, azure-cosmos, authlib, asyncssh, aiobotocore, adal, sshtunnel, snowflake-connector-python, pydata-google-auth, opentelemetry-exporter-otlp-proto-http, opentelemetry-exporter-otlp-proto-grpc, msrestazure, msal, google-cloud-core, google-api-python-client, google-ads, gcloud-aio-storage, gcloud-aio-bigquery, flower, flask-appbuilder, connexion, boto3, azure-synapse-spark, azure-synapse-artifacts, azure-storage-file-datalake, azure-mgmt-storage, azure-mgmt-resource, azure-mgmt-datafactory, azure-mgmt-cosmosdb, azure-mgmt-containerregistry, azure-mgmt-containerinstance, watchtower, snowflake-sqlalchemy, redshift_connector, PyAthena, opentelemetry-exporter-otlp, msal-extensions, google-cloud-workflows, google-cloud-vision, google-cloud-videointelligence, google-cloud-translate, google-cloud-texttospeech, google-cloud-tasks, google-cloud-storage-transfer, 
google-cloud-storage, google-cloud-speech, google-cloud-spanner, google-cloud-secret-manager, google-cloud-run, google-cloud-resource-manager, google-cloud-redis, google-cloud-pubsub, google-cloud-os-login, google-cloud-orchestration-airflow, google-cloud-monitoring, google-cloud-memcache, google-cloud-language, google-cloud-kms, google-cloud-dlp, google-cloud-dataproc-metastore, google-cloud-dataproc, google-cloud-dataplex, google-cloud-dataform, google-cloud-dataflow-client, google-cloud-datacatalog, google-cloud-container, google-cloud-compute, google-cloud-build, google-cloud-bigtable, google-cloud-bigquery-storage, google-cloud-bigquery-datatransfer, google-cloud-bigquery, google-cloud-batch, google-cloud-automl, google-cloud-appengine-logging, google-analytics-admin, azure-mgmt-datalake-store, azure-datalake-store, azure-batch, sqlalchemy-spanner, sqlalchemy-bigquery, pandas-gbq, google-cloud-logging, google-cloud-aiplatform, gcsfs, azure-identity, azure-kusto-data, adlfs, apache-airflow-providers-smtp, apache-airflow-providers-imap, apache-airflow-providers-http, apache-airflow-providers-ftp, apache-airflow-providers-fab, apache-airflow-providers-common-sql, apache-airflow-providers-common-io, apache-airflow-providers-sqlite, apache-airflow-providers-ssh, apache-airflow-providers-snowflake, apache-airflow-providers-slack, apache-airflow-providers-sftp, apache-airflow-providers-sendgrid, apache-airflow-providers-redis, apache-airflow-providers-postgres, apache-airflow-providers-openlineage, apache-airflow-providers-odbc, apache-airflow-providers-mysql, apache-airflow-providers-microsoft-azure, apache-airflow-providers-hashicorp, apache-airflow-providers-grpc, apache-airflow-providers-google, apache-airflow-providers-elasticsearch, apache-airflow-providers-docker, apache-airflow-providers-cncf-kubernetes, apache-airflow-providers-celery, apache-airflow-providers-amazon

Note that all the apache-airflow-providers-* packages are missing in the uv pip install case.

The problem is likely that the installation uses pyproject.toml directly to determine the dependencies. However, for such a remote installation (without --editable; and even if it were specified, --editable makes no sense for a remote install), the dependencies should be the same as in the packaged .whl file. This makes uv's installation in this case non-compliant with PEP 517.

A bit more context: Airflow uses the hatchling build backend and utilizes PEP 517 compliant build_hooks (https://peps.python.org/pep-0517/#build-wheel) to turn the --editable extras into wheel extras on the fly. For example, the [celery] extra in pyproject.toml (https://github.com/apache/airflow/blob/main/pyproject.toml#L641) is this:

celery = [ # source: airflow/providers/celery/provider.yaml
  "celery>=5.3.0,<6,!=5.3.3,!=5.3.2",
  "flower>=1.0.0",
  "google-re2>=1.0",
]

However, our hatchling build hook, when preparing the wheel package, replaces this extra with:

"apache-airflow-providers-celery"

This is how we deal with our monorepo: the --editable "extra" just installs the dependencies of our providers, while the "wheel" extra installs the actual provider (and, transitively, the dependencies of that provider).
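To make the extras replacement described above concrete, here is a minimal sketch in Python. It is not Airflow's actual build hook: the function name `wheel_extras`, the `provider_ids` parameter, and the `example-extra` entry are hypothetical and exist only for illustration. The sketch assumes that every provider extra maps to a package named `apache-airflow-providers-<extra>`, which matches the package names in the pip output.

```python
from typing import Dict, List, Set


def wheel_extras(
    editable_extras: Dict[str, List[str]],
    provider_ids: Set[str],
) -> Dict[str, List[str]]:
    """Return the extras as they might appear in the built wheel.

    Hypothetical helper: extras that correspond to a provider are replaced
    by a single requirement on the provider package; all other extras keep
    the dependencies listed in pyproject.toml.
    """
    result: Dict[str, List[str]] = {}
    for name, deps in editable_extras.items():
        if name in provider_ids:
            # Assumed naming scheme: apache-airflow-providers-<extra name>.
            result[name] = [f"apache-airflow-providers-{name}"]
        else:
            result[name] = list(deps)
    return result


editable = {
    # Taken from the [celery] extra quoted from pyproject.toml.
    "celery": ["celery>=5.3.0,<6,!=5.3.3,!=5.3.2", "flower>=1.0.0", "google-re2>=1.0"],
    # Hypothetical non-provider extra, for contrast only.
    "example-extra": ["example-package>=1.0"],
}

for extra, requirements in wheel_extras(editable, {"celery"}).items():
    print(extra, requirements)
```

An installer that reads the extras straight from pyproject.toml sees the `editable` mapping, while one that reads the built wheel's metadata sees the transformed mapping. That would explain why the provider packages only show up with pip.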

I believe that the PEP 517 compliant way of installing a package from a remote URL is to first build the wheel file using the build backend the project has defined in pyproject.toml, and only then install that wheel file. This is exactly what pip does under the hood when installing a package from a remote URL: it treats it the same way as installing an sdist package (which the remote URL is equivalent to).
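As a rough illustration of the "read the dependencies from the built wheel" step, the sketch below extracts the `Requires-Dist` entries from a wheel's `METADATA` file using only the standard library. It does not invoke any build backend: `make_demo_wheel` is a hypothetical helper that fabricates a tiny in-memory archive with assumed metadata, standing in for whatever the backend's `build_wheel` hook would produce. This is neither pip's nor uv's implementation.

```python
import io
import zipfile
from email.parser import Parser
from typing import List


def requires_dist_from_wheel(wheel_bytes: bytes) -> List[str]:
    """Return the Requires-Dist entries recorded in a wheel's METADATA."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as whl:
        # A wheel stores its metadata in <name>-<version>.dist-info/METADATA.
        metadata_name = next(
            name for name in whl.namelist()
            if name.endswith(".dist-info/METADATA")
        )
        text = whl.read(metadata_name).decode("utf-8")
    # METADATA uses an email-header-like format, so email.parser can read it.
    return Parser().parsestr(text).get_all("Requires-Dist") or []


def make_demo_wheel() -> bytes:
    """Hypothetical stand-in for a backend-built wheel (assumed contents)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as whl:
        whl.writestr(
            "demo-1.0.dist-info/METADATA",
            "Metadata-Version: 2.1\n"
            "Name: demo\n"
            "Version: 1.0\n"
            "Provides-Extra: celery\n"
            'Requires-Dist: apache-airflow-providers-celery; extra == "celery"\n',
        )
    return buf.getvalue()


print(requires_dist_from_wheel(make_demo_wheel()))
```

The point of the sketch is that the dependency list comes from what the build produced rather than from the static pyproject.toml, so anything a build hook rewrote is reflected in the result.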

charliermarsh commented 6 months ago

I have https://github.com/astral-sh/uv/pull/2645 open for now.

charliermarsh commented 6 months ago

I suppose I'll use your implementation's hack and return None.

My read is that this won't work for pip:

def prepare_metadata_for_build_wheel(
        metadata_directory, config_settings, _allow_fallback):
    """Invoke optional prepare_metadata_for_build_wheel

    Implements a fallback by building a wheel if the hook isn't defined,
    unless _allow_fallback is False in which case HookMissing is raised.
    """
    backend = _build_backend()
    try:
        hook = backend.prepare_metadata_for_build_wheel
    except AttributeError:
        if not _allow_fallback:
            raise HookMissing()
    else:
        return hook(metadata_directory, config_settings)
    # fallback to build_wheel outside the try block to avoid exception chaining
    # which can be confusing to users and is not relevant
    whl_basename = backend.build_wheel(metadata_directory, config_settings)
    return _get_wheel_metadata_from_wheel(whl_basename, metadata_directory,
                                          config_settings)

(Though I know you omit pip anyway altogether.)

charliermarsh commented 6 months ago

I would say just leave it for now, there's no need to make any urgent changes.

charliermarsh commented 6 months ago

I ended up special-casing hatchling in https://github.com/astral-sh/uv/pull/2645. We should figure out a better solution together.