bazelbuild / rules_python

Bazel Python Rules
https://rules-python.readthedocs.io
Apache License 2.0

Support a default hub repo concept in bzlmod #2094

Open · aignas opened 3 months ago

aignas commented 3 months ago

🚀 feature request

Relevant Rules

pip.parse

Description

Currently, pip.parse requires the hub_name parameter, which in practice is quite often set to pypi or pip in the root module. Users in non-root modules should also be able to use the pip or pypi names, so one of these could be the default hub_name value.
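As a sketch of what the proposal would look like in a MODULE.bazel file (assuming the default becomes pypi; the requirements file path and Python version are placeholders):

```starlark
pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")

# Today: hub_name must be spelled out explicitly.
pip.parse(
    hub_name = "pypi",
    python_version = "3.11",
    requirements_lock = "//:requirements.txt",
)

# Proposed: hub_name defaults to "pypi" and can be omitted.
pip.parse(
    python_version = "3.11",
    requirements_lock = "//:requirements.txt",
)
use_repo(pip, "pypi")
```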

Describe the solution you'd like

Minimum (for 1.0.0):

Extra (post 1.0.0):

Lock file merging is used in the Go ecosystem: requiring users either to name the hub repo that collects the spoke repos or to use a default name is the convention of rules_go/gazelle, see bazelbuild/bazel-gazelle#1584. Also see bazelbuild/bazel#20186, which discusses stabilizing the extension isolation feature. Having rules_python support it may be a good way to give feedback to the Bazel devs.

See #2088 for a slightly related topic.

Describe alternatives you've considered

The main alternative would be to not support it and keep things as they are, but that could make it harder to implement isolation later down the line. What's more, supporting isolation of extensions could make the pip code simpler, because we could just ask the user to use `isolate` if we see clashes.
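For reference, extension isolation is exposed via the experimental isolate parameter on use_extension (the feature discussed in bazelbuild/bazel#20186); a non-root module that clashes with another usage could then do something like the following sketch (paths and versions are placeholders):

```starlark
# Experimental: requires --experimental_isolated_extension_usages.
pip = use_extension(
    "@rules_python//python/extensions:pip.bzl",
    "pip",
    isolate = True,  # this usage gets its own repos, avoiding hub-name clashes
)
pip.parse(
    hub_name = "pypi",
    python_version = "3.11",
    requirements_lock = "//:requirements.txt",
)
use_repo(pip, "pypi")
```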

groodt commented 3 months ago
> Set the hub_name default value to pypi (it should not be pip because we are not using pip here and pypi is a more generic term).

+1 For pypi and not pip

> Support passing pyproject.toml and/or constraint files to pip.parse instead of the lock file itself. Enable users to lock the dependency tree via a @pypi//:lock target to generate a lock file in the user's workspace.

This is one additional reason why I've been keen to move dependency specifier lists into Starlark.

rickeylev commented 3 months ago

I'm generally +1 on this idea.

However, I don't see how this can reliably interact with lock files, i.e., how specific versions of transitive closures can be merged, respected, and made to work reliably. The issue is the inclusion of arbitrary requirements from elsewhere in the module graph.
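A toy sketch, not rules_python code, of why naively merging two modules' pinned lock files breaks down (package names and versions are invented for the example):

```python
def merge_locks(*locks):
    """Merge dicts of {package: pinned_version}; raise on conflicting pins."""
    merged = {}
    for lock in locks:
        for pkg, version in lock.items():
            if pkg in merged and merged[pkg] != version:
                raise ValueError(
                    f"conflicting pins for {pkg}: {merged[pkg]} vs {version}"
                )
            merged[pkg] = version
    return merged

# Two modules, each with a perfectly valid lock file of its own.
mod_a = {"foo": "1.0", "shared-dep": "2.3"}
mod_b = {"bar": "0.5", "shared-dep": "2.4"}  # pins a different shared-dep

try:
    merge_locks(mod_a, mod_b)
except ValueError as e:
    print(e)  # conflicting pins for shared-dep: 2.3 vs 2.4
```

A real merge would also have to reconcile hashes, markers, and platform-specific resolutions, which is where "generally solvable" stops holding.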

use isolate=True if there are clashes

I don't see how this would scale. Consider a triangle dependency situation:

A and B work fine on their own. It's only when they both end up in a transitive module graph structure that issues occur.
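The triangle above can be sketched like this (module and version names invented for illustration):

```
         root module
         /         \
    module A     module B
   (locks        (locks
    foo == 1.0)   foo == 2.0)
```

Each of A and B resolves cleanly against its own lock file; the unsatisfiable pin on foo only surfaces when the root's module graph pulls both locked closures into one hub.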

groodt commented 3 months ago

Yes, I should clarify as well. I'm only firmly +1 on the hub_name defaults.

For the rest, I'm -1. I'm not convinced by any lock file merging, for similar reasons as @rickeylev. I don't believe that it's generally solvable, so it will lead to more issues than it solves, and more maintenance burden.

I can be convinced of a solution where we let the root module re-lock against direct dependencies from transitive repositories, but even then, I'm a little confused by the problems this is believed to solve. If somebody needs to compose code and dependencies at a Python language level (not a Bazel extension level), then I think the most standard path forward would be to publish packages (to PyPI or other) or to use git submodules or similar code-sharing mechanisms to merge source code repositories.

I don't think languages (outside Go) can safely compose x_library rules across repositories in the presence of third-party dependencies. For example, in Java, you'd share a jar by publishing it to Maven Central. For Rust, you'd share a crate by publishing it to crates.io. Similarly, for Python, you'd share an sdist/wheel by publishing it to PyPI. Private artifact registries are also possible for all these languages.

rickeylev commented 3 months ago

I can imagine an API like:

```starlark
pip.package(name = "foo")
```

And this basically acts like:

```
# requirements.in
foo

# requirements.txt
foo == blabla --hash blabla
dep-of-foo == etc etc

# module.bazel
pip.parse(
    hub_name = "pypi",
    requirements_lock = "requirements.txt",
)
```

And I guess it would run requirements.in -> requirements.txt conversion during the bzlmod extension phase. This also necessitates that the resulting requirements.txt isn't implicitly platform specific.

Adding version constraints to the pip.package() call seems problematic, though. It seems more like a "best effort, but no guarantee" type of thing.
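To make the concern concrete, here is what constrained pip.package() calls from two different modules might look like (the constraint attribute is hypothetical, not an existing rules_python API):

```starlark
# MODULE.bazel of module A (hypothetical API)
pip.package(name = "foo", constraint = ">=1.0,<2.0")

# MODULE.bazel of module B (hypothetical API)
pip.package(name = "foo", constraint = ">=2.0")

# The resolver run during the extension phase can only discover at
# resolution time that no version of foo satisfies both constraints,
# hence "best effort, but no guarantee".
```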

groodt commented 3 months ago

Yes, rules_rust has something similar for crates in Cargo. You can load crates from a Cargo lockfile, but you can also load them from specs in Starlark: https://github.com/bazelbuild/rules_rust/blob/b5ecaea4b5996912693135ea14447d22271e9cd5/examples/bzlmod/proto/MODULE.bazel#L77

```starlark
crate = use_extension("@rules_rust//crate_universe:extension.bzl", "crate")

crate.spec(
    package = "prost",
    version = "0.12",
)
crate.spec(
    package = "prost-types",
    version = "0.12",
)

...

crate.from_specs()
use_repo(crate, "crates")
```

There's also a variation that looks like this:

```starlark
crates_repository(
    name = "crate_index",
    cargo_lockfile = "//:Cargo.lock",
    lockfile = "//:Cargo.Bazel.lock",
    packages = {
        "async-trait": crate.spec(
            version = "0.1.51",
        ),
        "mockall": crate.spec(
            version = "0.10.2",
        ),
        "tokio": crate.spec(
            version = "1.12.0",
        ),
    },
)
```

And the following to relock: `CARGO_BAZEL_REPIN=1 bazel sync --only=crate_index`

I'd think it would be possible to do something similar in Python. I think they have things a bit easier because I don't think they need to deal with an sdist-vs-wheel situation. But yeah, the crate.from_specs() call would be solving dependencies behind the scenes and failing if there is no solution.
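A Python analogue of the crate.from_specs() pattern might look like the sketch below; pip.spec and pip.from_specs are hypothetical names modeled on the rules_rust API, not existing rules_python attributes:

```starlark
# MODULE.bazel (hypothetical rules_python analogue of crate_universe specs)
pip = use_extension("@rules_python//python/extensions:pip.bzl", "pip")

pip.spec(
    package = "requests",
    version = "2.31",
)
pip.spec(
    package = "numpy",
    version = "1.26",
)

# Resolve the specs behind the scenes; fail if no solution exists.
pip.from_specs(hub_name = "pypi")
use_repo(pip, "pypi")
```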