Tinder / bazel-diff

Performs Bazel Target Diffing between two revisions in Git, allowing for Test Target Selection and Selective Building
Other
397 stars 59 forks source link

Transitive dependency of repository rule do not change bazel-diff output hashes #134

Closed tingilee closed 1 year ago

tingilee commented 2 years ago

In this minimal repo https://github.com/tingilee/repro-pip-bazel-diff/, we have found that transitive dependencies in the repository rules do not change the output hashes.

https://github.com/tingilee/repro-pip-bazel-diff/blob/master/BUILD#L15 :lib depends on testcontainers and testcontainers dependent on wrapt.

I ran the following at commit https://github.com/tingilee/repro-pip-bazel-diff/commit/697f9090ae87b97f8dad7edd94520c34ea89aaa8 and one commit prior.

bazel run @bazel_diff//:bazel-diff -- generate-hashes -w /Users/jacqueline.lee/repro-pip-bazel-diff -b /usr/local/bin/bazel /tmp/hashes.json

And then ran

bazel run @bazel_diff//:bazel-diff  -- -sh /tmp/starting_hashes_before_change.json -fh /tmp/starting_hashes_after_change.json -w /Users/jacqueline.lee/repro-pip-bazel-diff -b /usr/local/bin/bazel -o /tmp/impacted

The expectation is :lib should be in the impacted targets since its transitive dependency has changed, but this is not true.

alexeagle commented 2 years ago

It seems to me that bazel-diff is relying on the definition of a repository rule to understand the dependency graph, rather than the result of executing the repository rule. In the case of rules_python, and I imagine many other rulesets, the dependency graph is produced by some code generation inside the repository rule implementation.

Taking this example repro above, we can see that the //external:all-targets targets that bazel-diff queries for, which define a repository rule, do not express dependencies. For example,

 % bazel query --output=build //external:my_deps_testcontainers
# /Users/alex.eagle/repros/pip_bazel-diff/WORKSPACE:31:13
whl_library(
  name = "my_deps_testcontainers",
  generator_name = "my_deps_testcontainers",
  generator_function = "install_deps",
  repo = "my_deps",
  requirement = "testcontainers==3.5.4     --hash=sha256:7a5583922cbb3da5ed255975f1927d8b29bf6a638323a85fd41b6e46c48ada8f",
...
)

doesn't express any dependency on wrapt.

If we instead used the result of executing the repository rule, then the dependency is present:

% bazel query --output=build @my_deps_testcontainers//:pkg    
# /private/var/tmp/_bazel_alex.eagle/a060b90d6e0425005f043993b0078cee/external/my_deps_testcontainers/BUILD.bazel:22:11
py_library(
  name = "pkg",
  tags = ["__PYTHON_RULES_MIGRATION_DO_NOT_USE_WILL_BREAK__", "pypi_name=testcontainers", "pypi_version=3.5.4"],
  generator_name = "pkg",
  generator_function = "py_library",
  generator_location = "/private/var/tmp/_bazel_alex.eagle/a060b90d6e0425005f043993b0078cee/external/my_deps_testcontainers/BUILD.bazel:22:11",
  deps = ["@my_deps_deprecation//:pkg", "@my_deps_docker//:pkg", "@my_deps_wrapt//:pkg"],
  data = ["@my_deps_testcontainers//:testcontainers-3.5.4.dist-info/LICENSE.txt", "@my_deps_testcontainers//:testcontainers-3.5.4.dist-info/METADATA", "@my_deps_testcontainers//:testcontainers-3.5.4.dist-info/WHEEL", "@my_deps_testcontainers//:testcontainers-3.5.4.dist-info/top_level.txt"],
 ...
)
# Rule pkg instantiated at (most recent call last):
#   /private/var/tmp/_bazel_alex.eagle/a060b90d6e0425005f043993b0078cee/external/my_deps_testcontainers/BUILD.bazel:22:11 in <toplevel>
#   /private/var/tmp/_bazel_alex.eagle/a060b90d6e0425005f043993b0078cee/external/rules_python/python/defs.bzl:52:22       in py_library

To prove this theory, I made a trivial edit in rules_python, so that the whl_library target from that first query will express the dependency:

diff --git a/python/pip_install/parse_requirements_to_bzl/__init__.py b/python/pip_install/parse_requirements_to_bzl/__init__.py
index 9519d62..8ee5da2 100644
--- a/python/pip_install/parse_requirements_to_bzl/__init__.py
+++ b/python/pip_install/parse_requirements_to_bzl/__init__.py
@@ -144,6 +144,7 @@ def generate_parsed_requirements_contents(
                 whl_library(
                     name = name,
                     requirement = requirement,
+                    deps = ["@my_deps_wrapt//:whl"] if not name.startswith("my_deps_wrapt") else [],
                     annotation = _get_annotation(requirement),
                     **_config
                 )
diff --git a/python/pip_install/pip_repository.bzl b/python/pip_install/pip_repository.bzl
index d9888a2..fe39908 100644
--- a/python/pip_install/pip_repository.bzl
+++ b/python/pip_install/pip_repository.bzl
@@ -408,6 +408,7 @@ whl_library_attrs = {
         ),
         allow_files = True,
     ),
+    "deps": attr.label_list(),
     "repo": attr.string(
         mandatory = True,
         doc = "Pointer to parent repo name. Used to make these rules rerun if the parent repo changes.",

and now we get the output we expected:

----------------------
Impacted Targets between c0328895e7adcf7492671680f2622b075a588c27 and 6d47b43f8beabf26bc7d1a4d16aed149e963a5ea:
//external:my_deps_websocket_client
//:requirements.txt
//:requirements.update
//external:my_deps_deprecation
//:requirements_test
//:bin
//:requirements
//:lib
//:requirements.in
//external:my_deps_idna
//external:my_deps_urllib3
//external:my_deps_charset_normalizer
//external:my_deps_testcontainers
//external:my_deps_requests
//external:my_deps_certifi
//external:my_deps_pyparsing
//external:my_deps
//external:my_deps_packaging
//external:my_deps_docker
//external:my_deps_wrapt
illicitonion commented 2 years ago

FWIW this style of issue was one of our motivations for putting together a similar project at https://github.com/bazel-contrib/target-determinator - we put in quite a bit of care to follow deps through repository rules to all affecting files.

I manually tested out the supplied repro, and that implementation detected the change properly.

The target-determinator repo includes a test suite of edge-cases which runs those tests with bazel-diff, bazel-differ, and the target-determinator implementation (https://github.com/bazel-contrib/target-determinator/tree/main/tests/integration) - I'm not sure we have an exact test case for this case, but it could be nice to add a reduced version of this situation to that test suite to track improvements :)

We'd also absolutely love to collaborate with folks (particularly those who have contributed to bazel-diff) to consolidate all of our varied experiences together!

tinder-maxwellelliott commented 1 year ago

This is expect behavior. If you want to include these transitive targets I would look into the -s=<seedFilepaths> option.