bazeltools / bazel-deps

Generate bazel dependencies for maven artifacts
MIT License
249 stars 121 forks source link

Surprising resolution behavior when fixing versions (bug?) #379

Open kmate-ct opened 5 months ago

kmate-ct commented 5 months ago

I ran into a situation recently that pretty much surprised and given the circumstances I might call it a bug. The following happened.

Library A was pulling in library B. Unfortunately, I did not realize, that besides having a fixed version of A in my dependencies.yaml, it gets pulled in through another library C. But, of course, that version of A is different, and also it pulls in a different version of B obviously. Also, importantly, the A and B pulled in by C are having higher versions than what I fixed for A, and the version resolution strategy was set to "highest". Let's assume the following:

  1. I fixed A's version to 1.0, that pulls in B:1.0.
  2. C pulled in A:2.0, which pulled in B:2.0.

Now I get a resolution comment for B like:

# duplicates in B downgraded to 2.0
# - A:1.0 wanted version 1.0

There are 2 obvious problems with the resolution comment:

  1. this is not a downgrade (B went to 2.0 from 1.0)
  2. there's no line that's showing A:2.0 that was actually chosen

The reason for the second is, that the normalized graph doesn't contain A:2.0 as it was fixed to A:1.0, hence that edge falls out in a filter when the duplicates are being collected. The first is somewhat a side effect of this: the string "downgraded" is coming from an else branch of a condition which is not prepared for the case when the highest, chosen version is not present in the duplicate list.

I think this behavior is pretty much problematic as we chose the version of B based on a version of A that never gets effective. In my opinion, the fact that A's version was fixed to 1.0, should have had an effect on setting B's version, which should result in 1.0 in the above example.

(I checked out the repo ~3 days ago and used that version for investigations.)

kmate-ct commented 5 months ago

If you'd like to debug it, here is a potential dependency example:

  com.google.cloud:
    google-cloud-pubsub: # library D, see below
      lang: java
      version: "1.123.0"
    google-cloud-storage: # library A
      lang: java
      version: "2.14.0"

  com.github.fs2-blobstore: # library C
    gcs:
      lang: scala
      version: "0.9.12"

Here C would pull in com.google.cloud:google-cloud-storage:2.27.0, a different version of A

The role of library B is played by org.threeten:threetenbp, which gets a resolution comment like:

# duplicates in org.threeten:threetenbp downgraded to 1.6.8
# - com.google.cloud:google-cloud-pubsub:1.123.0 wanted version 1.6.5
# - com.google.cloud:google-cloud-storage:2.14.0 wanted version 1.6.3

Library D was required to have another version from B; this ensures that the resolution comment is not null. Otherwise it won't matter much, as the version it wanted to pull in from B was smaller than what got chosen.

kmate-ct commented 5 months ago

If changing the resolution logic is not an option, which I would totally understand, I'd ask for fixing at least the resolution comment by adding the missing line - maybe with some addition that it was evicted. This would automatically fix the "downgraded" part as well, resulting in something like:

# duplicates in org.threeten:threetenbp promoted to 1.6.8
# - com.google.cloud:google-cloud-pubsub:1.123.0 wanted version 1.6.5
# - com.google.cloud:google-cloud-storage:2.14.0 wanted version 1.6.3
# - com.google.cloud:google-cloud-storage:2.27.0 (evicted) wanted version 1.6.8
kmate-ct commented 5 months ago

I'm not sure this is not because the following in Normalizer:

      // invariant: all of node's parents must be unambiguous
      def fixVersion(node: Node): Table = {

doesn't seem to be satisfied. When we call this method for org.threeten:threetenbp in the example above, its table still contains 2 different versions of com.google.cloud:google-cloud-storage. Isn't this violating the invariant?

johnynek commented 5 months ago

If we look here:

      def disambiguateHelper(node: Node, visited: Set[Node]): Table =
        table(node)
          .map(_._1.map(_.unversioned))
          .find(isAmbiguous) match {
          case None => fixVersion(node) // note that parents are not ambiguous
          case Some(None) => sys.error("unreachable, roots are never ambiguous")
          case Some(Some(p)) => {
            if (!visited.contains(p)) {
              disambiguateHelper(p, visited + p)
            } else {
              // We found a cycle in the maven dependency graph. Maven is OK with this (why!?),
              // but bazel won't be. However, this might be a cycle in the transitive dependency
              // graph that won't be present in the BUILD files, so we'll allow it for now.
              fixVersion(node)
            }
          }
        }

We see that we explicitly violate the "invariant" in the case that node is a member of a cycle in the dependency graph (when projected into the unversioned space).

I wonder if we can see if this is the case in your example. e.g. we could print out if we are in that branch and what the set looks like in your case.

Definitely, it seems like weird behavior. I would be in favor of fixing bugs. I think the key issue is: what impact should any version that is evicted from the final normalized graph have? I agree it seems like if a version was only transitively pulled in, and subsequently replaced by a higher version, it shouldn't change the resolution I would say.

A rule I definitely think should be true: if you explicitly declare in the dependencies.yaml file all the transitively resolved dependencies and re-run, you should get the same result set. This is a form of normalization being idempotent or something.

kmate-ct commented 5 months ago

I attached a debugger and concluded that the code in the above example I provided does not enter the path for cycles. However, I found the following in the table right before calling fixVersion on org.threeten:threetenbp:

Screenshot 2024-02-06 at 10 16 14

which shows that com.google.cloud:google-cloud-storage was already had its fixVersion ran, and selected 2.14.0 (note that the selected, bottom entry was originally 2.27.0 before that). But still, for org.threeten:threetenbp itself, this resolution is not visible; it still sees 2 versions of that library:

Screenshot 2024-02-06 at 10 16 35

Seemingly, the results of fixVersion on the parent are not getting visible to the fixVersion in the child. I'll try to dig deeper why, I'm still learning how does the code work.

kmate-ct commented 5 months ago

After digging through the code of fixTable, I don't see any code path in the current version that would make the parent's resolution visible to the children in table, as fixVersion always only touches the current node's entries, but not actually disambiguate the children's parent references in their own entries.

As far as I see, and as it's shown on the above screenshots, the table entries are getting fixed for node com.google.cloud:google-cloud-storage in its own fixVersion (both set to Right(Version(2.14.0))), but for org.threeten:threetenbp the original, initial entries remain, listing both Right(Version(2.14.0)) and Right(Version(2.27.0)) as parents, whereas in this step I'd expect the latter to be gone already.

kmate-ct commented 5 months ago

I created #381 that adds the logic to filter out already evicted versions, so they won't contribute new versions of their children potentially. I'm curious what do you think about it.

kmate-ct commented 5 months ago

Just for reference, I checked what maven_install does in rules_jvm_external. That seem to use coursier fetch internally instead of re-implementing a resolution logic on its own, which sounds like a much better idea. I tested it with 2 version conflict handling policies, and neither gives the combination what bazel-deps gives currently. Here's how that works on my above example, depending on the version conflict resolution:

  1. default: choses highest, but then if it goes against the pinned version, the highest also wins there. This means com.google.cloud:google-cloud-storage:2.27.0 and org.threeten:threetenbp:1.6.8 will be effective, which is an acceptable combination.
  2. pinned: takes fixed versions seriously, but then does not select highest versions where the parent gets evicted because of pinning. This is equivalent with what my patch on #381 produces.