coursier / coursier

Pure Scala Artifact Fetching
https://get-coursier.io
Apache License 2.0

Pipe Dream: Coursier "Solve" #1863

Open djspiewak opened 3 years ago

djspiewak commented 3 years ago

What follows is part brain-dump, part RFC, and part "somebody please do this because dear god I need it but I don't have enough time". Happy for this to exist somewhere else if the maintainers prefer, but I thought it made sense here.

Problem

You depend on a set of libraries. Let's say the following:

(to be clear, these are your direct, stated dependencies in your build.sbt)
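For concreteness, assume something like the following (the artifacts mirror the solve example further down; the versions here are purely illustrative):

```scala
libraryDependencies ++= Seq(
  "org.http4s"     %% "http4s-server"      % "0.21.7",
  "org.typelevel"  %% "cats-effect"        % "2.2.0",
  "dev.profunktor" %% "redis4cats-effects" % "0.10.3",
  "co.fs2"         %% "fs2-io"             % "2.4.4"
)
```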

You want to upgrade any one of these. For example, assume http4s. However, upgrading it results in another dependency now relying on an old version of something transitive. Maybe catnap is now broken because http4s pulls in a newer version of Cats Effect, or conversely maybe there's a better version of Cats Effect that can be grabbed and explicitly evicted to. Upgrading http4s potentially messes up downstream things, such as almost anything in the Davenportverse, so that's a concern.

It's very complicated and hard, and ultimately requires relatively expert-level knowledge of the ecosystem and its transitive dependencies (and who happens to be maintaining bincompat and who isn't!) in order to figure it all out. It's also very time-consuming to do by hand.

To be clear, this problem generalizes to any polyrepo distribution system. Companies which use polyrepos actually feel this problem far more acutely than the public ecosystem (believe me…). This, in a nutshell, represents the strongest objective argument against an extensible ecosystem: it's incredibly difficult to identify "compatible sets".

Possible Solution

What if we could just… ask Coursier? Think about something like this (literally making up syntax here):

$ coursier solve --scala 2.13 org.http4s::http4s-server org.typelevel::cats-effect:2.2.0 dev.profunktor::redis4cats-effects 'co.fs2::fs2-io:2.[3,)'

Imagine if Coursier would then spit out the most recent, maximally compatible set of dependencies, ideally in a way that can be copy/pasted into a libraryDependencies declaration. The idea here is that we're expressing a set of constraints (note the missing versions). We want these things, some pinned to a specific version, some with an Ivy range, and some without any version at all, and we want the solver to figure out what our ideal build configuration should be in order to ensure everything is mutually compatible but also upgraded as far as possible.
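The output is equally made-up, but the idea is that it could be pasted straight back into the build; perhaps something like this (versions invented for illustration):

```scala
// hypothetical output of `coursier solve`, ready for build.sbt
libraryDependencies ++= Seq(
  "org.http4s"     %% "http4s-server"      % "0.21.8", // newest compatible
  "org.typelevel"  %% "cats-effect"        % "2.2.0",  // pinned in the input
  "dev.profunktor" %% "redis4cats-effects" % "0.10.3", // unconstrained, maximized
  "co.fs2"         %% "fs2-io"             % "2.4.5"   // newest within 2.[3,)
)
```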

Additionally, note that the Scala version is explicitly included here. This is partially for convenience (so we can use the fictional :: notation), but also so that Coursier can find versions which comply with our Scala version. Not everyone is on the latest Scala, and sorting out compatible sets on older versions while upgrading as much as possible is a massively non-trivial problem, particularly when libraries like Circe have conditional dependencies that jump between breaking lines and other non-linear things.

Obviously this isn't possible right now. But I think it could be.

Implementation

There are a couple of things that would be needed for this. The most obvious one is some set of assumptions about declared compatibility in versioning. Coursier already defaults to Ivy's eviction rules on this one, and I think that's fair, but it should be overridable on a per-artifact basis. For example, Cats Effect is fully binary compatible within its major lines, and additionally happens to be binary compatible between 1.x and 2.x (Cats is as well), and it should be possible to declare this somewhere in a fashion that all users have access to it. Ideally this would just be in the POMs, but we can't necessarily do that, so some external metadata mechanism is probably necessary. This wouldn't need to be in the MVP, but it would be helpful.
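To sketch what such external metadata might look like (the format and scheme names here are entirely hypothetical), one could imagine a shared mapping along these lines:

```scala
// hypothetical shared compatibility metadata, keyed by organization:name;
// each entry overrides the default eviction assumption for that artifact
val compatibilityOverrides: Map[String, String] = Map(
  "org.typelevel:cats-core"   -> "semver",  // also bincompat across 1.x/2.x
  "org.typelevel:cats-effect" -> "semver",  // ditto
  "org.scalacheck:scalacheck" -> "strict"   // known to break within minor lines
)
```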

The biggest piece though is the solver itself. From what I remember from the last time I looked into this, Coursier's resolver is almost powerful enough to do this, save for one critical piece: it doesn't support backtracking. Coursier's resolver isn't a general constraint solver, it's just a straightforward iteration algorithm that attempts to resolve conflicts by evicting forward. This is entirely sufficient to replace Ivy in sbt, but it's not enough to achieve this idea.

However, I don't think it would be hard to make it enough. Constraint solvers honestly aren't that hard to write (stick your constraints into a set, then iterate on that set until you either instantiate every variable or you complete an iteration without making any changes), and implementing this would greatly extend Coursier's capabilities and allow it to fill this critical and glaring need within the ecosystem.
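A minimal sketch of that loop (the constraint representation here is invented for illustration; a real implementation would work against Coursier's own resolution types):

```scala
// Fixpoint constraint propagation: each constraint narrows the candidate
// version sets; iterate until nothing changes (solved) or a set goes empty
// (unsolvable).
type Domains = Map[String, Set[String]] // module -> remaining candidate versions

def solve(domains: Domains, constraints: Seq[Domains => Domains]): Option[Domains] = {
  val next = constraints.foldLeft(domains)((d, c) => c(d))
  if (next.values.exists(_.isEmpty)) None // some module has no viable version left
  else if (next == domains) Some(next)    // fixpoint: no more narrowing possible
  else solve(next, constraints)           // keep propagating
}
```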

eed3si9n commented 3 years ago

Do all these libraries implement strict Semantic Versioning?

keynmol commented 3 years ago

I've actually started on a project which has a similar goal: a view of the ecosystem as a multi-graph that can be traversed from a set of requirements. In my case the requirement was "is it ready for 2.13?", which is reflected in the labeling (green/red).

The original idea was to guide people who want to contribute to Scala version upgrades in the ecosystem that are desperately needed, but this has similar problems - the quality of the raw data that can be sourced from, say, Scaladex, is not good enough to solve for "Pick a subgraph that will be completely in 2.13 if you upgrade this minimal set of dependencies in this order"

(all the red stuff in the corner is Scalameta's seemingly abandoned projects :)

[screenshot: the ecosystem multi-graph, with green/red labeling]

djspiewak commented 3 years ago

Do all these libraries implement strict Semantic Versioning?

Many, not all. This should be configurable. We're always going to have weird issues like ScalaCheck (which breaks things in their minor line) or even Netty. My thought would be that this command would take, in the arguments, a list of artifact selectors paired with known resolution strategies, or alternatively an external manifest which contains the same (so that this can be shared). I touched on that in a highly imprecise way in the OP.

djspiewak commented 3 years ago

@keynmol First off, that is absolutely fascinating. But second, I think working within Coursier's infrastructure would probably have made your job a lot easier there. :-D Coursier's internal APIs are actually really good and really well-factored, so it's not difficult to recompose them to do other things. To be clear, the OP is barely even a recomposition of its internal APIs: it's just a minor addition to the resolver, which is already architecturally compatible with backtracking. But doing more elaborate things like what you're talking about should also be relatively straightforward.

cosmicexplorer commented 3 years ago

To the point just above: one of the first responses to a slightly different proposal I made against pip (pypa/pip#7819) was (as in https://github.com/coursier/coursier/issues/1863#issuecomment-695828508) to make use of a new endpoint from the artifact host to fetch dependencies quickly (although that was much simpler than the multi-graph description above), instead of adding more infrastructure in the resolver. I will note that regardless of that feedback, I worked on implementing it in the pip resolver itself (e.g. pypa/pip#8448). From that vaguely-relevant experience, I will say that I am somewhat convinced this can and should be done in the resolver tool itself.

keynmol commented 3 years ago

@djspiewak the actual resolution for the graph is all done through Coursier :) because it's amazing. The seed data for "what are the core nodes we want to have in this view of the ecosystem, for which precise dependency information is required" is sourced partially by hand, partially from Scaladex.

Seeing this post, I might just experiment with what the graph looks like with precise versions added back in, and what jumping from one version to another would look like in terms of subgraphs and number of updates.

eed3si9n commented 3 years ago

Coursier already defaults to Ivy's eviction rules on this one, and I think that's fair, but it should be overridable on a per-artifact basis.

sbt 1.4.0 will add ThisBuild / versionScheme (https://github.com/sbt/sbt/issues/5710), and there's a plugin called sbt-version-policy that Alex wrote that lets you locally override similar information:

versionPolicyDependencyRules += "org.scala-lang" % "scala-compiler" % "strict"
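For completeness, the new setting itself is a one-liner on the publisher's side (the values sbt understands include "early-semver", "semver-spec", "pvp", and "strict"):

```scala
// build.sbt: declare which compatibility scheme this project's versions follow
ThisBuild / versionScheme := Some("early-semver")
```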

The biggest piece though is the solver itself.

When we ask for a module with a few libraryDependencies, effectively it's already doing the solving. What's NOT happening (and it should) is the resolver saying "I'm sorry I can't" when the constraints are incompatible. So I think what we are asking for is a strict mode, based on a more accurate compatibility check driven by the versioning scheme (Semantic Versioning, PVP, etc.) declared by each library.

djspiewak commented 3 years ago

What's NOT happening (and it should) is the resolver saying "I'm sorry I can't" when the constraints are incompatible. So I think what we are asking for is a strict mode, based on a more accurate compatibility check driven by the versioning scheme (Semantic Versioning, PVP, etc.) declared by each library.

In part, yes. But the current "solver" also doesn't backtrack. For example, when trying to find a compatible set between dependencies A and B, there is no current functionality which can look at the transitive dependencies of A and B and find compatibility there, then backtrack up to A and B themselves and see what can be compatible with those upstream modules. This is far and away the most painful bit today, and the current tooling cannot achieve it.

To restate: the current resolution mechanism can find transitive dependencies given a set of direct versions, but it cannot find a set of direct versions given an inferred set of transitive versions. This limitation is fundamental to the algorithm, which is currently forward-only and does not iterate on constraints.
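To make "backtracking" concrete, here is an illustrative (entirely hypothetical) depth-first search over direct-version candidates; Coursier's resolver today has no equivalent of the "back up and try the next candidate" step:

```scala
// Hypothetical backtracking search: pin candidate versions for each direct
// dependency (newest first) and recurse; on a transitive conflict, back up
// and try the next candidate instead of failing outright.
def search(
    remaining: List[String],                    // modules still unpinned
    chosen: Map[String, String],                // module -> pinned version
    candidates: String => List[String],         // newest-first candidates per module
    consistent: Map[String, String] => Boolean  // transitive-closure compatibility check
): Option[Map[String, String]] =
  remaining match {
    case Nil => Some(chosen)
    case m :: rest =>
      candidates(m).iterator
        .map(v => chosen + (m -> v))
        .filter(consistent)                           // prune before recursing
        .map(search(rest, _, candidates, consistent))
        .collectFirst { case Some(sol) => sol }       // lazy: first success wins
  }
```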

eed3si9n commented 3 years ago

ok.

If we use org.http4s::http4s-server as an example, assuming none of org.typelevel::cats-effect:2.2.0, dev.profunktor::redis4cats-effects, or 'co.fs2::fs2-io:2.[3,)' depends on http4s-server, you're asking Coursier to go from the latest and try different http4s-server versions to find the maximum version of http4s-server that fits the puzzle.

djspiewak commented 3 years ago

you're asking Coursier to go from the latest and try different http4s-server versions to find the maximum version of http4s-server that fits the puzzle

Yes. Abstractly, the goal is to spit out a set of concrete versions for all the artifacts listed in the arguments, where the versions are all mutually compatible not just in direct dependencies, but also throughout their transitive diamonds, and where any version ranges (or version omissions) are resolved to the most recent possible options.

djspiewak commented 3 years ago

Additionally, if some versions are strictly in mutual conflict (like if I ask for an fs2 version that depends on a cats version which is strictly incompatible with a cats version that I also specified), then we should get an appropriate error.

cosmicexplorer commented 3 years ago

EDIT: I've re-read the prompt, and I think I completely misunderstood the idea. So the below would be a description of a proposal for something that's not at all solving the stated problem of defining a set of constraints with maximum compatibility to ease further upgrades, which would be useful for publishing individual libraries that break things less often, but rather solving a different problem: finding a set of constraints which allows me to minimize the number of separate versions of libraries that I depend on across targets in a monorepo. I believe these problems could be duals of each other, or more likely that this thread's problem could be a subproblem of mine, but the suggestion to e.g. define a mapping of "target name" to its dependency constraints doesn't make sense outside of a monorepo setup. In particular, the problem that this thread is trying to solve doesn't assume we have all of our projects available all at once, as in a monorepo, to perform such a resolve over together. I'll try to further refine this idea to see how closely it hews to the original one. Sorry for dropping such a long post here before reading more closely; feel free to delete this.


where the versions are all mutually compatible not just in direct dependencies, but also throughout their transitive diamonds

Question on this part. This seems like a strictly more difficult problem than might exist within an actual repo. I'm assuming the goal is to allow a single coursier invocation to provide all these compatible sets of artifacts so e.g. sbt can then assign one (or more) of those compatible sets to each subproject (not sure of the precise nomenclature -- a subproject meaning "a thing that specifies a set of artifacts to resolve, which its source files can then freely import, if the resolve is satisfied". pants calls it a "target").

However, the requirement for fully transitive mutual compatibilities would not seem to be necessary if no subproject actually depends on any such incompatible combination. To attempt to be more clear about this, if I have A resolving artifacts a:1.0.0 and b:2.0.0, and B resolving artifact a:0.9.5, we would not need to calculate whether a:0.9.5 is compatible with b:2.0.0. Now, the current setup you've described would of course handle this case correctly, as it would have to partition into two compatible sets at the start just by having two separate versions of artifact a, which the build tool would process and easily map to the correct targets.

However, one suggestion I might make which mirrors the way I'm solving this for python dependencies at Twitter, is to specialize the input provided to the coursier solve operation, into a mapping of (target name) -> (set of artifact dependency constraints). This relies on the assumption that the build tool is able to provide a unique string (the target name) for each "subproject", as defined above (could also call it the subproject name -- would like to stick with nomenclature most people recognize obviously). This could be provided as JSON to coursier over stdin or a file argument. If provided this mapping, instead of being given a flattened set of dependencies without this context, and attempting to do the harder problem of finding compatible sets that will work assuming the subproject dependency graph is fully connected and acyclic, we could e.g. start by performing a resolve for all of the leaf subprojects (without any dependencies), then "merge" upwards into their parents (the subprojects they depend on). I have some thoughts on what such a "merge" operation would look like.
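To make that input shape concrete, here is a sketch (target names and the encoding are hypothetical; the real thing could just as well be the JSON-over-stdin form mentioned above):

```scala
// hypothetical input to `coursier solve`: each build target (pants/bazel
// style) mapped to its own set of artifact dependency constraints
val targetConstraints: Map[String, Seq[String]] = Map(
  "//services/gateway" -> Seq("org.http4s::http4s-server", "co.fs2::fs2-io:2.[3,)"),
  "//libs/storage"     -> Seq("dev.profunktor::redis4cats-effects"),
  "//libs/core"        -> Seq("org.typelevel::cats-effect:2.2.0")
)
```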

As compared to the (I believe) overly conservative assumption of subprojects having maximum connectivity (hence requiring compatibility throughout the transitive diamonds), the only assumption made in this proposal is that each subproject must have the exact same version of an artifact as all of its transitive dependencies. In a large monorepo, this obviously becomes a harder problem and would result in more partitionings. If I understand your usage of "polyrepo" correctly, it may be that each repo would correspond to what I'm referring to as a "subproject" here.

The main distinction from the needs of this type of solution for python dependencies is that python projects will very often specify a version range for their dependencies, which is less common (perhaps not currently possible without wider adoption of the version naming tools @eed3si9n mentioned in https://github.com/coursier/coursier/issues/1863#issuecomment-695829676?) in JVM projects. However, I think that just makes the problem easier, although I'm not sure I've got it right as I need to refresh my memory on how dependencies are specified in POMs. EDIT: looks like this is analogous to version requirement specifications in maven.

  1. Am I understanding your statement of the coursier solve use case correctly, and is the reasoning that it may be solving a harder-than-necessary problem clear?
  2. Is this (target name) -> (set of artifact dependency constraints) concept generic enough and simple enough to expect as input to coursier solve?

Thanks for reading, please let me know if this approach is unsuitable for the project for any reason.

cosmicexplorer commented 3 years ago

Ok, given that I just realized the above is a totally different problem, I would like to ask a separate question, which is directly from the prompt: what would we want to do in coursier if instead of the slightly-vague "upgrade as much as possible", we wish to upgrade to a specific version of e.g. http4s, say because it made one function call super fast, and there's no path to doing that which doesn't cause an incompatibility? I assume that coursier solve would fail in this case, but what information would we provide in the error output?

Specifically, would it be useful to try to then solve for the minimum set (or smallest few) of upgrades to other artifacts necessary to accomplish the desired http4s upgrade?

djspiewak commented 3 years ago

I've re-read the prompt, and I think I completely misunderstood the idea. So the below would be a description of a proposal for something that's not at all solving the stated problem of defining a set of constraints with maximum compatibility to ease further upgrades, which would be useful for publishing individual libraries that break things less often

I'll read your comment a little more thoroughly in a second, but just to quickly clarify before my next meeting… The idea is targeted at consumers of libraries, not publishers. And while it does provide additional benefits in polyrepo setups, it certainly is not exclusive to them, since the entire open source universe is (fundamentally) a polyrepo configuration; even monorepos need to solve this problem when sorting out their dependencies.

Simply put: the problem at hand is finding the correct set of versions of various libraries to marry together in your own build.sbt, subject to some other constraints (Scala version, pinned versions of certain libraries, biasing towards most recent, etc). If you're using a polyrepo layout, then you need to solve this problem not just across open source libraries, but also across your organization. If using a monorepo, then you only need to solve it for external libraries (though your set of constraints will often be much, much larger as a consequence of your larger project surface area). Either way, it's a problem.

djspiewak commented 3 years ago

Ok, given that I just realized the above is a totally different problem, I would like to ask a separate question, which is directly from the prompt: what would we want to do in coursier if instead of the slightly-vague "upgrade as much as possible", we wish to upgrade to a specific version of e.g. http4s, say because it made one function call super fast, and there's no path to doing that which doesn't cause an incompatibility? I assume that coursier solve would fail in this case, but what information would we provide in the error output?

The OP kind of gets at this case a bit with the pinned versions. I gave an example of pinning Cats Effect to specifically 2.2.0 (realistically it probably should have been 2.2.+, but you get the idea). CE 2.2.0 introduced tracing, as well as a number of performance optimizations, so this is a plausible example.

The workflow would be something like this:

  1. Team identifies a desire to adjust artifact A to a given version range (perhaps a specific version). Note that this may actually be a downgrade to avoid some sort of regression.
  2. Person tasked with said upgrade puts the current dependencies (and their constraints, which may themselves be specific versions), including A and its desired version, into coursier solve
  3. The solver says "unsolvable" and lists the conflicting constraints (this is relatively easy to determine algorithmically; see the mock-up after this list). For example, perhaps the pinned version of http4s depends on a version of fs2 which depends on Cats Effect 1.x
  4. Team makes a decision about the conflicts, which results in either giving up on the upgrade, or changing one or more of the other artifact constraints
  5. Repeat from step 2
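A mock-up of what step 3's failure output might look like (the command, versions, and output format here are every bit as fictional as the OP's syntax):

```
$ coursier solve --scala 2.13 org.http4s::http4s-server:0.19.0 org.typelevel::cats-effect:2.2.0
error: unsolvable constraint set
  org.http4s::http4s-server:0.19.0 (pinned)
    -> co.fs2::fs2-core:1.0.0
       -> org.typelevel::cats-effect:1.0.0
  conflicts with: org.typelevel::cats-effect:2.2.0 (pinned)
```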

From a tooling standpoint, this actually is remarkably close to how NPM works if you pin (and commit) resolutions (which almost no one does), though I think NPM's resolver is forward-only so it's strictly less general than this problem.

It certainly would be useful to be able to represent artifact version constraints in some sort of file which can be loaded by coursier solve and committed. The libraryDependencies contents would effectively be generated by running coursier solve (and obviously themselves committed). This would be a pretty slick workflow, particularly if tooling like Scala Steward could be made aware of it, but it simply represents a convenience layer on top of what is proposed in the OP. The bar is pretty low here. :-)

where the versions are all mutually compatible not just in direct dependencies, but also throughout their transitive diamonds

Question on this part. This seems like a strictly more difficult problem than might exist within an actual repo.

I think you worked your way around to the answer on this one, but just to clarify… :-)

Resolving transitive diamond conflicts is the hardest problem in this space right now. The example in the OP doesn't demonstrate this, simply because I included a version for Cats Effect, but if we omitted Cats Effect (which is an upstream dependency of several artifacts in that list), we would very quickly run into interesting problems. For example, we want a working version of Ciris and Http4s, and we want at least version 2.3.x of fs2, which in turn implies that we need at least a certain version of Cats Effect. What version is that? To answer that question right now (literally, to compose this comment), I need to pull up the fs2 github repository and read the build included in the various tags.

Additionally, what version of http4s is compatible with this stuff? What if we had introduced a more complicated series of constraints by upper-bounding one of the dependencies? (not an uncommon situation) For example, if we only allow fs2 versions in the 2.3.x line, that probably means we can't use the latest http4s, so which version can we use? And in turn, what does that mean for other libraries? Imagine for example that we're using cormorant-http4s; what version of that are we allowed to use if http4s itself has been solved to be a particular version, ultimately due to bounding fs2-io? This is why the solver can't just be forward-only, as the current resolver is: it needs to backtrack up to http4s (or even cats effect, in my earlier example) using the bounds inferred via the declared fs2-io bounds, and then in turn work back up into cormorant-http4s, all while looking at Cats Effect and working back upwards to Ciris (though, since we know CE is using semver, that might be a more straightforward problem).

You see how this gets immensely complicated in a hurry. :-) At least to my mind, this is the biggest problem with having a large set of dependencies, even well-maintained ones. It gets even crazier if you're trying to solve this internally to your organization as well as externally. If you have a relatively prolific polyrepo ecosystem, which in turn intersects with the public ecosystem via transitive dependencies, it can be quite challenging to assemble compatible versions even for your own dependencies across teams.

Needless to say, I would expect coursier solve to accept custom repository configurations, just as the other commands do. :-)

At any rate, hopefully this clarifies the use-case a little further.

cosmicexplorer commented 3 years ago

That was remarkably thorough, thank you immensely for your time.

The workflow would be something like this:

Regarding the relationship of this solution to a monorepo setup: I think that it's pretty clear this solution would be directly applicable in a monorepo setup exactly as you've just stated, when employed as a subroutine: coursier solve could be run over one or more targets/subprojects that wish to upgrade one or more of their dependencies. The output over those multiple runs provides the set of changes to other libraries necessary to perform the upgrade for other targets. So I think we need not disturb the problem space further.

It certainly would be useful to be able to represent artifact version constraints in some sort of file

Ah, perfect! That part wasn't clear to me yet. It seems clear that this should be something that can be figured out secondary to the larger solve problem.

To answer that question right now (literally, to compose this comment), I need to pull up the fs2 github repository and read the build included in the various tags.

The need for this is quite clear to me, especially given this comment! While coursier is a lovely codebase, I could even imagine people who are often tasked with this job having written bash scripts or something in an attempt to automate this right now.

(though, since we know CE is using semver, that might be a more straightforward problem)

So if a dependency employs semver, are we just taking advantage of the understanding that we can take any two points in the totally-ordered set of versions for that dependency and represent that as a contiguous version range? That actually sounds like the description maven provides for non-semver versions, so I'm assuming that semver perhaps just reduces the complexity of the search (to not just "everything in lexicographic order")? Especially as those maven docs state "Maven does not consider any semantics implied by that specification", and I'm assuming that transfers to coursier?

It gets even crazier if you're trying to solve this internally to your organization as well as externally.

Yes, it is much more clear to me now the reason to pinpoint that this is for consumers of libraries, not just publishers.

I will think more about this and see whether I can contribute further (I really hope to; I agree that it seems quite a reasonable problem to prototype in under a week, maybe). The last thing I will leave you with is this completely different approach employed to analyze the python dependency ecosystem offline, not as part of the resolver (blog post from someone else, paper); the paper links to some interesting related work categorizing conflicts in maven central, as well as a more project-specific approach for java projects.

djspiewak commented 3 years ago

So if a dependency employs semver, are we just taking advantage of the understanding that we can take any two points in the totally-ordered set of versions for that dependency and represent that as a contiguous version range? That actually sounds like the description maven provides for non-semver versions, so I'm assuming that semver perhaps just reduces the complexity of the search (to not just "everything in lexicographic order")? Especially as those maven docs state "Maven does not consider any semantics implied by that specification", and I'm assuming that transfers to coursier?

In general, this whole thing is going to have to make assumptions about versioning. Specifically, for a given artifact that has two versions, v1 and v2, can we replace v1 with v2 without affecting the rest of the graph? The answer to that question dictates the solution space, and unfortunately it requires metadata that we simply do not have for general maven artifacts.

The best thing to do, IMO, is pessimistically assume epoch.major.minor versioning, which is relatively common (particularly in the Scala ecosystem), and allow people to declare major.minor.revision if relevant. In the future, it might be possible to share this metadata (e.g. some common dependencies, such as Cats and Cats Effect, use semver). Another option if we want to be really computationally intensive would be to use mima internally to check revisions against each other, though that's going to slow down the solution process (as well as introduce an entirely new bit of functionality and infrastructure to Coursier).
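A sketch of that pessimistic default as a predicate (the scheme encoding is invented for illustration, and the semver branch ignores the 0.x special case):

```scala
// Compatibility under two versioning assumptions:
//  - EpochMajorMinor (pessimistic default): only the last component may vary,
//    so epoch and major must both match.
//  - SemVer (opt-in per artifact): minor and patch may vary, so only the
//    major component must match.
sealed trait Scheme
case object EpochMajorMinor extends Scheme
case object SemVer extends Scheme

def compatible(v1: String, v2: String, scheme: Scheme): Boolean = {
  val (a, b) = (v1.split('.'), v2.split('.'))
  scheme match {
    case EpochMajorMinor => a.take(2).sameElements(b.take(2))
    case SemVer          => a.headOption == b.headOption
  }
}

// compatible("2.11.12", "2.11.8", EpochMajorMinor) == true
// compatible("2.2.0", "2.3.1", SemVer)             == true
```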

cosmicexplorer commented 3 years ago

Thank you for clarifying that point!

Another option if we want to be really computationally intensive would be to use mima internally to check revisions against each other, though that's going to slow down the solution process (as well as introduce an entirely new bit of functionality and infrastructure to Coursier).

My first thought is that I would assume this could be a candidate for actually performing offline (as opposed to within a single coursier run, as coursier solve would), especially since it needs access to the actual artifacts instead of just POMs (and I believe it also wouldn't allow for range intersections, just one-off pairings for binary compatibility). However, we may develop heuristics to speed up the process used for coursier solve so much that this could instead be proffered as a second-level check of sorts, under an additional flag, which the user would presumably specify after restricting the range to check a little bit. Or, we may be able to do this right before offering up a successful solution, in order to check our work.

Love the tooling in the scala ecosystem to make these complex verifications so easy to suggest in addition! :D

djspiewak commented 3 years ago

If we open the door to offline computation of metadata (which we would presumably then publish and sync by some mechanism for use in future solve runs), we can likely rely on the transitive property of compatibility to derive intersectionality from a linear set of pairwise comparisons. Unfortunately, mima exceptions are relatively commonplace (for good reason!) because it certainly isn't a tool that is free of false negatives, so I do wonder how feasible even the pairwise comparisons would be in general. We probably do need to rely on versioning assumptions (configurable on a per-artifact basis by the users).

blast-hardcheese commented 3 years ago

This would likely cut down on wasteful scala-steward PRs, which currently trigger failing builds and needlessly consume compute resources; even if the tests pass, low test coverage could still let faulty releases through.

We'd lose out on the release notes and updates for new versions of libraries we care about, so an enhancement would be to make unsafe PRs either Drafts (if draft PRs don't trigger CI builds) or Issues carrying the current PR description, with an additional note as to which dependencies are preventing the proper update.

This would have a knock-on effect of directing folk to exactly why a shiny new version is unavailable to them, hopefully promoting more community interaction on core infrastructure that may be getting less traction due to poor publicity.

alexklibisz commented 3 years ago

This is a problem that we have some internal sbt tooling to detect and somewhat mitigate, but by no means automatically fix. Basically we have a company-wide plugin that has some wrappers on sbt's internals and, probably the most useful thing, a plugin that provides pinned versions of commonly-used libraries.

Feel free to ignore the tangential question, but I've always wondered if this problem is solvable at the JVM level? Is there some fundamental reason why the JVM can't load classes from two different versions of a JAR? Obviously something like the proposed solution in coursier is still useful, and it's still bad practice to have bloated artifacts with multiple versions of a library. But it seems not totally crazy that the JVM could load classes from different versions of a library, whereas it seems virtually impossible to get all transitive deps lined up, especially across multiple repos.

eed3si9n commented 3 years ago

Feel free to ignore the tangential question, but I've always wondered if this problem is solvable at the JVM level? Is there some fundamental reason why the JVM can't load classes from two different versions of a JAR?

It is possible for a JVM to load classes from two different versions of a JAR. sbt compiling Scala 2.13 code even though sbt itself runs on a specific version of Scala 2.12 is a good example of that. sbt constructs a layered ClassLoader and essentially runs Scala compilation within a sandbox, like Inception. This is not trivial, since in (functional) programming you typically want a value out of this operation, so you need to come up with a contraption that will survive the sandbox boundary. For sbt / Zinc, we use Java classes to carry the information out. This also means that the usual assumption that a type A, or org.slf4j.Logger, or a singleton object points to the same thing universally within your code no longer holds. You'd have to carefully track that it's the Logger loaded by library X1, called by library X2, etc. The same goes for configuration files, side effects, etc.
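A minimal illustration of the mechanism (jar paths and the class name are hypothetical):

```scala
import java.net.{URL, URLClassLoader}

// Two sibling loaders, each seeing a different version of the same jar.
val parent = getClass.getClassLoader
val v1 = new URLClassLoader(Array(new URL("file:lib/mylib-1.0.jar")), parent)
val v2 = new URLClassLoader(Array(new URL("file:lib/mylib-2.0.jar")), parent)

val c1 = v1.loadClass("com.example.Thing")
val c2 = v2.loadClass("com.example.Thing")
assert(c1 != c2) // same name, two distinct runtime classes: an instance of one
                 // cannot be passed where the other is expected, which is the
                 // "contraption to survive the sandbox boundary" problem above
```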

Instead of a traditional function call, a better analogy for this technique is to think of it as an emulation of java -jar ..., like run, which could be useful in some situations, but I'm skeptical it can be a workaround for the transitive library resolution issue.

djspiewak commented 3 years ago

Feel free to ignore the tangential question, but I've always wondered if this problem is solvable at the JVM level? Is there some fundamental reason why the JVM can't load classes from two different versions of a JAR? Obviously something like the proposed solution in coursier is still useful, and it's still bad practice to have bloated artifacts with multiple versions of a library. But it seems not totally crazy that the JVM could load classes from different versions of a library, whereas it seems virtually impossible to get all transitive deps lined up, especially across multiple repos.

It's doable but hard. https://github.com/moditect/layrry is a pretty decent start on this front (works on top of the java module system).

cosmicexplorer commented 3 years ago

Pants uses jarjar (org.pantsbuild:jarjar) to shade libraries of different versions! I believe that is what is being asked here -- I think that it would compose in a lovely way with this solve command to reduce bloat. EDIT: It looks like @eed3si9n has extended it to scalasigs with jarjar abrams (experimental!!), now my favorite package name ever.
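For reference, sbt-assembly exposes this style of jarjar renaming roughly as follows (a sketch; the exact setting name varies across plugin versions):

```scala
// build.sbt with sbt-assembly: rename one copy of a conflicting library into
// its own namespace so both versions can coexist in the final jar
assembly / assemblyShadeRules := Seq(
  ShadeRule.rename("io.netty.**" -> "myapp.shaded.netty.@1").inAll
)
```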

cosmicexplorer commented 3 years ago

To rephrase (paraphrasing @eed3si9n from a chat): all shading buys is perhaps some more leeway, allowing some conflicting versions to coincide. I believe this should increase the likelihood of coursier solve being able to find a satisfying set, if it can take advantage of not having to find a completely non-conflicting resolution through all transitive deps. But I don't think it actually "solves this problem at the JVM level" -- coursier solve is the key necessary to make progress towards an upgrade of the relevant dependencies, instead of just working around it.

Separately, I think that build tool support is necessary to make that strategy work, because I believe that jarjar abrams would need to be run by sbt/etc after compiling the project's own source code as well, which seems possibly out of scope for coursier. If this behavior were to be supported by coursier solve at all, it would likely need to be behind a flag at least. Alternatively, we may be able to view coursier solve and shading methods like jarjar abrams as complementary strategies that don't necessarily need to know about each other at all.

Haven't quite fit this all together yet, but I think I'm beginning to see how pants or bazel might extend coursier solve to resolve a minimally-conflicting set of dependencies across an entire monorepo, then apply shading wherever conflicts spring up. If I think a bit more, that could form one implementation of something we're hacking away at inside Twitter right now.