clojars / clojars-web

A community repository for open-source Clojure libraries
https://clojars.org
Eclipse Public License 1.0
470 stars 114 forks source link

Support Maven groupId relocation #801

Closed plexus closed 1 year ago

plexus commented 3 years ago

With the new groupId policy we have started using com.lambdaisland instead of lambdaisland. This is necessary for new artifacts, but we have also started doing this for existing libraries, to keep things consistent for our own internal tooling, and for our users going forward.

Seems Maven actually has a feature to assist with such a scenario known as relocation. For instance, they give the example of ant:ant moving to org.apache.ant:ant. This involves serving up a pom.xml for the old artifact name that looks like this:

<project>
  <modelVersion>4.0.0</modelVersion>
  <groupId>lambdaisland</groupId>
  <artifactId>glogi</artifactId>
  <version>1.0.116</version>
  <distributionManagement>
    <relocation>
      <groupId>com.lambdaisland</groupId>
    </relocation>
    <repository>
      <id>clojars</id>
      <name>Clojars repository</name>
      <url>https://clojars.org/repo</url>
    </repository>
  </distributionManagement>
</project>

I've tried uploading this with

mvn deploy:deploy-file -DpomFile=pom.xml -Dfile=./pom.xml -Dpackaging=pom -DrepositoryId=clojars -Durl=https://clojars.org/repo

But this results in a "no jar was uploaded" error (which is correct, I don't want to upload a jar)

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-deploy-plugin:2.7:deplo
y-file (default-cli) on project glogi: Failed to deploy metadata: Could not transfer 
metadata lambdaisland:glogi/maven-metadata.xml from/to clojars (https://clojars.org/r
epo): Authorization failed for https://clojars.org/repo/lambdaisland/glogi/maven-meta
data.xml 403 Forbidden - no jar file was uploaded -> [Help 1]

Any thoughts on this would be appreciated as I'm also no Maven expert.

danielcompton commented 3 years ago

This seems like a great feature. I've looked at this before and am not sure which tools and repos currently support it, I've never seen it in the wild. I assume Maven does support it though?

plexus commented 3 years ago

I would assume so. Since Clojure CLI does its own resolution it may need patching though. Don't know about leiningen or other aether based tools. I'll try to do some testing.

SevereOverfl0w commented 3 years ago

https://clojure.atlassian.net/browse/TDEPS-8 this is a planned feature for tdeps.

tobias commented 3 years ago

I didn't know relocation was a maven feature, thanks for bringing it up!

Given that not all build tools support relocation and the effort involved, I don't think we should be moving projects fully on Clojars. I do think relocation can be useful as a hint for tooling that supports it.

Given a project foo/bar that has versions 1.0.0 and 2.0.0, and there is a desire to move to com.foo/bar, the project owner could:

  1. push com.foo/bar 2.0.1 that has the same functionality as foo/bar 2.0.0
  2. push foo/bar 2.0.1 that has that same functionality again, but with relocation in the pom that points to com.foo/bar 2.0.1

Then any tooling that doesn't support relocation can still use foo/bar 2.0.1, since it is a real release, and anything that depends on foo/bar 1.0.0 will continue to work. But tooling that does support relocation can find com.foo/bar when depending on foo/bar 2.0.1.

I think the real value of relocation here is for tooling that detects stale dependencies like antq or lein-ancient. They could be updated to follow relocations to find the new name.

We could also pull the relocation info from the pom on Clojars, and link to the new name on the jar and search results pages.

How does that sound?

puredanger commented 3 years ago

https://clojure.atlassian.net/browse/TDEPS-8 this is a planned feature for tdeps.

That's a reported gap, not a planned feature. :) This is a rarely used feature of Maven. If it becomes more common, then I would need to move its importance up for support. I do have a good idea of where and how to support this. In any case, will follow here.

SevereOverfl0w commented 3 years ago

One real danger here is ending up with two copies of the same library on the classpath, under different names. That's one of those "I've lost 3 days here and can't explain why the program works 50% of the time" problems.

Maybe com.foo/bar could depend on a version of foo/bar that has nothing in it? But that's reliant on TDEPS behaviour of picking the highest version, Lein/Mvn works differently so it wouldn't work for that case.

If you change the namespace in the artifact move, then this isn't a problem.

alysbrooks commented 3 years ago

https://clojure.atlassian.net/browse/TDEPS-8 this is a planned feature for tdeps.

That's a reported gap, not a planned feature. :) This is a rarely used feature of Maven. If it becomes more common, then I would need to move its importance up for support. I do have a good idea of where and how to support this. In any case, will follow here.

I'd suggest tools.deps follow Clojars' lead. If Clojars supports relocation, it'll likely become relatively common among Clojure libraries even if it remains rarely used in the larger Java ecosystem.

plexus commented 3 years ago

Thanks @tobias for the proposal.

I don't think we should be moving projects fully on Clojars.

I agree with that, I don't think it's Clojars' job to actually "move" projects, as a maintainer I only want the possibility to point an artifact to a new name, after I deployed it to that new name. The minimum that is needed from Clojars' side for this to work is being able to deploy a pom without the jar.

I understand that probably sounds much easier than it really is, I'm assuming both Clojars internals and UI assume there is always a jar to work with. I also see the point of keeping certain checks in place, like only supporting pom-only deploys if there is a relocation, and the relocation seems valid. But maybe this would be a comparatively small change.

The next level would be some UI indicating the artifact has moved, this is how Maven does it: (example)

2021-05-12_122539_Nightly

Finally you could consider some smarts to help developers, like being able to indicate via the UI that an artifact will be deployed to a different groupId going forward, and have Clojars generate the relocation pom upon deploy. Maybe nice to have but something that could just as well be dealt with by deployment tooling.

push foo/bar 2.0.1 that has that same functionality again, but with relocation in the pom that points to com.foo/bar 2.0.1

This does seem to be what for instance hibernate-validator is doing. They continue to deploy all artifacts twice, but also including a relocation stanza. Others like bouncycastle only deploy the pom with relocation.

Pushing it twice does mean maximal compatibility, but at what cost? As the hibernate docs point out this can cause issues.

[M]ake sure to not depend on HV 5.x and HV 6.x at the same time (as the group ids are different, the dependency resolution algorithm of your build tool fails to detect that these are two versions of the same logical artifact).

I can see it happen quite easily that through transitive dependencies you end up with two artifacts on the classpath with conflicting version. This is what @SevereOverfl0w also pointed out. I think in this case I would prefer for my tools to fail with an error message, than to continue with a possibly inconsistent classpath.

Relocation seems to work fine in Leiningen, but it confuses lein deps :tree, which simply omits the dependency. I'm assuming all Maven or Aether based tools can handle relocation out of the box. tools.deps currently blows up, which actually isn't terrible :) at least it's honest. I'm assuming most dev tooling like lein ancient, depot, CIDER, etc. has no clue and would need to gain explicit support.

My preferred outcome would be for Clojars to gain at least the minimal support of allowing pushing a relocation pom, and tools.deps to learn to deal with that. I think we can also allocate some @gaiwanteam time to help make those things happen, provided someone can provide some initial pointers and be available for questions.

tobias commented 3 years ago

Thanks @plexus for digging in to this a bit more and testing with lein, and thanks @SevereOverfl0w for bringing up the issue with duplicate namespaces on the classpath.

The minimum that is needed from Clojars' side for this to work is being able to deploy a pom without the jar.

This should be a straightforward change - I think this requirement is based on assumptions from the early days of Clojars.

Regarding having the same classes/namespaces in the dependency tree: I had not considered that, and agree that that could be a real problem, especially with transitive dependencies. But I think I have an idea that could avoid that:

Concerns about this approach:

I think the concrete work here is:

How does that sound? Any concerns with that approach? As outlined, it's not a lot of work, and I should be able to do it fairly quickly.

plexus commented 3 years ago

I did a bit more testing by just inserting some dummy poms/jars in ~/.m2/repository

Base case

foo/bar has a relocation stanza to com.foo/bar. Behavior is as previously established.

Leiningen: works as expected.

tools.deps: Error building classpath. Could not find artifact foo:bar:jar:1.0.0 in central (https://repo1.maven.org/maven2/)

Adding an empty jar

Created an empty jar (i.e. a zip file with nothing in it), and put that in .m2/repository/foo/bar/1.0.0/bar-1.0.0.jar. Behavior is still basically what you would expect.

leiningen: ignores the empty jar, only puts com.foo/bar on the classpath

tools.deps: will start now, but only with foo/bar (the empty jar) on the classpath, not com.foo/bar.

Adding the dependency

In the foo/bar pom, add a dependency on com.foo/bar, on top of the relocation (as @tobias suggests).

Here is gets interesting: the behavior is exactly the same as with the previous case. Both lein and tools.deps ignore the dependencies when there is a relocation stanza.

Which is not what I expected, I assumed since tools.deps ignores the relocation stanza it would just follow the dependencies, but it seems that's not the case, so adding both is not a solution for Clojure CLI compat.

Maybe @puredanger has an idea of why it's like that, even though tools.deps doesn't honor the relocation stanza, it seems to still shadow the dependencies.

Conclusion

I like the idea of adding the dependency on top of the relocation for maximal compatibility, although at least for the most common tools it doesn't seem to make a difference.

The rest of the plan all sounds good.


The final pom:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemalocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>foo</groupId>
  <artifactId>bar</artifactId>
  <version>1.0.0</version>
  <dependencies>
    <dependency>
      <groupId>com.foo</groupId>
      <artifactId>bar</artifactId>
      <version>1.0.0</version>
    </dependency>
  </dependencies>
  <distributionManagement>
    <relocation>
      <groupId>com.foo</groupId>
    </relocation>
  </distributionManagement>
</project>
puredanger commented 3 years ago

I guess I outlined this in a slack thread yesterday but didn't put it here, but this is what I would expect. Note that tools.deps does not look at the pom, it uses the Maven APIs. There are two calls into Maven:

borkdude commented 2 years ago

Just letting you know that I'm also interested in this feature. A group relocation of one of my libs caused several issues because of two versions of the same lib on the classpath.

kamilwaheed commented 2 years ago

I'm currently looking into this after chatting with @plexus separately. We've concluded that we'll start with updating deploy validation on clojars so we can upload poms w/o jars and at least have tools like lein to follow relocations.

I've played around with clojars-web and I can say that simply omitting assert-jar-uploaded from clojars.routes.repo/validate-deploy is sufficient for the simplest requirement of allowing poms w/o jars.

However, that's probably easing the validation too much. We need stricter validation. As @tobias has suggested above:

Add a deploy validation for requiring the relocation as a dependency

Sounds like a good idea. Also, would it make sense to:

  1. Add validation for skipping jar only in the case of relocation
  2. Add validation for relocation to be a valid library

I'm happy to open a PR easing the validation for having a jar so long as the above requirements (some or all) are met. Does this make sense?

tobias commented 2 years ago

Given the tooling issues with relocations based on @plexus' research, what if the recommended approach was to not use a relocation stanza, but instead:

That won't prevent bar from being on the classpath in some cases (for example, if foo/bar 1.0.0 and com.foo/bar 1.0.1 are in the dependency tree), but I'm not sure how that can be avoided unless all tooling supports relocation, and we rewrite older poms to include a relocation stanza, which would be difficult.

We could also support a format for the description field that could be parsed to point to the new name (example: RELOCATED: use com.foo/bar instead). If the description starts with RELOCATED:, we could parse the description on the server-side, look for com.foo/bar and link it to the new name.

Do folks see issues with that approach?

@kamilwaheed I think your work is a great start!

I've played around with clojars-web and I can say that simply omitting assert-jar-uploaded from clojars.routes.repo/validate-deploy is sufficient for the simplest requirement of allowing poms w/o jars.

I think this is enough, I don't think we need to replace it with any other validation. Having a pom-only release is perfectly fine in other maven-based systems, and they are useful beyond this case (for example, for umbrella projects like https://clojars.org/ring/ring or https://clojars.org/org.immutant/immutant, where the project itself provides no code, just dependencies). I think we have the jar requirement just for historical reasons from Clojars' early days, and I'm fine with getting rid of it.

plexus commented 2 years ago

I'm looking at some more examples, while many indeed use a pom-only approach, I'm also seeing several that upload a pom + a jar, with the jar only containing the pom.xml, pom.properties, and MANIFEST.MF. (mapstruct-jdk8, hibernate-validator). So that's interesting, that's something we can already do today. If we want to go down the path of using relocation, then the more urgent target would be tools.deps.

Reading the maven relocation docs again I did come across this part, which I think I did not fully appreciate before:

Create a minimal Maven 2 POM file for every old release of foo in your Maven 2 repository. The POM files only need to include groupId, artifactId, version and the relocation section. Note: Before you replace your old POM files in /bar/foo/ with these minimal POM files, make sure you have made backups!

So indeed they do recommend overwriting old poms, which makes me queasy, I'm not sure at all that Clojars should allow this. The best I can imagine is a limited feature where you can mark a library within clojars as being relocated, and clojars will generate the minimal relocation pom if a pom for an older version is requested. But that does seem a lot more complex. And old poms could still be cached locally.

If we can't do something like that to ensure that for every version there's a relocation stanza, then I think there's not a lot of benefit to using relocation over @tobias's proposal of uploading single-dependency libraries at the old coords that pull in the new version. And that would mean existing tooling doesn't need adjusting.

Ideally I'd like tooling like Antq or Depot to also switch over to the new group-id when upgrading dependencies, but as long as we keep releasing both old and new names (with one being an "empty" jar + dependency to the new one), then it shouldn't cause many issues that they don't. It would be good if we could come up with a way in the pom to still signal the relocation, in case tools do want to handle the rename. It also would be good if we could have Clojars show a big warning box on the old version pointing to the new one.

It's not going to be a perfect solution, there will be user confusion and errors during the transition, but I think this does give us something we can start doing that is an improvement over what we're currently doing.

In that case I think next steps would be

Or we pursue full relocation handling

tobias commented 2 years ago

So indeed they do recommend overwriting old poms, which makes me queasy, I'm not sure at all that Clojars should allow this. The best I can imagine is a limited feature where you can mark a library within clojars as being relocated, and clojars will generate the minimal relocation pom if a pom for an older version is requested. But that does seem a lot more complex. And old poms could still be cached locally.

If we did have Clojars rewrite POMs, we would have to do it for all versions at the time the relocation was specified, since artifacts served via the repo don't pass through the Clojars app; they are served via Fastly & S3. And you are correct - we have to worry about local caches, as well as the Fastly cache (which we can purge) and any mirrors (which we can't force to update). We also can't rewrite POMs for any GPG-signed releases.

Ideally I'd like tooling like Antq or Depot to also switch over to the new group-id when upgrading dependencies, but as long as we keep releasing both old and new names (with one being an "empty" jar + dependency to the new one), then it shouldn't cause many issues that they don't. It would be good if we could come up with a way in the pom to still signal the relocation, in case tools do want to handle the rename. It also would be good if we could have Clojars show a big warning box on the old version pointing to the new one.

It's not going to be a perfect solution, there will be user confusion and errors during the transition, but I think this does give us something we can start doing that is an improvement over what we're currently doing.

In that case I think next steps would be

* Figure out how to signal relocation to clojars clojars UI

  * the description field hack is an option, can it be a first-class field in the UI/API?
  * this would also still leave the door open to using this later for actual relocation support
* Make the move visible in the UI

I wish we had other alternatives (for example, I wish POMs supported arbitrary properties that we could use for this). But, as you say, (ab)using the description field for this now doesn't prevent a better solution in the future.

We could parse the description field then note the relocation:

If we come up with a standard for the description field, Antq or Depot could follow that. We could package up relocation detection into a small lib for them to use. Or they could call the Clojars API to get that information.

* pom-only releases are valuable in their own right (see also tools.deps working on BOM support), so I think it's fine if @kamilwaheed continue with that

Agreed, I'd like to see this implemented regardless.

Or we pursue full relocation handling

* Figure out how to signal relocation to clojars clojars UI

* Let clojars handle the relocation poms for old and new releases, new releases are only uploaded to the new coords

* Handle relocation in tools.deps (this gets us the main tooling support that I personally care about: lein, clojure cli, maven)

* Make the move visible in the UI

I don't think we'll ever be able to implement full relocation support, since we can't rewrite older signed POMs, and I would prefer to keep Clojars immutable. That means we would only have partial relocation support for new versions, which doesn't solve the issue with classpath collisions.

My personal recommendation would be for folks to not rename/relocate existing libraries, but I realize that I don't maintain any real libraries any longer, so don't feel the pain of having two group names.

kamilwaheed commented 2 years ago

I don't think we'll ever be able to implement full relocation support, since we can't rewrite older signed POMs, and I would prefer to keep Clojars immutable. That means we would only have partial relocation support for new versions, which doesn't solve the issue with classpath collisions.

Correct me if I'm wrong here, but based on my testing (lein deploy, mvn deploy:deploy-file as well as CLJ cemerick.pomegranate.aether/deploy), tools typically don't rewrite older POMs/JARs but create a new build with an incrementing build number. Local/Fastly caches still matter of course but other than that, you should still be able to follow relocations on older versions.

I wish we had other alternatives (for example, I wish POMs supported arbitrary properties that we could use for this). But, as you say, (ab)using the description field for this now doesn't prevent a better solution in the future.

We could parse the description field then note the relocation:

  • in the API
  • on the jar page with a banner
  • on the search results cards

This appears to me as a minor improvement over showing description as plain text that contains the relocation notification from the author.

Do we then also enforce presence of the relocated coordinates in dependencies and match them again the relocation notification in description?

plexus commented 2 years ago

Tools typically don't rewrite older POMs/JARs but create a new build with an incrementing build number.

If you look at the Maven docs for relocation they recommend physically moving all old artifacts to their new group-id, and then putting poms with relaction stanzas where the old artifacts used to be. Once that has happened you are expected to continue deploying a pom with the old group-id along with every new release, which contains a relocation stanza pointing to the new group-id. This last part deploy tools can handle, but the original move would be handled manually by a server admin.

This is the only way that you will get correct resolution. If Maven pulls the pom for old-group/artifact "1.0.0", and new-group/artifact "2.0.0", and the first one does not contain a relocation stanza to the second, then it will consider them separate artifacts and put both on the classpath.

My personal recommendation would be for folks to not rename/relocate existing libraries, but I realize that I don't maintain any real libraries any longer, so don't feel the pain of having two group names.

We are trying to make the best of a situation we were forced into. Turns out that, in conclusion, "the best" is more or less what we already have, since there is no feasible technical solution that will not cause more issues for users. So yes, given that insight we will likely continue to release older projects under the lambdaisland group, and not move them to com.lambdaisland. I think for the few that we already prematurely renamed we will update our build tooling to release under both names, with the lambdaisland one effectively a "redirect" via a dependency to the new name.

weavejester commented 1 year ago

I've also run into this problem. I've released some existing libraries under the new artifact group system, using a verified domain name, without considering how this affected the classpath and existing dependencies.

I'm not certain what the best solution is to this, now that I have released libraries under the new group IDs. I can add a dependency from com.example/example to example/example, or vice versa, but either way the dependency depth is increased by one, and this affects which dependency gets priority if there are any conflicts in the dependency tree.

Is there any ideal solution to this, or is this an issue that's not meaningfully resolvable now that I've made the mistake of changing the group ID?

borkdude commented 1 year ago

I've also had this problem and regretted moving a lib (borkdude/sci -> org.babashka/sci) since this caused issues with both old and new versions on the classpath. I managed to track down most old usages via https://grep.app and made PRs for those, but it would definitely be nice to have this handled by clojars. Thanks!

weavejester commented 1 year ago

I tested out using Maven's hard dependency to see if this could be a possible solution, as one would imagine this would allow com.example/example and example/example to be bound together an an atomic unit. For example [com.example/example "0.1.0"] would have a hard dependency [example/example "[0.1.0]"].

Unfortunately Maven's dependency resolution logic is completely insane and this doesn't work -- and even if it did, tools.deps uses a different (albeit saner) dependency resolution algorithm entirely that ignores the hard dependency syntax.

It may be worth adding to the FAQ that moving artifact groups is generally a bad idea for packages hosted on Clojars. We have no mechanism that can handle this in a way that doesn't produce issues and no real way of resolving it after the package has been deployed. I'd feel like this was just me being dumb, but as plexus and borkdude also fell for this trap, it may not be as obvious as it seems in hindsight.