eclipse-platform / eclipse.platform.releng

3 stars 22 forks source link

Fix dependency/groupId computation when publishing to Maven #50

Closed mickaelistria closed 2 years ago

mickaelistria commented 2 years ago

In 4.24, some artifacts are now consumed directly from Maven and already have groupId/artifactId to be used when describing them as dependencies when generating pom files to publish to Maven Central. The current https://github.com/eclipse-platform/eclipse.platform.releng/blob/master/publish-to-maven-central/SDK4Mvn.aggr defines static mapping rules that used to be reliable in the time of Orbit, but aren't any more. See https://github.com/eclipse-equinox/equinox.bundles/issues/54 / https://github.com/eclipse-equinox/equinox.framework/issues/70 for an instance of this problem. The pom generator should be fixed to use the reference groupId/artifactId when listing dependencies as much as possible instead of hardcoding rules.

mickaelistria commented 2 years ago

Fortunately, Tycho now reliably lists the original groupId/artifactId/version in its metadata when artifacts come from Maven, so we probably won't need further mapping rules. So I'm thinking that we could decouple things a bit and remove some extra layers to simplify things and leverage latest Tycho/p2 capabilities:

Overall, it might be possible in the future to have this happening automatically while building the artifacts, as part of the build, so we'd save the need to generate the pom at that stage and could just consume something already existing; but that would be some next steps. Decoupling as suggested above will help towards that goal.

laeubi commented 2 years ago

I don't know much about the aggregator stuff, but should we not invest the effort directly at the projects? I think it would even be feasible to simply don't publish the aggregator stuff anymore to maven, even if we might miss a release to be available on maven then it might give the required attentions to the projects to work on that topic. I fear as long as there is "someone else to do it" there is not much attraction for the projects to do it otherwise.

mickaelistria commented 2 years ago

On Wednesday, June 15, 2022, Christoph Läubrich @.***> wrote:

I don't know much about the aggregator stuff, but should we not invest the effort directly at the projects?

The aggregator is an extra layer which brings extra requirements and expertise over upstream (p2). If we remove this layer, we will probably allow the pom generation to work in more contexts (eg Tycho). IMO, we're in a case of "less is more" here.

-- Mickael Istria Eclipse IDE https://www.eclipse.org/eclipseide developer, for Red Hat Developers https://developers.redhat.com/

akurtakov commented 2 years ago

I think it's not clear what does "aggregator" means here, it's highly overloaded term . In this case it's https://github.com/eclipse-platform/eclipse.platform.releng/blob/master/publish-to-maven-central/SDK4Mvn.aggr which is purely Eclipse TLP thing . Skipping a release will have effects like delaying many projects to use latest java compiler(jdt.core) improvements, tycho to miss p2 fixes(if any) and probably many other things I can't think of right now. As we speak about Eclipse TLP content here I would dare to say it's us being responsible to publish to Maven Central and we never had the luxury of someone else does it. Much like with direct Maven dependency usage - we do not wait on someone else to put it in Orbit but do the necessary steps to do it proper, just like we would have to lead by example in this case.

mickaelistria commented 2 years ago

We may also use directly Equinox instead of p2 to compute dependencies, it should usually be more reliable and code would be even more portable, even to Felix.

HannesWell commented 2 years ago

It is a good plan to make the generation of pom's for publication on Maven-Central more reliable and more 'native' without requiring an extra aggregator-layer that needs manual care. Generating the deployed-poms (to enhance it with explicit information that are implicitly known at build-time) is also something that is planned for Maven-4 and 5 so maybe we get (better) support for that from the Maven side as well.

However if the current approach is used ever again I suggest to implement a sanity-check if the generated dependency really exists at https://repo1.maven.org/maven2/. At least for those dependencies that are not part of the build it should be relatively easy to check if https://repo1.maven.org/maven2/<groudId-elements>/<artifactId>/<version> exists (i.e. not returns a 404) and then fail the build if it is missing. At least we then don't publish broken poms anymore. Ideally we run the check more often to not only recognize missing mappings within the release process.

jmini commented 2 years ago

At least for those dependencies that are not part of the build it should be relatively easy to check

This is exactly what I do in my project when I compute the BOM file corresponding to an Eclipse release:

https://github.com/jmini/ecentral/blob/78ea65897803b8f8e686270c625b998982854306/src/main/java/fr/jmini/utils/ecentral/ECentralTask.java#L262-L272


I did some health check on the published artifact from the 2022.06 release (groupIds org.eclipse.platform, org.eclipse.jdt, org.eclipse.pde -- because I think this is what is published on maven central with the SDK4Mvn.aggr script), and found #52.

akurtakov commented 2 years ago

@jmini do you think you can spend some time to put your check into https://github.com/eclipse-platform/eclipse.platform.releng/tree/master/publish-to-maven-central ? Even though we are looking for better approach, having a sanity check is always welcome. I don't know the dirty details but if you're willing to give it a try @sravanlakkimsetti should be able to answer questions.

jmini commented 2 years ago

Sure I can give it a try, but I am not aware how the publish process looks like.

For me at some point the jar and the poms "ready for publication" are stored somewhere. And at this point in time, we could make maven verify that they are correct (point to existing artifacts).

merks commented 2 years ago

Note that I could look at doing such a check in the aggregator, but if you plan to re-implement everything it provides anyway, that would be kind of a waste of time...

mickaelistria commented 2 years ago

Please open a separate issue to track discussion and actions about checks/verification.

jmini commented 2 years ago

From @mickaelistria:

Overall, it might be possible in the future to have this happening automatically while building the artifacts, as part of the build, so we'd save the need to generate the pom at that stage and could just consume something already existing; but that would be some next steps. Decoupling as suggested above will help towards that goal.

As long as your solution produce jars and poms that do not need tycho to be consumed, this is fine. Because this is the real value of this aggregator approach. Being able to be consumed by "the rest" of the java world (even if they could use tycho, but this is an other story).


And please also noticed that the coordinates in the projects are not the same than the published one.

Example org.eclipse.core.jobs (selected randomly):

Coordinates in the project (source)

  <groupId>org.eclipse.core</groupId>
  <artifactId>org.eclipse.core.jobs</artifactId>
  <version>3.13.100-SNAPSHOT</version>

Coordinates of the published artifact in maven central (created by this aggregator):

  <groupId>org.eclipse.platform</groupId>
  <artifactId>org.eclipse.core.jobs</artifactId>
  <version>3.13.0</version>

(not saying that this has to stay forever like this, just telling how this currently done)

sravanlakkimsetti commented 2 years ago

Sure I can give it a try, but I am not aware how the publish process looks like.

For me at some point the jar and the poms "ready for publication" are stored somewhere. And at this point in time, we could make maven verify that they are correct (point to existing artifacts).

https://ci.eclipse.org/releng/view/Publish%20to%20Maven%20Central/job/CBIaggregator/ creates jars and poms in "ready for publication" state. you can see them in the built artifacts of the job run for example https://ci.eclipse.org/releng/view/Publish%20to%20Maven%20Central/job/CBIaggregator/125/artifact/repo-125/

HannesWell commented 2 years ago

We may also use directly Equinox instead of p2 to compute dependencies, it should usually be more reliable and code would be even more portable, even to Felix.

Do you mean by reliable that it also considers dependencies due to Import-Package? Because at the moment it looks like only Require-Bundle dependencies are considered. For example the published pom of org.eclipse.equinox.log.stream lists no dependencies while the plug-in has imported packages: https://repo1.maven.org/maven2/org/eclipse/platform/org.eclipse.equinox.log.stream/1.1.0/org.eclipse.equinox.log.stream-1.1.0.pom

As a side-note the issueManagement element and connection are outdated as well.

laeubi commented 2 years ago

We may also use directly Equinox instead of p2 to compute dependencies, it should usually be more reliable and code would be even more portable, even to Felix.

I really don't get why we should put that much effort here if everything is already implemented in Tycho already for a solution that more should be an intermediate than becoming a constant source of development effort. Why not improving Tycho as it is currently quite good at these (just use dependency:list and Tycho will tell you everything you need here) and even make the projects publish suitable pom data in the first place?

laeubi commented 2 years ago

Skipping a release will have effects like delaying many projects to use latest java compiler(jdt.core) improvements, tycho to miss p2 fixes(if any) and probably many other things I can't think of right now.

Yes and finally people will realize that they rely on a best-effort service and probably would start to fix the project itself instead of using this workaround forever. There is nearly no excuse today as large effort where already put into Tycho to support generation and publishing of suitable pom data right now, it is just that no one uses it because people are either not aware or whatever of these features and just pass away for the smallest inconvenience instead of reporting good bug report and/or improving the code as long as we offer them a (for them) comfortable solution.

So my "vote" her would be, announce now that we no longer will publish bundles through this script anymore after the 2022-09 release and that anyone requiring it (including Tycho of course!) has to start right now to take the required actions to make these projects capable of publishing their own maven metadata!

HannesWell commented 2 years ago

So my "vote" her would be, announce now that we no longer will publish bundles through this script anymore after the 2022-09 release and that anyone requiring it (including Tycho of course!) has to start right now to take the required actions to make these projects capable of publishing their own maven metadata!

Can you outline the required steps already?

And aren't the projects published by this script 'only' the Eclipse-TLPs (i.e. eclipse-platform, Equinox, JDT and PDE)? At least only they occur in this repo's publish-to-maven-central folder? If this affects more projects we could start with them, which should be a demonstrator for other projects.

laeubi commented 2 years ago

Can you outline the required steps already?

  1. Use Tycho 2.7.3 or higher (3.0.0-SNAPSHOT recommended because of step 5. and 6.)
  2. Enable <addMavenDescriptor>true</addMavenDescriptor> in the tycho-packaging:package-plugin
  3. Inspect the generated pom inside the bundle-jar (or keep the consumer pom)
  4. Configure the Maven Flatten Plugin to clean / enhance your pom as you desire to get better result
  5. Report any obstacles, issues or enhancements to Tycho so we can improve here or even help with a PR already.
  6. Optionaly: Use Tycho-CI-Firendly-Versions to allow for a simple mvn deploy

And aren't the projects published by this script 'only' the Eclipse-TLPs

Yes, so only our issue here, no one else to blame to not start right now... I'm currently trying to get Equinox ready for this, but any help would be appreciated of course.

mickaelistria commented 2 years ago

We do need to publish Maven artifacts for the continued success of the project. The ecosystem is too big and important to suddenly break it and the ROI is good enough to justify Eclipse project publishing the jars, moreover there are allocated resources assigned to keeping it working. Moving responsibility to consumers (eg Tycho) would potentially result in duplicated work and incinsistencies, and wouldn't bring more workforce over what's available in Platform (all active committers on Tycho are already active committers on Platform). Overall, I see no solid rationale to stop doing it and see only drawbacks.

The question is not whether to publish or not, but instead how to do it better and better.

tjwatson commented 2 years ago

I have to say that I am confused by some of the discussion here. Is anyone seriously suggesting that the Eclipse TLP should not publish its own artifacts to maven central? I 100% agree with @mickaelistria last comment that we must continue to publish our JARs as maven artifacts for continued success.

HannesWell commented 2 years ago

The question is not whether to publish or not, but instead how to do it better and better.

Second that. We should not stop publishing to Maven-Central but we should start to work on an improved workflow. Maybe we can complete the work and already use that for the next release or maybe we have to harden the existing workflow for one more release.

laeubi commented 2 years ago

I have to say that I am confused by some of the discussion here. Is anyone seriously suggesting that the Eclipse TLP should not publish its own artifacts to maven central?

No one is saying that, nor that other projects should publish anything, but we should stop doing it as an extra step by trying to restore the information from an aggregated P2 repository content and instead the originating project should publish it to maven-central at the first place! Because at that point there are all information present, there is no need to guess anything, e.g. the recent issue would simply not occur, as tycho would have known the exact GAV of that dependency.

In contrast, trying to deduce this information after the fact is cumbersome and error prone as just proven right now. Any more effort in this fragile solution should better be placed to enable the TLPs to do it right now, as @akurtakov explained this solution was from a time where we just have had completely different background and the goal was to just bring at least some limited set of artifacts to maven.

stephan-herrmann commented 2 years ago

... the originating project should publish it to maven-central at the first place!

Just keep in mind that publishing is more than creating mavenized artifacts. Both steps can possibly be done under different responsibility (centrally vs. decentralized).

... tycho would have known the exact GAV of that dependency

While still some orbit bundles are consumed, does tycho know about their "original" GAV? How?

Does tycho have a solution for mavenizing (platform dependent) OSGi fragments?

Another observation: when the workflow for publishing to Maven Central was created, the feedback from other projects (outside Eclipse TLP) was near zero, i.e., there was not much enthusiasm about joining the party. Maybe this situation has changed in the meanwhile?

mickaelistria commented 2 years ago

While still some orbit bundles are consumed, does tycho know about their "original" GAV? How?

I don't think it has original GAV for Orbit bundles. After all, they're not Maven artifacts anymore, so even having them wouldn't help as the Orbit artifact is not the Maven artifact.

Does tycho have a solution for mavenizing (platform dependent) OSGi fragments?

It already has some good bricks to consolidate this story (mvn dependency:list now does a great job at referencing Maven artifacts), but no complete solution comparable to what aggregator does at the moment.

Maybe this situation has changed in the meanwhile?

I think that the recent improvements in Tycho make it more and more accessible and appealing to get "more Maven" now.

merks commented 2 years ago

EMF publishes to Maven as well, though I can't say it's with enthusiasm:

https://ci.eclipse.org/emf/job/maven-publish/

Note that direct-from-maven artifacts have this kind of information:

image

And even ones from Orbit have stuff like that:

image

stephan-herrmann commented 2 years ago

And even ones from Orbit have stuff like that:

... where a maven-groupid of org.eclipse.orbit.bundles should definitely never be seen in any pom :)

HannesWell commented 2 years ago

It already has some good bricks to consolidate this story (mvn dependency:list now does a great job at referencing Maven artifacts), but no complete solution comparable to what aggregator does at the moment.

What just came into my mind (I don't know if this is already used), but the Maven-Central search API allows to search for the artifact by its jar's checksum: https://central.sonatype.org/search/rest-api-guide/ So given that the exact (per bit) identical jars are published to Maven-Central and to the Eclipse release p2 repositories that checksum can be used to find the GAV of a dependency even if it does not have any metadata embedded.

Alternatively it can still be attempted to search by fully-qualified classname. Given that nobody else has re-published the artifacts and taking the version into account this could also lead to a unique result (maybe multiple or even all classes from that dependency have to be used) to get a singleton union. However this has to be done with care to not open the door for supply chain attack where somebody else publishes a malicious artifact that is falsely selected by such algorithm.

laeubi commented 2 years ago

While still some orbit bundles are consumed, does tycho know about their "original" GAV? How?

As these are not maven dependencies tycho cant know the gav but no one esle could ever provide one, so they are just "hidden" to the maven world, simple solution (without guessing too much, or even query external servers), just use the now available maven-target locations and everything will work like magic :-) (ed has marked the relevant parts but not the "magic" part that is the maven-repository property Tycho is looking for to decide if this is a 'real' maven artifact from a maven repository.

Does tycho have a solution for mavenizing (platform dependent) OSGi fragments?

Not sure what exactly you mean, but also platform dependent OSGi fragments are just maven artifacts and can be deployed, so there is nothing to do here, it is just that maven has no mean to reference them in a comfortable way, there are currently two ways to solve this:

  1. Consumers of such bundles has to add some extra stuff in their (consumer) pom see for example profiles
  2. We ask maven to add support for "filtered dependencies" so one can add "conditional" dependencies like one can have conditional profiles
mickaelistria commented 2 years ago

A coarse grain strategy, which is an evolution of what is currently done more than a revolution. would be:

laeubi commented 2 years ago

... where a maven-groupid of org.eclipse.orbit.bundles should definitely never be seen in any pom :)

Funny enough the group id actually exists:

https://mvnrepository.com/artifact/org.eclipse.orbit.bundles

Nerveless the com.google.guava / 30.1.0.v20210127-2300 delivers the following in their META-INF/maven/.../pom.xml

what is not deployed to central and the version is obviously wrong and the artifact id seems a bit too broad... so what I have learned from my investigations in Tycho, the Maven infos of Eclipse bundles in P2 and in the artifact itself are too often wrong to say one could "just use it"...

<?xml version="1.0" encoding="UTF-8"?>
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd" xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <modelVersion>4.0.0</modelVersion>
  <parent>
    <groupId>org.eclipse.orbit.bundles</groupId>
    <artifactId>google</artifactId>
    <version>1.0.0-SNAPSHOT</version>
  </parent>
  <artifactId>com.google.guava</artifactId>
  <version>30.1.0-SNAPSHOT</version>
  <packaging>eclipse-bundle-recipe</packaging>
  <name>Guava: Google Core Libraries for Java</name>
  <dependencies>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>guava</artifactId>
      <version>30.1-jre</version>
    </dependency>
    <dependency>
      <groupId>com.google.guava</groupId>
      <artifactId>failureaccess</artifactId>
      <version>1.0.1</version>
    </dependency>
  </dependencies>
</project>
mickaelistria commented 2 years ago

Here is a script that could be as a basis to replace the aggregator some time later

pushd repo
# download
# unzip
repo=$(pwd)
pushd plugins
mkdir .mvn
echo '<?xml version="1.0" encoding="UTF-8"?>
<extensions>
  <extension>
    <groupId>org.eclipse.tycho</groupId>
    <artifactId>tycho-build</artifactId>
    <version>2.7.2</version>
  </extension>
</extensions>
' > .mvn/extensions.xml
echo "<project xmlns='http://maven.apache.org/POM/4.0.0' xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd'>
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.eclipse.platform</groupId>
    <artifactId>org.eclipse.toPom</artifactId>
    <version>1.0.0-SNAPSHOT</version>
    <packaging>pom</packaging>
    <build>
        <plugins>
            <plugin>
                <groupId>org.eclipse.tycho</groupId>
                <artifactId>tycho-maven-plugin</artifactId>
                <version>2.7.2</version>
                <extensions>true</extensions>
            </plugin>
        </plugins>
    </build>
    <repositories>
        <repository>
            <id>this</id>
            <layout>p2</layout>
            <url>file:${repo}</url>
        </repository>
    </repositories>
</project>
" > pom.xml
for plugin in $(ls org.eclipse.*.jar); do
    folderName="$(basename "$plugin" .jar)"
    mkdir -p $folderName/META-INF
    unzip -p $plugin META-INF/MANIFEST.MF > $folderName/META-INF/MANIFEST.MF
    pushd $folderName
    mvn dependency:list
    popd
done
popd

One issue is that at the moment mvn depdency:tree isn't able to build a real tree for p2 artifacts and list transitive deps at same level as main deps. If we want it to be usable for Maven, we'll need to get the top-level deps only.

mickaelistria commented 2 years ago

I'm making decent progress here and have some conclusions to share:

and some questions do remain:

laeubi commented 2 years ago
  • Is it OK for Platform to publish those artifacts under this namespace.

Why not let orbit deploy them as part of the release?

  • which adds OSGi info

Why not invest the time to add them in the "real" source at the first time, yes I know some are hard, but obviously not all ...

  • Some artifacts are even older and we need to define mapping rules at the moment.

Why not omit them at all? Or replace by newer variants (see above) with maven?

mickaelistria commented 2 years ago

Why not let orbit deploy them as part of the release?

Indeed, that would be best. But "let" means more "require" here, and I don't think Orbit is a lively project enough to fulfill requirements coming from downstream.

Why not invest the time to add them in the "real" source at the first time, yes I know some are hard, but obviously not all ...

Currently, I don't think anyone has enough time to invest in that to expect it can be completed for September release

Why not omit them at all? Or replace by newer variants (see above) with maven?

For some artifacts (eg org.w3c.dom.events), there is no newer variants with maven.

laeubi commented 2 years ago

But "let" means more "require" here, and I don't think Orbit is a lively project enough to fulfill requirements coming from downstream.

At least some stuff is actually deployed to maven: https://mvnrepository.com/artifact/org.eclipse.orbit.bundles?sort=newest so there seems to be a way already...

Currently, I don't think anyone has enough time to invest in that to expect it can be completed for September release

Maybe because we are too busy?

mickaelistria commented 2 years ago

At least some stuff is actually deployed to maven: https://mvnrepository.com/artifact/org.eclipse.orbit.bundles?sort=newest so there seems to be a way already...

Right, and we're going to adopt those.

Maybe because we are too busy?

Yes, We're paying back technical debt. But it's not clear here that changing upstream artifacts right now will lead to a faster/better outcome here. The aggregator/publisher needs to embrace Maven artifacts better first for this to be helpful. Having only upstream artifacts is more the final goal actually. However, if you or some other want to work on OSGi-fying some artifacts that currently aren't so we can replace Orbit's ones, it would be welcome.

mickaelistria commented 2 years ago

I think this is now fixed by adoption the newer aggregator builds (which are capable of better computing Maven coordinates) and continuous work to