eclipse-archived / ceylon-herd

The Ceylon repository web application
Apache License 2.0

Interoperability with Ivy and Maven dependency resolvers #262

Open ckulenkampff opened 8 years ago

ckulenkampff commented 8 years ago

This enhancement would make it possible to create a flat classpath of Ceylon CARs for Java projects that use Maven, Ivy or Gradle. This can be done by offering appropriate repository "facades" through the Herd repository server.

Maven repository structure

Maven expects the following layout (see Maven Repository Layout - Final) for primary artifacts:

/$groupId[0]/../$groupId[n]/$artifactId/$version/$artifactId-$version.$extension

and for secondary artifacts:

/$groupId[0]/../$groupId[n]/$artifactId/$version/$artifactId-$version-$classifier.$extension
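As a minimal sketch of the layout above, the following Python helper builds the repository-relative path for a primary or secondary artifact. The function name `maven_path` and its parameters are illustrative assumptions, not part of Herd or Maven.

```python
def maven_path(group_id, artifact_id, version, extension, classifier=None):
    """Build a Maven repository path from coordinates (hypothetical helper)."""
    # Each dot-separated groupId component becomes one directory level.
    group_dirs = group_id.replace(".", "/")
    file_name = f"{artifact_id}-{version}"
    # Secondary artifacts carry a classifier such as "sources".
    if classifier:
        file_name += f"-{classifier}"
    return f"/{group_dirs}/{artifact_id}/{version}/{file_name}.{extension}"


print(maven_path("ceylon", "language", "1.2.1", "pom"))
print(maven_path("ceylon.interop", "java", "1.2.1", "jar", classifier="sources"))
```

The first call corresponds to a primary artifact, the second to a secondary (classified) one.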

Ivy repository structure

Ivy is more flexible and allows custom patterns to be specified for artifact resolution (see Ivy Documentation - Main Concepts). The default pattern that is used by Gradle is the following (see Gradle DSL Reference - IvyArtifactRepository):

Artifacts: $baseUri/[organisation]/[module]/[revision]/[type]s/[artifact](.[ext])

Ivy module descriptors: $baseUri/[organisation]/[module]/[revision]/[type]s/[artifact](.[ext])
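To illustrate how such a pattern is expanded, here is a rough Python sketch. It assumes the usual Ivy convention that `[token]` is replaced by its value and that a parenthesised part is dropped when a token inside it has no value. The function name `expand_ivy_pattern` and the token values are made up for this example.

```python
import re


def expand_ivy_pattern(pattern, tokens):
    """Expand an Ivy-style pattern with [token] and (optional) parts (illustrative only)."""

    def keep_if_defined(match):
        part = match.group(1)
        names = re.findall(r"\[(\w+)\]", part)
        # Keep the optional part only if every token inside it has a value.
        return part if all(tokens.get(name) for name in names) else ""

    without_optionals = re.sub(r"\(([^()]*)\)", keep_if_defined, pattern)
    return re.sub(r"\[(\w+)\]", lambda m: tokens[m.group(1)], without_optionals)


pattern = "[organisation]/[module]/[revision]/[type]s/[artifact](.[ext])"
tokens = {
    "organisation": "ceylon",
    "module": "language",
    "revision": "1.2.1",
    "type": "jar",
    "artifact": "language",
    "ext": "jar",
}
print(expand_ivy_pattern(pattern, tokens))
```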

Meta information

To resolve transitive dependencies, both repository types require meta information. Maven uses pom.xml files. Ivy uses ivy.xml files, but can also process pom.xml files. These files must be accessible via HTTP requests.

Meta information augmentation

When the repository server responds to a "foreign" metadata request for a Ceylon module, it should automatically add all implicit dependencies of the Ceylon language to the response. For interoperability, these Ceylon language modules should be published to the Herd repository so that Java projects that depend on a Ceylon library do not have to provide them themselves.

Artifact aliases

For interoperability it would be very useful if CAR files were also available under the same name but with a JAR file extension when accessed through a facade.

Many IDEs automatically link source and javadoc JARs to the downloaded artifacts by searching the module cache for files like $artifactId-$version-$classifier-sources.$extension (IDE dependent; see NetBeans DependencyNode). Ceylon source artifacts should be made available in a way that lets this resolution work out of the box. This means that the artifacts are made available under a different name than the one under which they are normally accessible in Herd.

FroMage commented 8 years ago

I had the same thought the other day, I think that'd be really handy indeed. Would you be willing to test this for me?

FroMage commented 8 years ago

So Maven repo docs fail to mention these:

FroMage commented 8 years ago

The biggest problem would be that Ceylon modules don't have a group/artifact split. We can emulate one, by splitting on the last ., but it may be weird.

bjansen commented 8 years ago

That's already what happens in the pom.xml stored in every car file:

<?xml version="1.0" ?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
 <modelVersion>4.0.0</modelVersion>
 <groupId>ceylon.interop</groupId>
 <artifactId>java</artifactId>
 <version>1.2.2</version>
 <name>ceylon.interop.java</name>
 <dependencies>
  <dependency>
    <groupId>ceylon</groupId>
    <artifactId>collection</artifactId>
    <version>1.2.2</version>
  </dependency>
 </dependencies>
</project>

bjansen commented 8 years ago

Alternatively, we could add an annotation in module.ceylon, something like:

mvn("ceylon", "interop.java")
module ceylon.interop.java 1.0.0 {
    ...
}

gavinking commented 8 years ago

We can emulate one, by splitting on the last ., but it may be weird.

Why can't the group and artifact be identical: the module name?

luolong commented 8 years ago

That's a slippery slope right there -- adding Maven specific annotations to language module...

Maybe have a group annotation instead. This could be an informative tag for Herd, that could be used as a grouping of similar modules together.

And additionally, this would be used for mvn pom generation.

Lacking a group annotation, the group and artifact id could very well be the module's full name.

An example:

group("ceylon.sdk")
module ceylon.time 1.2.2 { ...}

Would generate the following POM file:

<?xml version="1.0" ?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
 <modelVersion>4.0.0</modelVersion>
 <groupId>ceylon.sdk</groupId>
 <artifactId>ceylon.time</artifactId>
 <name>ceylon.time</name>
 <version>1.2.2</version>
 <dependencies>
  ...
 </dependencies>
</project>

ckulenkampff commented 8 years ago

Usually the groupid refers to the project. See http://central.sonatype.org/pages/choosing-your-coordinates.html

The groupId identifies your project uniquely across all projects and you control this section of the overall name-space.

I think a better default would be:

groupId = Ceylon module name
artifactId = last component of the Ceylon module name

Why can't the group and artifact be identical: the module name?

This might be the best default, because then the artifact would already have the right name for resolution.

ckulenkampff commented 8 years ago

Would you be willing to test this for me?

Yes. I will try to create some kind of integration test for this.

FroMage commented 8 years ago

@vietj: what do you think we should do about group/artifact?

FroMage commented 8 years ago

Note that if we change the group/artifact mapping in Herd, we will also want to change it in the generated pom.xml in the .car files…

bjansen commented 8 years ago

I think Herd should simply extract the pom.xml from the car file, and build the correct hierarchy of folders + generate checksums, that should be enough to expose a Maven repo, right?
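A rough Python sketch of that idea follows: read the embedded pom.xml from a CAR (which is a zip archive), then derive the facade file names and their SHA-1 checksums. The exact location of pom.xml inside the archive is an assumption here, so the code simply takes the first entry named pom.xml. The helper names `extract_pom` and `facade_files` are hypothetical and not Herd code.

```python
import hashlib
import io
import zipfile


def extract_pom(car_bytes):
    """Return the bytes of the first pom.xml entry in a CAR archive, or None."""
    with zipfile.ZipFile(io.BytesIO(car_bytes)) as car:
        for name in car.namelist():
            # Assumption: the POM is stored somewhere in the archive as "pom.xml".
            if name == "pom.xml" or name.endswith("/pom.xml"):
                return car.read(name)
    return None


def facade_files(car_bytes, artifact_id, version):
    """Map Maven-style file names to their content, including .sha1 checksums."""
    base = f"{artifact_id}-{version}"
    files = {f"{base}.jar": car_bytes}  # the CAR is served under a .jar alias
    pom = extract_pom(car_bytes)
    if pom is not None:
        files[f"{base}.pom"] = pom
    # Add one hex-encoded SHA-1 checksum file per served file.
    for name, content in list(files.items()):
        files[f"{name}.sha1"] = hashlib.sha1(content).hexdigest().encode("ascii")
    return files
```

A server could place these entries in the directory that the Maven layout prescribes for the module's coordinates.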

FroMage commented 8 years ago

That is indeed another option. Except that:

davidfestal commented 8 years ago

So it should be a mix of both, as we do when generating OSGI metadata in manifest: reuse information specified in the internal POM, and add any additional information that can be provided by Herd.

bjansen commented 8 years ago

reuse information specified in the internal POM, and add any additional information that can be provided by Herd

Looks like a good compromise.

authors, urls, licence, description, etc…

Technically, poms can also contain such data, so we could fill it from module.ceylon if present (is Herd already doing this?).

FroMage commented 8 years ago

I can, that's why I said that it's a bit richer if I generate the pom.xml rather than use the one inside the .car.

FroMage commented 8 years ago

https://modules.ceylon-lang.org/maven/1/ceylon/language/1.2.1/language-1.2.1.pom https://modules.ceylon-lang.org/maven/1/ceylon/language/maven-metadata.xml

Can you guys try it out?

FroMage commented 8 years ago

Supports .jar, .jar.sha1, -sources.jar, -sources.jar.sha1, .pom, .pom.sha1 and maven-metadata.xml.

Does not support javadoc yet but I'm pretty sure the Java tools would not be able to make sense of ceylondoc anyway.

ckulenkampff commented 8 years ago

Wow so fast! I did a small test with Gradle. It works like a charm!

plugins { id 'groovy' }

repositories {
    maven {
        name = 'ceylon-herd'
        url = 'https://modules.ceylon-lang.org/maven/1/'
    }
}

dependencies {
    compile 'ceylon:language:1.2.1'
    compile 'ceylon.interop:java:1.2.1'
    compile 'ceylon:collection:1.2.1'
}

The sources are parsed as Java files, but this should be an Eclipse problem. In NetBeans, sources are not shown for Ceylon sources. The Java source of Array.class in ceylon.language is shown, so I think NetBeans looks only for Java files as sources :/.

I will try to proxy the Ceylon Herd server with a local Sonatype Nexus server as a second test.

I will have more time tomorrow, and then I will give you more detailed feedback. Do you think it's useful to have some kind of Gradle/Maven integration tests that can be run against the Herd server?

When I see those artifacts, I really think a fully qualified artifact id would be better.

FroMage commented 8 years ago

Yeah, I think so too, which is why I asked @vietj about his opinion.

vietj commented 8 years ago

Having an annotation to specify a group id can raise a problem if you want to deduce the GAV from the Ceylon module name, because you don't have the information.

vietj commented 8 years ago

Having a scheme with multiple group ids can cause a problem later if you want to put the same deps on Maven Central, because usually you own a single group id.

vietj commented 8 years ago

At the same time, Maven does not manage cyclic dependencies, so I don't see how you would publish Ceylon cyclic dependencies in a Maven repo.

FroMage commented 8 years ago

ATM @vietj has parts of the distrib published at http://mvnrepository.com/artifact/org.ceylon-lang under the org.ceylon-lang group.

Using ceylon.language:ceylon.language as coordinates would make publishing to Maven Central harder, as every groupId has to be registered (same as on Herd BTW, and we don't see that as a big problem although we do have an issue open to claim domains or wildcards).

FroMage commented 8 years ago

Another option is to use a common groupId for every Ceylon module: ceylon:ceylon.language.

renatoathaydes commented 8 years ago

why not use the same coordinates that already are in maven central?

ckulenkampff commented 8 years ago

But what about other projects? Let's assume somebody wants to publish to Maven Central and Herd. In this situation it would be very important to be able to control the artifactId and groupId. Otherwise the dependencies might be downloaded from both the Maven repository and the Herd repository, and both would end up on the classpath.

I begin to think that the best idea is to have an annotation that ensures compatibility. Ivy, Maven and many other tools use at least group id and artifact id for resolution. For Ceylon modules an annotation for group id might be enough, but when people want to upload JARs to Herd and Maven they may want to use their existing groupid and artifact-id-scheme.

So maybe an annotation that can override group id and artifact id for repositories that support both coordinates would be the most interoperable way. The developer has to decide if she wants to use "Herd-mode" (Module-Version) or "Maven-mode" (Group-Module-Version) for module resolution.

gavinking commented 8 years ago

why not use the same coordinates that already are in maven central?

I agree that org.ceylon-lang feels natural.

Here's another idea: how about org.ceylon-lang.modules? i.e. the address of modules in Herd?

renatoathaydes commented 8 years ago

If you don't change the group, you don't need to go through the sonatype process again to determine if you own the groupId or not.

ckulenkampff commented 8 years ago

Here's another idea: how about org.ceylon-lang.modules? i.e. the address of modules in Herd?

Imagine a big project that uses a Ceylon module which itself depends on a Java-Jar module uploaded to Herd (possible?). The Ceylon module is only published in Herd, so it refers to the Jar in the pom as org.ceylon-lang.modules:com.company.project.module. The big project uses the same library itself, but uses the normal Maven project coordinates. Now the same module is mentioned twice: com.company.project:module and org.ceylon-lang.modules:com.company.project.module. edit: Oh this could happen the other way around already, am I right?

With a repository interoperability annotation this problem would not occur: When Herd simulates a Maven repository it would always use the coordinates specified by the annotation (also for transitive dependencies). The CMR could also use the annotation for duplicate detection.

The only remaining question is what the groupId should be when the module does not use the interoperability annotation.

I think in the future in the light of Jigsaw Maven and Ivy artifact resolvers will support multi-version resolution of dependencies. So I think finding a good solution for mapping artifacts between Herd and Maven/Ivy is crucial.

It just occurred to me that this can also become a very important security problem. Malicious users could use the Herd repository to inject manipulated Jar dependencies into projects that use Maven and Herd at the same time. You would always have to check Maven Central for similar groupIds when accepting a new module into Herd...

With this in mind, using "org.ceylon-lang.modules" as the group id for all modules provided by Herd is probably the safest option. Duplicate dependencies would have to be solved per project, which is an acceptable mid-term solution.

A possible long term solution would be, that when you want to link an artifact between Herd and Maven the POM at the Maven repository must contain a key generated by Herd that proves ownership of the group id... tedious :(.

renatoathaydes commented 8 years ago

I think in the future in the light of Jigsaw Maven and Ivy artifact resolvers will support multi-version resolution of dependencies

We already went through this a few times, but here we go again: no, Jigsaw does not, and according to the developers working on it, has no goal of, supporting multiple versions of a module at runtime. Jigsaw does not even include any version resolution strategy at all.

Back to ceylon modules...

In my opinion, the solution is simple: add an optional groupId to Ceylon modules in general. If a module has one, it may be possible to publish it on Maven repos as well as Ceylon ones, otherwise it just can't be published in a Maven/Ivy repo.

To solve the problem of where Ceylon will look for when resolving a module, I would say, keep the current scheme mostly the same and:

Which repo a module comes from should not matter, but Ceylon should just prefer Ceylon-specific first. There's no need for the esoteric solutions described by @ckulenkampff

ckulenkampff commented 8 years ago

I think in the future in the light of Jigsaw Maven and Ivy artifact resolvers will support multi-version resolution of dependencies

We already went through this a few times, but here we go again: no, Jigsaw does not, and according to the developers working on it, has no goal of, supporting multiple versions of a module at runtime. Jigsaw does not even include any version resolution strategy at all.

Sorry, I didn't know that. That made me wonder, so I just looked up the specs. It seems to be vendor specific (see bottom of http://openjdk.java.net/projects/jigsaw/spec/reqs/). But even if this is irrelevant, I still think a good mapping is very important.

Back to ceylon modules...

My use cases are Ceylon modules used in Java projects and Java projects that use Java libraries with transitive Ceylon dependencies.

no need for the esoteric solutions

This would be great!

I just think that underestimating the possible problem space is dangerous. I think it's just a matter of time when a sophisticated hack hits Maven Central or a similar repository hard (see http://branchandbound.net/blog/security/2012/03/crossbuild-injection-how-safe-is-your-build/).

A Maven interop specific group id should at least result in an administrative process where somebody has to verify the group id in addition to the Ceylon module name... And exactly this would be the purpose of an ownership key placed in the pom. It would put Herd in a bad light when the artifacts collide with somebody else's group ids on Maven Central or jCenter.

ckulenkampff commented 8 years ago

I think a good mapping is also important because of this situation:

Ceylon Project --> Artifact A on Maven Central --> Artifact B on Maven Central
               --> Artifact B on Herd

or at least this situation:

Ceylon Project --> Artifact A on Herd --> Artifact B on Herd
               --> Artifact B on Maven Central

renatoathaydes commented 8 years ago

@ckulenkampff the above situations are not a problem at all... with any build system, in which repo the artifact can be found is completely irrelevant, if they have the same coordinates they are the same...

luolong commented 8 years ago

I guess we can't really get around the problem of specifying group id's for a module if and when we want to publish those modules as maven artifacts.

As creation of the POM is already part of the compiler, this decision has to be made at the (backend) compiler level. Since the Maven group id is essentially a Java backend compiler concern, I suggest making it an official compiler configuration option.

Have the default behavior to be as it currently is - where module name is split at the last dot – everything on the left of the last dot will become Maven Group id and the last segment will become artifact name.

Ex: Module ceylon.interop.java/1.2.2 would become ceylon.interop:java:1.2.2:car.

Alternatively we could keep the full module name as maven artifact name and only use everything before the last dot as group id.

Like this: ceylon.interop:ceylon.interop.java:1.2.2:car

Additionally have a set of compiler flags for ceylon compile tool --maven-group=ceylon-sdk and --maven-name=interop.java (or something like this) for overriding the default.

These could also be read from a .ceylon configuration file of the project.

Then there would be no need for any backend specific group annotation in the language module.
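The mapping options discussed above can be compared with a small Python sketch. The function names and the `group_override` and `full_artifact_name` parameters are illustrative assumptions, standing in for the proposed compiler flags and the two default variants. The packaging suffix (`:car`) is left out for brevity.

```python
def split_last_dot(module_name):
    """Split a Ceylon module name at its last dot into (group, artifact)."""
    group, _, artifact = module_name.rpartition(".")
    # A name without any dot has no natural group; fall back to the full name.
    return (group or module_name, artifact)


def maven_coordinates(module_name, version, group_override=None, full_artifact_name=False):
    """Derive group:artifact:version coordinates for a Ceylon module (sketch only)."""
    default_group, last_segment = split_last_dot(module_name)
    group = group_override or default_group
    artifact = module_name if full_artifact_name else last_segment
    return f"{group}:{artifact}:{version}"


# Default: split at the last dot.
print(maven_coordinates("ceylon.interop.java", "1.2.2"))
# Alternative: keep the full module name as the artifact id.
print(maven_coordinates("ceylon.interop.java", "1.2.2", full_artifact_name=True))
# Overriding the group, as a compiler flag or config entry might do.
print(maven_coordinates("ceylon.interop.java", "1.2.2", group_override="org.ceylon-lang"))
```

With the default split, ceylon.interop.java and ceylon.collection end up in different groups (ceylon.interop and ceylon), which is one reason an override may be needed.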

ckulenkampff commented 8 years ago

This seems to me to be a good solution. The administrative issues will be there anyway. Will/Does the CMR also look into the pom to identify artifacts that are already included as Herd modules in the module repository?

gavinking commented 8 years ago

@luolong It seems that ceylon.interop:java would be more consistent with how stuff like Hibernate and Spring do it today.

But there's a problem with your suggestion: ceylon.interop.java gets the group id ceylon.interop; the rest of the SDK gets the group id ceylon. That doesn't look quite right.

ckulenkampff commented 8 years ago

You could specify maven name and maven group for the ceylon projects as well? Something like this might be sufficient: org.ceylon-lang:interop-java (see http://mvnrepository.com/search?q=ceylon)

quintesse commented 8 years ago

I could accept the --maven-group part of @luolong 's suggestion. That ceylon.interop.java gets treated differently from the rest of the SDK just goes to show that we really need a way to override the default behaviour. So the build scripts in the SDK would just compile with --maven-group=org.ceylon-lang.

But I'm not so sure about --maven-name though; the compiler normally deals with multiple modules at the same time. So we would need to give an error when you use that option with multiple modules. But in general it's not an option that really "adjusts" well to normal projects. (What if you have it in your ceylon config file and you add a new module? Unless we start extending the config to be able to set options based on the module name.)

So in the end I'm back to thinking that to me a group/artifact name is an integral part of a module, so personally I feel more for an artifact("org.ceylon-lang", "interop-java") annotation on module to be honest.

ckulenkampff commented 8 years ago

edit: accidentally closed the issue, sorry :( way too easy to do this 0_o

It basically boils down to how much effort you want to put into Maven/Java-first interoperability. To me this is very important. An annotation would reflect this importance in the language.

Would a lack of the annotation lead to a lack of a pom file?

Will the pom file always be generated or can a pom file be provided when the annotation is not set?

This situation in particular is very important to me: Java Maven/Gradle project -> Java library in Maven repository -> Ceylon module in Maven repository. If Ceylon Herd is intended to replace Maven, and the Ceylon SDK etc. will not be published on e.g. Maven Central, I could live with something like this: Java Maven/Gradle project with an additional Maven-facade Herd repository specified -> Java library found in Maven repository -> Ceylon module found in Herd repository

gavinking commented 8 years ago

So yeah, I was actually thinking along the lines of "well, this is actually rightly a project-level setting, not a module-level setting". Unfortunately we don't really have a very well-defined notion of what a project is right now. We do have the notion, implicitly, in the CLI, and explicitly, in the IDEs, but we don't have a place to specify project-level metadata, except as compiler settings.

bjansen commented 8 years ago

we don't have a place to specify project-level metadata

What about .ceylon/config and .ceylon/ide-config? They apply to all the modules under that directory.

We can use them to store more than compiler settings.

quintesse commented 8 years ago

this is actually rightly a project-level setting

Partly yes. But that would only work if you can still figure out a way to handle the name part. Sure, we could go with some useful default, but there will always be people who really, really, really want their artifact name to be different from the default.

ckulenkampff commented 8 years ago

edit: Ah @quintesse was faster :)

As far as I understand one project can contain multiple modules, which get published separately. Gradle anticipates Jigsaw in a similar way (see https://docs.gradle.org/current/userguide/java_software.html#defining_api). Don't you want to specify group and artifact id per module? As @quintesse explained you may want to use org.ceylon-lang as project wide group id and use Ceylon module specific artifact ids.

FroMage commented 8 years ago

with any build system, in which repo the artifact can be found is completely irrelevant, if they have the same coordinates they are the same...

Well… That's not entirely true. It depends. What we're trying to do is to make it so that the Maven coordinates could be the same on Herd and Maven Central. The Ceylon module name will remain the same.

BUT there are two sides to resolving. Maven/Gradle users that resolve things will use Maven coordinates. Ceylon module descriptors on the other hand will mostly use Ceylon module names, and those are not resolved using Maven resolvers. Only module names containing : will be resolved using the Maven resolver.

So for example, in Ceylon, importing ceylon:collection and ceylon.collection will get you two different modules. The first will be resolved using the Maven resolver and its embedded pom.xml and the second will be resolved using the Ceylon resolver and its embedded module descriptor.
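The rule described above can be written down as a tiny Python sketch. The function name `resolver_for` and the returned labels are purely illustrative; this is not how the Ceylon module resolver is actually implemented.

```python
def resolver_for(import_name):
    """Pick a resolver for an imported module name (illustrative sketch)."""
    # Names containing ':' are treated as Maven coordinates; everything
    # else is treated as a plain Ceylon module name.
    return "maven" if ":" in import_name else "ceylon"


print(resolver_for("ceylon:collection"))
print(resolver_for("ceylon.collection"))
```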

It's not clear how we can avoid situations where people use Maven to resolve a Ceylon module using its Maven coordinates/repo, which will drag its dependencies using the Maven coordinates too, and then when calling the compiler/runner, Ceylon will use the Ceylon module descriptor and download Ceylon modules and work with those.

It makes at least for a double download.

Now, running using the Main API is a bit different in that Ceylon will not download any modules, but it will initialise the metamodel using metadata found in the classpath. Pretty sure that even for Ceylon modules fetched by Maven it will not use the pom.xml first, but the Ceylon module descriptor.

I'm not saying there's a problem there. I'm saying it's a little more complex than it may seem, and it's not obvious at all how it works out.

gavinking commented 8 years ago

Don't you want to specify group and artifact id per module?

I can't imagine why you would want to have different group ids for modules in the same project. That sounds wrong.

gavinking commented 8 years ago

And, to be clear, I really don't want to see maven metadata polluting the module descriptors for the Ceylon SDK, OK? I need you guys to focus your minds on that requirement before you start trying to add groupId and artifactId annotations all over the SDK.

quintesse commented 8 years ago

I can't imagine why you would want to have different group ids for modules in the same project. That sounds wrong.

Don't tell people they're wrong, it just gives them more incentive to go and actually want to do it ;)

But I can imagine very well that large projects will have modules divided up in groups. Now sure, you can then also put them in separate source folders and then run the compiler once for each group with different arguments. But that all goes a bit against the nice thing we have now where often you don't even need a build script because typing ceylon compile in the root of a project will just correctly compile everything.

I really don't want to see maven metadata polluting the module descriptors for the Ceylon SDK

I don't see how we can prevent this information from leaking out. Sure we can choose our defaults in such a way that we don't need to add anything to our SDK, but other people will surely come up with examples where they do need more exact control over those names.

FroMage commented 8 years ago

I really don't want to see maven metadata polluting the module descriptors

That's OK, we'll only add Gradle metadata ;)

FroMage commented 8 years ago

Or Ivy metadata, that's cool, right?