Open mervyn-mccreight opened 2 years ago
The same goes for https://mvnrepository.com/artifact/org.amshove.kluent/kluent/1.68
org.amshove.kluent:kluent-common:1.68
- ERROR: Unknown license URL 'https://github.com/MarkusAmshove/Kluent/blob/master/LICENSE' is NOT allowed
Name-based lookup is only applied if the URL is not present. In both of these cases the URL supersedes that and is (obviously) not a canonical URL for the MIT license.
When we built the library we decided to favor URL since the overwhelming majority linked to an external canonical license copy. I would attempt to reach out to those libraries about switching to the SPDX link or the opensource.org link for the license. In my experience library authors usually understand the argument for pointing at these rather than their own copy.
Ah, I see.
IMO you should consider taking a look at the name even if it can't map the URL to a license, because referring to the maven documentation about the licenses part of the pom, nothing is really specified for the url
-part, but for the name
part it is at least recommended to use a SPDX-identifier.
What do you think?
That recommendation is very new and when sampling a few thousand artifacts almost none matched. I suspect the fact that MIT matches is more luck than intent. The overwhelming majority were "Apache 2.0" instead of the intended "Apache-2.0", for example.
I understand.
I did not want to criticize the way you came to the conclusion that looking for the license-url with priority makes sense. I don't mean to replace that - what I think of is more like:
Its more like an additional fallback to try to increase the detection rate.
I wouldn't say I'm 100% opposed to it, but I did give this a fair bit of thought when figuring out how the project was going to initially work.
My concern is always that the URL and the name may disagree on the license, and that the Maven POM specification offers no guidance on precedence. I wanted to ensure we never surprised someone so in most cases the code prefers to fail and have a human decide. This is why you can allow a specific URL if you trust it to be stable. Or you can allow a specific coordinate if it uses a custom URL that you do not trust to be stable.
Another thing that was considered early on is attempting to download unknown URLs and match their contents. It was ruled out because the number of custom URLs was very low and things like GitHub/GitLab links involve a ton of surrounding HTML.
There is a problem with the url though: the official urls are often just a template. For a valid license, you need to adopt the owner and year. At least this was the feedback I got: https://github.com/pgjdbc/pgjdbc/pull/2514#issuecomment-1135073402
There is a problem with the url though: the official urls are often just a template. For a valid license, you need to adopt the owner and year.
I would be interested in a version of this plugin that does match by identifier instead for this reason. If this requires more manual overrides, so be it. Without promising anything yet, would you accept a PR that makes the matching strategy configurable or do you consider this to be out-of-scope for this plugin, @JakeWharton?
I think it still has a lot of behavioral questions to work out. It's not like we can just flip a conditional here. Do we fall back to the URL if the name does not match? What happens if the name and URL match different licenses? I simply have never considered matching on the name and the entire design of the existing logic is around the URL, so I'm not keen to rush into something here.
First of all, sorry for going quiet for a while, but work and vacation kept me busy 😅. I fully agree that this is not something that should be rushed, but I did actually have an idea in mind already:
I propose to expose an API that the build author may use to tune the strategy. Specifically, we can mimic the API for configuring metadata sources for repositories. This could look as follows (exact names and syntax are tbd):
licensee {
licenseSources { // 1
licenseName() // 2
licenseUrl() // 3
}
}
We introduce a LicenseSources
interface that may be used to configure the sources from which to infer license information. In the DSL, this is exposed under the licensee
extension (1).
The interface defines several methods that may be used to enable certain sources of information. Initially, those would use be license name (2) and license URL (3), but additional sources could be added in the future without breaking configuration or introducing needless complexity.
licenseSources
would be a regular function under the licensee extension and probably have the signature fun licenseSources(action: Action<LicenseSources>)
. If the function is not called, we resort to the current behavior as default.
As an added benefit, we could allow the build author to call licenseSources {}
(without adding any sources) to indicate that all license mappings must be specified explicitly. However, I suggest deferring this decision and treating it as a configuration mistake for now.
Somewhat meta question but is anyone here aware of any tooling that uses pom.license.url
?
pom.license.url
so omitting it is not an issue for publishing. pom.license.name
and a missing pom.license.url
is fine.Is there tooling out there that requires the license url? Do we break something if we remove it from libraries poms?
Here is the output for using e.g. version 3.1.0, I assume it will be the same for 3.2.0 as well:
According to https://mvnrepository.com/artifact/io.github.g0dkar/qrcode-kotlin-jvm/3.1.0 the license name is published as
MIT
, so I assume it should get identfied asMIT
when I'm looking at the fallback logic inlicenses.kt
.I used the version
1.6.0
of this plugin to get the described results.