f4lco / libyear-gradle-plugin

A simple measure of dependency freshness.
https://github.com/f4lco/libyear-gradle-plugin

Ability to filter out pre-release dependency versions #11

Open grimsa opened 8 months ago

grimsa commented 8 months ago

I recently learned about the libyear metric and this plugin, and ran an analysis on one of our projects.

Problem

One issue I noticed in the output was that some dependencies are reported as outdated even though no newer stable version exists.

Example line from the report:

 -> 1.7 years  from jakarta.persistence:jakarta.persistence-api (3.1.0 => 3.2.0-M1)

However, currently the released versions look like this:

Version number   Date published
3.2.0-M1         2023-11-23
3.2.0-B02        2023-11-06
3.2.0-B01        2023-08-28
3.1.0            2022-02-25
...              ...
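For reference, the 1.7 years in the report line above is simply the gap between the two publish dates (636 days). A minimal sketch of that arithmetic, assuming a year of 365.25 days (the exact divisor the plugin uses is an assumption here, and the class name is illustrative):

```java
import java.time.LocalDate;
import java.time.temporal.ChronoUnit;
import java.util.Locale;

/** Sketch: libyears between two publish dates, assuming a 365.25-day year. */
public class LibyearFromDates {
    static double libyears(LocalDate currentPublished, LocalDate comparedPublished) {
        long days = ChronoUnit.DAYS.between(currentPublished, comparedPublished);
        return days / 365.25;
    }

    public static void main(String[] args) {
        // jakarta.persistence-api: 3.1.0 (2022-02-25) compared against 3.2.0-M1 (2023-11-23)
        double years = libyears(LocalDate.of(2022, 2, 25), LocalDate.of(2023, 11, 23));
        System.out.println(String.format(Locale.ROOT, "%.1f years", years)); // 1.7 years
    }
}
```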

Given that using unstable (non-final) dependency versions in production is generally considered bad practice, I think this plugin could either exclude non-final versions automatically, or at least let the user configure which newer versions to consider.

Impact

For a project that had 79 outdated dependencies, 16 of them (i.e., ~20%) were compared against non-final versions:

 -> 1.7 years  from jakarta.persistence:jakarta.persistence-api (3.1.0 => 3.2.0-M1)
 -> 1.5 years  from jakarta.validation:jakarta.validation-api (3.0.2 => 3.1.0-M1)
 -> 1.4 years  from jakarta.annotation:jakarta.annotation-api (2.1.1 => 3.0.0-M1)
 -> 1.2 years  from net.sf.jopt-simple:jopt-simple (5.0.4 => 6.0-alpha-3)
 -> 10 months  from org.apache.logging.log4j:log4j-api (2.20.0 => 3.0.0-beta1)
 -> 10 months  from org.apache.logging.log4j:log4j-to-slf4j (2.20.0 => 3.0.0-beta1)
 -> 6.2 months from org.jetbrains.kotlin:kotlin-stdlib-common (1.8.22 => 2.0.0-Beta2)
 -> 6.2 months from org.jetbrains.kotlin:kotlin-reflect (1.8.22 => 2.0.0-Beta2)
 -> 6.2 months from org.jetbrains.kotlin:kotlin-stdlib-jdk8 (1.8.22 => 2.0.0-Beta2)
 -> 6.2 months from org.jetbrains.kotlin:kotlin-stdlib (1.8.22 => 2.0.0-Beta2)
 -> 6.2 months from org.jetbrains.kotlin:kotlin-stdlib-jdk7 (1.8.22 => 2.0.0-Beta2)
 -> 3.8 months from org.slf4j:jul-to-slf4j (2.0.9 => 2.1.0-alpha0)
 -> 3.8 months from org.slf4j:slf4j-api (2.0.9 => 2.1.0-alpha0)
 -> 28 days    from org.apache.httpcomponents.client5:httpclient5 (5.2.3 => 5.4-alpha1)
 -> 25.9 days  from org.apache.httpcomponents.core5:httpcore5-h2 (5.2.4 => 5.3-alpha1)
 -> 25.9 days  from org.apache.httpcomponents.core5:httpcore5 (5.2.4 => 5.3-alpha1)

This results in either:

Collectively this:

Potential solutions

General solution

Looking at semver, it seems that any pre-release version would contain a hyphen:

A pre-release version MAY be denoted by appending a hyphen and a series of dot separated identifiers immediately following the patch version. . . . Examples: 1.0.0-alpha, 1.0.0-alpha.1, 1.0.0-0.3.7, 1.0.0-x.7.z.92, 1.0.0-x-y-z.--.
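A strict reading of the quoted rule can be checked with a regular expression. A minimal sketch, assuming only the MAJOR.MINOR.PATCH-prerelease(+build) shape from the semver spec (leading-zero rules are not enforced); the class and method names are illustrative:

```java
import java.util.regex.Pattern;

/** Sketch: does a version string have the shape of a semver 2.0.0 pre-release? */
public class SemverPreRelease {
    // MAJOR.MINOR.PATCH, then "-" and dot-separated identifiers, then optional "+build" metadata.
    // Simplified: numeric identifiers with leading zeros are not rejected.
    private static final Pattern SEMVER_PRE_RELEASE = Pattern.compile(
            "^\\d+\\.\\d+\\.\\d+-[0-9A-Za-z-]+(\\.[0-9A-Za-z-]+)*(\\+[0-9A-Za-z-]+(\\.[0-9A-Za-z-]+)*)?$");

    static boolean isSemverPreRelease(String version) {
        return SEMVER_PRE_RELEASE.matcher(version).matches();
    }

    public static void main(String[] args) {
        for (String v : new String[] {"3.2.0-M1", "3.1.0", "1.0.0-x-y-z.--", "1.0.0+build-1", "6.0-alpha-3"}) {
            System.out.println(v + " -> " + isSemverPreRelease(v));
        }
    }
}
```

Note that a strict semver check does not catch versions such as 6.0-alpha-3 or 5.4-alpha1 from the list above, because they have only two numeric components, while a plain "contains a hyphen" check would misclassify build metadata such as 1.0.0+build-1.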

And looking at the anecdotal evidence from this one project, it seems that:

Therefore, maybe the general rule could be: "if the current dependency version contains a hyphen, consider all available versions; if it does not, only consider versions without a hyphen."
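The rule above could look roughly like this. A sketch only: versionsToConsider is a hypothetical name, and the version list is the jakarta.persistence-api list from the table:

```java
import java.util.List;
import java.util.stream.Collectors;

/** Sketch of the proposed hyphen rule; the method name is hypothetical. */
public class HyphenRule {
    static List<String> versionsToConsider(String currentVersion, List<String> available) {
        if (currentVersion.contains("-")) {
            // Already on a pre-release: every published version is a fair comparison.
            return available;
        }
        // On a final release: only compare against other hyphen-free versions.
        return available.stream()
                .filter(v -> !v.contains("-"))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> published = List.of("3.1.0", "3.2.0-B01", "3.2.0-B02", "3.2.0-M1");
        System.out.println(versionsToConsider("3.1.0", published));     // [3.1.0]
        System.out.println(versionsToConsider("3.2.0-B01", published)); // all four versions
    }
}
```

As written, the rule would also drop stable versions that happen to contain a hyphen, such as Guava's -jre releases, which is what a user-configurable option would need to handle.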

User-configurable solution

Maybe there could be a configuration parameter that lets the user specify which versions to include or exclude:

libyear {
  configurations = ['compileClasspath']
  ignoreNewerArtifactsWithVersionsMatching = "<regex that matches specific suffixes>"  // <-- new parameter
  failOnError = true
  validator = allArtifactsCombinedMustNotBeOlderThan(days(5))
}

An example of such a regex could be -(?!jre), which would ignore any version containing a hyphen, except where the hyphen is part of -jre.
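Applied to candidate versions, such a regex would need a "match anywhere" search rather than a full-string match. A sketch, assuming the proposed ignoreNewerArtifactsWithVersionsMatching option (which does not exist in the plugin) simply removes every candidate the regex matches; 33.0.0-rc1-jre is a made-up version used only to show that the -rc1 hyphen is still caught:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Sketch of how the proposed (hypothetical) ignore-regex option could be applied. */
public class IgnoreMatchingVersions {
    static List<String> dropIgnored(List<String> candidates, String ignoreRegex) {
        Pattern ignore = Pattern.compile(ignoreRegex);
        return candidates.stream()
                // find() rather than matches(): the regex may match anywhere in the version
                .filter(v -> !ignore.matcher(v).find())
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> candidates = List.of("32.1.3-jre", "32.1.3-android", "33.0.0-rc1-jre", "3.1.0");
        System.out.println(dropIgnored(candidates, "-(?!jre)")); // [32.1.3-jre, 3.1.0]
    }
}
```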

f4lco commented 8 months ago

Thank you very much @grimsa for your detailed report and your interest in this plugin!

From a surface-level reading, I think the plugin could do better for the general case of semver. Since semver defines what a "pre-release" version number looks like, a configuration option to filter out pre-release versions seems reasonable, and it could even default to "true".

At the same time, relying more on semver for artifact ordering would be a significant departure from the existing approach, in which the repository tells us which release is the most "recent" (i.e. "last published"). In many cases this strategy has been very reliable, and it also works for projects that do not use semver; however, it has other drawbacks, such as this one:
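To illustrate the trade-off, here is a minimal sketch of a "last published wins" selection, run on the jakarta.persistence-api dates from the table in the issue, with and without a hyphen filter. This is not the plugin's implementation; the class, method and signature are hypothetical:

```java
import java.time.LocalDate;
import java.util.Map;
import java.util.Optional;
import java.util.function.Predicate;

/** Sketch: pick the most recently published version among those the filter keeps. */
public class LastPublished {
    static Optional<String> newest(Map<String, LocalDate> published, Predicate<String> keep) {
        return published.entrySet().stream()
                .filter(e -> keep.test(e.getKey()))
                .max(Map.Entry.comparingByValue()) // latest publish date wins
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        Map<String, LocalDate> jakartaPersistence = Map.of(
                "3.1.0", LocalDate.of(2022, 2, 25),
                "3.2.0-B01", LocalDate.of(2023, 8, 28),
                "3.2.0-B02", LocalDate.of(2023, 11, 6),
                "3.2.0-M1", LocalDate.of(2023, 11, 23));
        System.out.println(newest(jakartaPersistence, v -> true));             // Optional[3.2.0-M1]
        System.out.println(newest(jakartaPersistence, v -> !v.contains("-"))); // Optional[3.1.0]
    }
}
```

Selecting by publish date needs no knowledge of the versioning scheme, which is why it works for non-semver projects; the pre-release filter is the only part that depends on the version format.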

https://github.com/f4lco/libyear-gradle-plugin/blob/7849052ddbd5f6562fdc08b289e12bddf0d55936/libyear-gradle-plugin/src/main/kotlin/com/libyear/sourcing/SolrSearchAdapter.kt#L107-L111

We'll have to give this more thought, both regarding the implementation and the question "what is the best possible 'default' behavior for the plugin?". Any input is appreciated :)

grimsa commented 8 months ago

Regarding multiple versions of a dependency being maintained in parallel: I noticed that with Spring projects as well.

I did not consider it a problem in my case. For example, Spring Security maintains 3 release lines in parallel (https://spring.io/projects/spring-security/#support); at the time of writing, these are 6.2.x, 6.1.x, and 5.8.x. As far as I can tell, they publish releases for all 3 lines within minutes of each other (starting with the oldest and finishing with the latest).

So if we were running the latest 5.8.x release, we would observe:

But I think it is acceptable, because:

  1. While each release line is being maintained, as long as we're on the latest release of the same major version (even if it is not the newest release line), we're still using a maintained version, so a libyear value close to zero may well be meaningful. Once maintenance of the 5.8.x line stops, libyears would naturally start to accumulate, and we'd have a clear signal to upgrade.
  2. The fact that other release lines exist would still be visible in the report as a few minutes' worth of libyears for this dependency (if the 5.8.x line were the newest one, its release would be published last, resulting in 0 libyears and no entry). So this is also good, though it relies on Spring's practice of publishing newer releases later (even if only by minutes), which does not seem to be the case with Tomcat.
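For a sense of scale: assuming, purely for illustration, that two release lines are published 5 minutes apart and that a year has 365.25 days, the resulting entry would be

```latex
\text{libyears} = \frac{5\ \text{min}}{365.25 \times 24 \times 60\ \text{min}} = \frac{5}{525\,960} \approx 9.5 \times 10^{-6}
```

which is effectively zero, yet still non-zero, so the dependency still gets an entry in the report.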

--

As for how to determine the version to compare against:

I tried sending a request to Solr search (GET https://search.maven.org/solrsearch/select?q=g:"org.apache.tomcat" AND a:"tomcat") and saw that, given a version, it can return a timestamp.
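A per-version lookup could build such a query with an extra v: clause. A minimal sketch, assuming the search API's core=gav parameter (one result document per version) and that each document carries a publish timestamp in epoch milliseconds; both are assumptions to verify against the current Maven Central search API, and no HTTP call is made here:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

/** Sketch: build a Maven Central Solr search URL for one exact group:artifact:version. */
public class SolrQuery {
    static String gavQueryUrl(String group, String artifact, String version) {
        String q = String.format("g:\"%s\" AND a:\"%s\" AND v:\"%s\"", group, artifact, version);
        return "https://search.maven.org/solrsearch/select?q="
                + URLEncoder.encode(q, StandardCharsets.UTF_8) // spaces become '+', quotes and colons are escaped
                + "&core=gav&rows=20&wt=json";
    }

    public static void main(String[] args) {
        System.out.println(gavQueryUrl("org.apache.tomcat", "tomcat", "10.1.17"));
    }
}
```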

As for determining which versions are published, maybe it would be possible to leverage the published Maven metadata? For example, for Tomcat: https://repo1.maven.org/maven2/org/apache/tomcat/tomcat/maven-metadata.xml

It looks like this:

<?xml version="1.0" encoding="UTF-8"?>
<metadata>
  <groupId>org.apache.tomcat</groupId>
  <artifactId>tomcat</artifactId>
  <versioning>
    <latest>11.0.0-M15</latest>
    <release>11.0.0-M15</release>
    <versions>
      <version>7.0.35</version>
      <!-- ... -->
      <version>7.0.109</version>
      <version>8.0.0-RC1</version>
      <version>8.0.0-RC3</version>
      <version>8.0.0-RC5</version>
      <version>8.0.0-RC10</version>
      <version>8.0.1</version>
      <!-- ... -->
      <version>9.0.84</version>
      <version>10.0.0-M1</version>
      <version>10.0.0-M3</version>
      <version>10.0.0-M4</version>
      <version>10.0.0-M5</version>
      <version>10.0.0-M6</version>
      <version>10.0.0-M7</version>
      <version>10.0.0-M8</version>
      <version>10.0.0-M9</version>
      <version>10.0.0-M10</version>
      <version>10.0.0</version>
      <!-- ... -->
      <version>10.1.17</version>
      <version>11.0.0-M1</version>
      <version>11.0.0-M3</version>
      <version>11.0.0-M4</version>
      <version>11.0.0-M5</version>
      <version>11.0.0-M6</version>
      <version>11.0.0-M7</version>
      <version>11.0.0-M9</version>
      <version>11.0.0-M10</version>
      <version>11.0.0-M11</version>
      <version>11.0.0-M12</version>
      <version>11.0.0-M13</version>
      <version>11.0.0-M14</version>
      <version>11.0.0-M15</version>
    </versions>
    <lastUpdated>20231212142015</lastUpdated>
  </versioning>
</metadata>
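Reading the version list from such a file needs only the JDK's built-in DOM parser. A minimal sketch that returns the version entries in file order (fetching the file from repo1.maven.org is left out, and the class name is illustrative):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

/** Sketch: list the <version> entries of a maven-metadata.xml document, in file order. */
public class MavenMetadataVersions {
    static List<String> versions(String metadataXml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(metadataXml.getBytes(StandardCharsets.UTF_8)));
            // <latest> and <release> are different elements, so only <versions>/<version> entries match
            NodeList nodes = doc.getElementsByTagName("version");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                result.add(nodes.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse maven-metadata.xml", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<metadata><versioning><latest>11.0.0-M1</latest><versions>"
                + "<version>10.1.17</version><version>11.0.0-M1</version>"
                + "</versions></versioning></metadata>";
        System.out.println(versions(xml)); // [10.1.17, 11.0.0-M1]
    }
}
```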

I also checked the metadata file for one of the Spring Security artifacts, and there the versions are ordered by version number (not by release date).

And this is what the metadata for Guava looks like, with its -android and -jre variants.

Maybe then the logic could be something like (pseudocode):

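// getMavenMetadata(), streamVersions() and isPreRelease() are placeholders, not existing APIs
String currentVersion = "10.1.3"; // dependency version used in the current project
Optional<String> newest = getMavenMetadata("org.apache.tomcat:tomcat").streamVersions()
   .dropWhile(version -> !version.equals(currentVersion))
   // v-- This filter step would deal with the logic requested in this issue
   .filter(version -> !isPreRelease(version)) // semver, or some other, possibly customizable, logic
   .reduce((previous, last) -> last); // Java streams have no findLast(), so keep the last element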

This would then return 10.1.17, because all 11.0.0 versions are pre-release versions. Solr search could then be used to look up the release dates of 10.1.3 and 10.1.17 (to calculate the libyear value).
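A self-contained version of this selection, run on an abridged list of Tomcat versions taken from the metadata above plus 10.1.3 from the example. A sketch only: newestStableSince is a hypothetical name, and the hyphen check stands in for whatever pre-release logic is chosen:

```java
import java.util.List;
import java.util.function.Predicate;

/** Sketch of the proposed selection over versions in maven-metadata.xml order. */
public class NewestStable {
    static String newestStableSince(List<String> metadataOrder, String current, Predicate<String> isPreRelease) {
        return metadataOrder.stream()
                .dropWhile(v -> !v.equals(current)) // skip versions listed before the one in use
                .filter(v -> !isPreRelease.test(v)) // the step requested in this issue
                .reduce((first, second) -> second)  // Stream has no findLast(); keep the last element
                .orElse(current);                   // nothing left (e.g. version not listed): keep current
    }

    public static void main(String[] args) {
        List<String> tomcat = List.of("9.0.84", "10.0.0", "10.1.3", "10.1.17", "11.0.0-M1", "11.0.0-M15");
        System.out.println(newestStableSince(tomcat, "10.1.3", v -> v.contains("-"))); // 10.1.17
    }
}
```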

So overall, it seems that combining Maven metadata with Solr search might allow a better solution for cases where multiple release lines are maintained in parallel (as Tomcat and Spring do), and it would also make it possible to exclude pre-release versions (because Maven metadata lists all versions, not just the latest).

--

Now that I'm writing this, it also seems to me that Maven metadata would make it quite easy to implement a version-distance-based metric (I think the original paper argued that it has some advantages over the date-based metric). That could be interesting and useful too.
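As an illustration, one simple version-distance variant counts how many newer stable versions are listed after the one in use. A sketch, assuming metadata order is version order (as observed above for Spring Security); releasesBehind is a hypothetical name, and the list is abridged from the Tomcat metadata shown earlier:

```java
import java.util.List;
import java.util.function.Predicate;

/** Sketch: number of stable versions listed after the current one in metadata order. */
public class VersionDistance {
    static long releasesBehind(List<String> metadataOrder, String current, Predicate<String> isPreRelease) {
        int index = metadataOrder.indexOf(current);
        if (index < 0) {
            throw new IllegalArgumentException("Version not found in metadata: " + current);
        }
        return metadataOrder.subList(index + 1, metadataOrder.size()).stream()
                .filter(v -> !isPreRelease.test(v)) // only count stable releases
                .count();
    }

    public static void main(String[] args) {
        List<String> abridged = List.of("9.0.84", "10.0.0", "10.1.3", "10.1.17", "11.0.0-M1", "11.0.0-M15");
        System.out.println(releasesBehind(abridged, "10.0.0", v -> v.contains("-"))); // 2
    }
}
```

If the metadata is strictly version-ordered, later patch releases of older lines (such as a newer 9.0.x) sort before 10.0.0 and are not counted, which fits the parallel-release-line discussion above.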