builtin semver 2.0.0 support

ju6ge commented 3 months ago

This PR adds builtin support for parsing and ordering semver strings:

`parse_semver` function:

Parses semver strings according to the semver regex.

Taking the example list of valid semver strings, this looks something like:

cat /tmp/test.json | jaq

[
  "0.0.4",
  "1.2.3",
  "10.20.30",
  "1.1.2-prerelease+meta",
  "1.1.2+meta",
  "1.1.2+meta-valid",
  "1.0.0-alpha",
  "1.0.0-beta",
  "1.0.0-alpha.beta",
  "1.0.0-alpha.beta.1",
  "1.0.0-alpha.1",
  "1.0.0-alpha0.valid",
  "1.0.0-alpha.0valid",
  "1.0.0-alpha-a.b-c-somethinglong+build.1-aef.1-its-okay",
  "1.0.0-rc.1+build.1",
  "2.0.0-rc.1+build.123",
  "1.2.3-beta",
  "10.2.3-DEV-SNAPSHOT",
  "1.2.3-SNAPSHOT-123",
  "1.0.0",
  "2.0.0",
  "1.1.7",
  "2.0.0+build.1848",
  "2.0.1-alpha.1227",
  "1.0.0-alpha+beta",
  "1.2.3----RC-SNAPSHOT.12.9.1--.12+788",
  "1.2.3----R-S.12.9.1--.12+meta",
  "1.2.3----RC-SNAPSHOT.12.9.1--.12",
  "1.0.0+0.build.1-rc.10000aaa-kk-0.1",
  "99999999999999999999999.999999999999999999.99999999999999999",
  "1.0.0-0A.is.legal",
  "3.0.0-rc.1",
  "3.0.0-rc.2",
  "3.0.0-rc.3",
  "3.0.0"
]

cat /tmp/test.json | jaq 'map(parse_semver) | [.[14]]'

[
  {
    "major": "1",
    "minor": "0",
    "patch": "0",
    "prerelease": "rc.1",
    "buildmetadata": "build.1"
  }
]

`semver_ord` function:

Uses parse_semver to create a nested array that sort_by can order. The version is transformed into the following structure:

[ [.major, .minor, .patch], [ .prerelease_name, .prerelease_version_number] ]

If there is no prerelease, the second array is replaced by {}. Since objects sort after arrays in jq's ordering, a release then sorts after its own prereleases.
Build metadata is ignored for ordering.
To construct the second array, another regex is used, which can be found here. It is what I came up with; if you think it should be adjusted, feel free to play with it ;)
The effect is that versions are ordered by their [major, minor, patch] first. Prereleases are then ordered lexicographically by name, and after that by their subversion number.
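As an illustration only, the key described above can be sketched in Python. This is not the PR's jq code: the `PRERELEASE` regex is a hypothetical stand-in for the second regex (which is not shown in this thread), and the `0`/`1` marker plays the role that `{}` plays in jq. Edge cases may therefore order differently from the real filter.

```python
import re

# ASSUMPTION: hypothetical stand-in for the PR's second regex. It splits a
# prerelease such as "rc.1" into a name ("rc") and an optional trailing
# numeric part (1).
PRERELEASE = re.compile(r"^(?P<name>.*?)(?:\.(?P<number>\d+))?$")


def semver_ord(parsed):
    """Build a sort key from a parse_semver-style dict; build metadata is ignored."""
    core = (int(parsed["major"]), int(parsed["minor"]), int(parsed["patch"]))
    pre = parsed.get("prerelease")
    if pre is None:
        # Marker 1 stands in for {}: a release sorts after its prereleases.
        return (core, 1, "", -1)
    m = PRERELEASE.match(pre)
    number = int(m.group("number")) if m.group("number") else -1
    # Marker 0: prereleases compare by name first, then by trailing number.
    return (core, 0, m.group("name"), number)


versions = [
    {"major": "3", "minor": "0", "patch": "0", "prerelease": None},
    {"major": "3", "minor": "0", "patch": "0", "prerelease": "rc.2"},
    {"major": "3", "minor": "0", "patch": "0", "prerelease": "rc.1"},
    {"major": "10", "minor": "2", "patch": "3", "prerelease": "DEV-SNAPSHOT"},
]
ordered = sorted(versions, key=semver_ord)
print([v["prerelease"] for v in ordered])
# prints ['rc.1', 'rc.2', None, 'DEV-SNAPSHOT']
```

The numeric fields are converted to integers so that 10.2.3 sorts above 3.0.0, which a plain string comparison would get wrong.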

Applying the ordering to the above list of semver strings looks like this:

cat /tmp/test.json | ./target/debug/jaq 'sort_by(. | semver_ord) | reverse'

[
  "99999999999999999999999.999999999999999999.99999999999999999",
  "10.20.30",
  "10.2.3-DEV-SNAPSHOT",
  "3.0.0",
  "3.0.0-rc.3",
  "3.0.0-rc.2",
  "3.0.0-rc.1",
  "2.0.1-alpha.1227",
  "2.0.0+build.1848",
  "2.0.0",
  "2.0.0-rc.1+build.123",
  "1.2.3",
  "1.2.3-beta",
  "1.2.3-SNAPSHOT-123",
  "1.2.3----RC-SNAPSHOT.12.9.1--.12",
  "1.2.3----RC-SNAPSHOT.12.9.1--.12+788",
  "1.2.3----R-S.12.9.1--.12+meta",
  "1.1.7",
  "1.1.2+meta-valid",
  "1.1.2+meta",
  "1.1.2-prerelease+meta",
  "1.0.0+0.build.1-rc.10000aaa-kk-0.1",
  "1.0.0",
  "1.0.0-rc.1+build.1",
  "1.0.0-beta",
  "1.0.0-alpha0.valid",
  "1.0.0-alpha.beta",
  "1.0.0-alpha.beta.1",
  "1.0.0-alpha.0valid",
  "1.0.0-alpha-a.b-c-somethinglong+build.1-aef.1-its-okay",
  "1.0.0-alpha+beta",
  "1.0.0-alpha",
  "1.0.0-alpha.1",
  "1.0.0-0A.is.legal",
  "0.0.4"
]

01mf02 commented 3 months ago

Hi @ju6ge, that is some quite cool work! However, I do not want to put new filters into std.jq that are not part of main jq. I have even refused the inclusion of some jq filters (SQL-style operators, see #33) in std.jq.

Still, I think that filters like yours should be easy to find and include. A better way to do this than putting it into std.jq, I think, would be to have some kind of jq module repository to which people could submit their filters, and from which they could then be included in jq programs.

I guess that it would be best to open an issue in the jq repository.

wader commented 3 months ago

If you put the module in a repo you could add a link to it at https://github.com/fiatjaf/awesome-jq?tab=readme-ov-file#libraries-and-tools-for-jq-itself. There are some jq module repositories, but I don't know how much they are used.

ju6ge commented 3 months ago

Hi guys, I experimented with doing the same thing in the original jq, but I found that its regex implementation does not work correctly. There are already open issues for this, and they have been open for a while. That is why I turned to jaq.

So doing this is just not possible with the original 'jq' until they fix their regex parsing.

I get your policy of not adding things that are not present in the original. Sadly, this means that I have to maintain my own fork for my purposes to have this working easily in CI environments, and that people who would like to use this have to do extra work to benefit from the work I put into it.

wader commented 3 months ago

@ju6ge I see, could you point me to the regex issue?

ju6ge commented 3 months ago

@wader You can test it for yourself if you take the following definition and try to parse a semver string using jq:

def parse_semver: capture("^v?(?P<major>0|[1-9]\\d*)\\.(?P<minor>0|[1-9]\\d*)\\.(?P<patch>0|[1-9]\\d*)(?:-(?P<prerelease>(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*)(?:\\.(?:0|[1-9]\\d*|\\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?(?:\\+(?P<buildmetadata>[0-9a-zA-Z-]+(?:\\.[0-9a-zA-Z-]+)*))?$");

I get the following jq error:

jq: error (at <stdin>:0): Regex failure: undefined group option

This led me to the following GitHub issue, which I found just by searching the issues for "group option": https://github.com/jqlang/jq/issues/2663. I have not dug further …

wader commented 3 months ago

OK, the error seems to be about (?P<name>) named capture groups? But I don't see how they differ from plain (?<name>). If I change parse_semver to use (?<name>), it seems to work better:

$ jq -n -L . 'include "semver"; "1.2.3-4+meta" | parse_semver'
{
  "major": "1",
  "minor": "2",
  "patch": "3",
  "prerelease": "4",
  "buildmetadata": "meta"
}

ju6ge commented 3 months ago

@wader thanks for the pointer … so with some adjustment I could make it work for jq \o/

wader commented 3 months ago

@ju6ge I think so, unless there was some special reason for using (?P<name>), but I can't find any difference between them. I found this in https://www.pcre.org/current/doc/html/pcre2pattern.html:

In PCRE2, a capture group can be named in one of three ways: (?<name>...) or (?'name'...) as in Perl, or (?P<name>...) as in Python

Oniguruma, the regex library used by jq, seems to support Python syntax (ONIG_SYNTAX_PYTHON), but I don't know if it's possible to mix the two, hmm. I would say the easiest option is to just change to (?<name>).
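For a pattern as long as the semver one, the switch can be done mechanically. A minimal sketch in Python follows; `to_perl_named_groups` is a hypothetical helper name. It only rewrites group openers, leaves Python-only constructs such as (?P=name) backreferences untouched, and would also rewrite an escaped literal `\(?P<` if a pattern contained one. The shortened pattern is just for illustration.

```python
def to_perl_named_groups(pattern: str) -> str:
    """Rewrite Python-style (?P<name>...) group openers to (?<name>...)."""
    return pattern.replace("(?P<", "(?<")


# Shortened illustration: only the major.minor.patch part of the semver regex.
python_style = r"^(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)$"
jq_style = to_perl_named_groups(python_style)
print(jq_style)
# prints ^(?<major>0|[1-9]\d*)\.(?<minor>0|[1-9]\d*)\.(?<patch>0|[1-9]\d*)$
```

When the result is pasted into a jq string literal, each backslash still has to be doubled, as in the parse_semver definition earlier in the thread.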

ju6ge commented 3 months ago

@wader I think I will try adding this to jq directly with the adjusted regex. And post a link to the PR here ;)

Regarding the Python syntax, at least regex101 seems to think that the full regex is a valid Python regex. So dunno …
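This can also be checked outside regex101 with Python's own `re` module. The sketch below is an illustration and not part of the PR. `SEMVER` and the Python `parse_semver` helper are names assumed for this example, and the pattern is the one from the jq definition above, written with single backslashes in raw strings.

```python
import re

# Pattern from the jq parse_semver definition, split across raw strings.
SEMVER = re.compile(
    r"^v?(?P<major>0|[1-9]\d*)\.(?P<minor>0|[1-9]\d*)\.(?P<patch>0|[1-9]\d*)"
    r"(?:-(?P<prerelease>(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*)"
    r"(?:\.(?:0|[1-9]\d*|\d*[a-zA-Z-][0-9a-zA-Z-]*))*))?"
    r"(?:\+(?P<buildmetadata>[0-9a-zA-Z-]+(?:\.[0-9a-zA-Z-]+)*))?$"
)


def parse_semver(text):
    """Return the named groups as a dict, or None if text is not valid semver."""
    m = SEMVER.match(text)
    return m.groupdict() if m else None


print(parse_semver("1.0.0-rc.1+build.1"))
# prints {'major': '1', 'minor': '0', 'patch': '0', 'prerelease': 'rc.1', 'buildmetadata': 'build.1'}
```

The pattern compiles as written and yields the same fields as the jaq output at the top of this PR. Groups that did not take part in the match come back as `None`.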

wader commented 3 months ago

Yep, it's probably a valid Python regex; it's just that jq doesn't configure Oniguruma to support that syntax, and I suspect that won't change any time soon.

As one of the current jq maintainers, I would say that we're quite reluctant to add new standard library functions. But please add it to awesome-jq, the jq wiki, or both.

01mf02 / jaq