BurntSushi / rebar

A biased barometer for gauging the relative speed of some regex engines on a curated set of tasks.
The Unlicense

Test GraalVM / TRegex #16

Open tarsa opened 1 week ago

tarsa commented 1 week ago

some docs on how to run tregex are here: https://github.com/oracle/graal/blob/master/regex/README.md

probably graalvm enterprise (aka oracle graalvm) will run faster than graalvm ce (community edition).

BurntSushi commented 1 week ago

Either someone will need to contribute this, or there will need to be far more detailed instructions on how to set up a project and run it with GraalVM/TRegex. The README has scant mention of this. It says something about using it via "Truffle's interop mechanisms." But that means nothing to me.

Also:

> Unlike most regular expression engines which use backtracking, TRegex uses an automaton-based approach. The regex is parsed and then translated into a nondeterministic finite-state automaton (NFA). A powerset construction is then used to expand the NFA into a deterministic finite-state automaton (DFA).

Unless they're doing the powerset construction lazily, their regex compile times will be exponential in the size of the regex pattern in the worst case.
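To make the worst case concrete, here is a minimal sketch of an eager subset construction. The pattern family `(a|b)*a(a|b){n}` is a textbook example chosen for illustration, and the hand-built NFA and the helper names `step` and `countDfaStates` are assumptions of this sketch. It says nothing about how TRegex actually builds its DFA.

```javascript
// NFA for (a|b)*a(a|b){n}: states 0..n+1, state n+1 is accepting.
//   state 0: loops on 'a' and 'b', and also moves to state 1 on 'a'
//   state i (1..n): moves to state i+1 on 'a' or 'b'
// A DFA state is a set of NFA states, stored as a bitmask.
function step(mask, ch, n) {
  let next = 0;
  for (let s = 0; s <= n + 1; s++) {
    if ((mask & (1 << s)) === 0) continue;
    if (s === 0) {
      next |= 1;                    // self-loop on either symbol
      if (ch === 'a') next |= 2;    // 0 --a--> 1
    } else if (s <= n) {
      next |= 1 << (s + 1);         // i --a|b--> i+1
    }
    // state n+1 has no outgoing transitions
  }
  return next;
}

// Eagerly explore every DFA state reachable from the start set {0}.
function countDfaStates(n) {
  const seen = new Set([1]);
  const work = [1];
  while (work.length > 0) {
    const mask = work.pop();
    for (const ch of ['a', 'b']) {
      const next = step(mask, ch, n);
      if (!seen.has(next)) {
        seen.add(next);
        work.push(next);
      }
    }
  }
  return seen.size;
}

for (const n of [1, 2, 4, 8, 12]) {
  console.log(`n=${n}: ${n + 2} NFA states, ${countDfaStates(n)} DFA states`);
}
```

The NFA grows linearly with `n` while the number of reachable DFA states doubles with each increment. A lazy construction avoids paying that cost up front by only building the DFA states a particular haystack actually visits.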

tarsa commented 1 week ago

> Either someone will need to contribute this, or there will need to be far more detailed instructions on how to set up a project and run it with GraalVM/TRegex. The README has scant mention of this. It says something about using it via "Truffle's interop mechanisms." But that means nothing to me.

i've tried to put together an example, but as you've said, the documentation is totally insufficient for quick experiments, so i think i'll give up on using tregex directly.

...but there's another clue in those (overall very unsatisfactory) docs:

> TRegex originated as part of the Graal JavaScript implementation, but is now standalone so implementers of other languages can use it.

maybe running tregex through graaljs would be a sensible option? graalvm offers (some degree of) node.js compatibility: https://github.com/oracle/graaljs?tab=readme-ov-file#nodejs-support . i'll try to experiment with that.

tarsa commented 1 week ago

ok, running graaljs in node.js compatibility mode looks simple. i hope it will be easy enough for you to include it in your benchmarks.

what you need is to download graalnodejs from https://github.com/oracle/graaljs/releases . there are many variants, including the jvm and nativeimage builds used in the invocations below.

it seems that you don't need to download the whole graalvm distribution or anything other than a single graalnodejs variant to be able to run node.js scripts (but i haven't tested that fully, i just ran the main.js from this repo with an empty json input).

graalnodejs is meant to be drop-in compatible with node.js, so it offers binaries with names 'node', 'npm', etc and that would clash with other node.js versions if you put everything on $PATH.
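One way to sidestep the $PATH clash is to never put graalnodejs on $PATH at all and always name the binary explicitly. A minimal sketch, assuming a small wrapper script is acceptable; `runWithNode` and the install path in the comment are hypothetical names used only for illustration:

```javascript
const { spawnSync } = require('child_process');

// Run `args` with an explicitly chosen node binary, feeding `stdinText`
// on stdin, instead of relying on whichever `node` is first on $PATH.
function runWithNode(nodeBinary, args, stdinText) {
  const result = spawnSync(nodeBinary, args, {
    input: stdinText,
    encoding: 'utf8',
  });
  if (result.error) throw result.error; // e.g. the binary does not exist
  return {
    status: result.status,
    stdout: result.stdout,
    stderr: result.stderr,
  };
}

// Hypothetical install location, adjust to wherever the archive was unpacked:
//   runWithNode('/opt/graalnodejs/bin/node', ['main.js'], '');
// For demonstration, use the node binary that is running this script.
const version = runWithNode(process.execPath, ['--version'], '');
console.log(version.stdout.trim());
```

The same idea works from a shell by spelling out the full path to the binary, which is what the invocations below do.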

my example invocations:

$ echo {} | time ~/devel/graalnodejs-jvm-24.0.1-linux-amd64/bin/node main.js
/dev/shm/rebar/main.js:287
    throw new Error(`invalid KLV item: could not find first ':'`);
          ^

Error: invalid KLV item: could not find first ':'
    at parseOneKLV (/dev/shm/rebar/main.js:287:11)
    at parseConfig (/dev/shm/rebar/main.js:248:17)
    at main (/dev/shm/rebar/main.js:17:18)
    at Object.<anonymous> (/dev/shm/rebar/main.js:359:1)
    at Object._compile (node:internal/modules/cjs/loader:1356:14)
    at Object.<anonymous> (node:internal/modules/cjs/loader:1414:10)
    at Object.load (node:internal/modules/cjs/loader:1197:32)
    at Function._load (node:internal/modules/cjs/loader:1013:12)
    at Function.executeUserEntryPoint (node:internal/modules/run_main:128:12)
    at node:internal/main/run_main_module:28:49

Node.js v18.19.1
Command exited with non-zero status 1
11.09user 0.32system 0:03.43elapsed 332%CPU (0avgtext+0avgdata 553780maxresident)k
0inputs+64outputs (0major+140687minor)pagefaults 0swaps
$ echo {} | time ~/devel/graalnodejs-nativeimage-24.0.1-linux-amd64/bin/node main.js
/dev/shm/rebar/main.js:287
    throw new Error(`invalid KLV item: could not find first ':'`);
          ^

Error: invalid KLV item: could not find first ':'
    at parseOneKLV (/dev/shm/rebar/main.js:287:11)
    at parseConfig (/dev/shm/rebar/main.js:248:17)
    at main (/dev/shm/rebar/main.js:17:18)
    at Object.<anonymous> (/dev/shm/rebar/main.js:359:1)
    at Object._compile (node:internal/modules/cjs/loader:1356:14)
    at Object.<anonymous> (node:internal/modules/cjs/loader:1414:10)
    at Object.load (node:internal/modules/cjs/loader:1197:32)
    at Function._load (node:internal/modules/cjs/loader:1013:12)
    at Function.executeUserEntryPoint (node:internal/modules/run_main:128:12)
    at node:internal/main/run_main_module:28:49

Node.js v18.19.1
Command exited with non-zero status 1
1.14user 0.06system 0:00.46elapsed 259%CPU (0avgtext+0avgdata 358724maxresident)k
0inputs+0outputs (0major+44710minor)pagefaults 0swaps
$ ~/devel/graalnodejs-jvm-24.0.1-linux-amd64/bin/node --version
v18.19.1
$ ~/devel/graalnodejs-nativeimage-24.0.1-linux-amd64/bin/node --version
v18.19.1
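
Both runs fail in parseOneKLV because `{}` contains no ':' at all, so the input is rejected as KLV rather than read as json. The sketch below assumes that a KLV item has the shape `key:length:value` followed by a newline, with the length counted in bytes. That shape, the helper name `encodeKLV`, and the keys used are assumptions for illustration, so check rebar's own documentation for the fields a runner program really expects.

```javascript
// Encode one key-length-value item as "key:length:value\n".
// The length is the UTF-8 byte length of the value, not its character count.
function encodeKLV(key, value) {
  const byteLength = Buffer.byteLength(value, 'utf8');
  return `${key}:${byteLength}:${value}\n`;
}

// Illustrative keys only; the real set of required keys is not shown here.
const input =
  encodeKLV('pattern', '\\w+') +
  encodeKLV('haystack', 'héllo wörld');

process.stdout.write(input);
```

Piping output of this form into main.js, instead of `echo {}`, should at least get past the first ':' check shown in the stack traces above.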

BurntSushi commented 1 week ago

Thanks. I appreciate the legwork. I am still pretty unlikely to work on this any time soon personally.

It would also be helpful to know who is using tregex. Like, is it being used anywhere in a consequential manner? Because it's a non-goal to include literally every regex engine in rebar.

tarsa commented 1 week ago

> It would also be helpful to know who is using tregex. Like, is it being used anywhere in a consequential manner?

since tregex is a part of graaljs (and graalnodejs), i'll look at the situation with graaljs instead.

i don't know how to measure whether the graalvm-based node.js fork is widely used, but the javascript engine itself is probably already used in commercial scenarios.

the graalvm-based js engine is integrated with oracle database and with java applications.

info about graaljs integration with oracle database:

https://www.graalvm.org/js/mle-oracle-db/

> This page describes how to run JavaScript in an Oracle Database using the Oracle Database Multilingual Engine (MLE). MLE is powered by GraalVM: it can run JavaScript code in Oracle Database 23ai (and later) on Linux x64.

https://labs.oracle.com/pls/apex/f?p=94065:12:32018698560791:15

info about graaljs integration with java applications:

look at the 'usages' column on https://mvnrepository.com/artifact/org.graalvm.js/js . the numbers in that column are clickable and will show you which other maven artifacts depend on a particular version of the graaljs engine.