apache / couchdb

Seamless multi-master syncing database with an intuitive HTTP/JSON API, designed for reliability
https://couchdb.apache.org/
Apache License 2.0
6.12k stars 1.02k forks

Use QuickJS as a Javascript Engine #4448

Open nickva opened 1 year ago

nickva commented 1 year ago

https://bellard.org/quickjs/

It might be a nice fallback, at least, if SM is missing or cannot be built for a particular OS/arch combination. RHEL 9 has stopped including the SM package, so we'd be back to building and maintaining our own separate RPM for it. https://github.com/apache/couchdb/issues/4154

Moreover, SM keeps deprecating versions fairly aggressively and new stable versions keep changing the C++ API, which makes it unfriendly for embedding. For instance: https://github.com/apache/couchdb/pull/4305

QuickJS has a few nice things going for it:

pgj commented 1 year ago

Great initiative @nickva! According to its web site, the last release of QuickJS was almost 2 years ago. Wouldn't that cause problems, or is it actually an advantage (because it suggests stability)?

nickva commented 1 year ago

@pgj a stable and simple API geared for embedding would be an advantage, as opposed to chasing a fast-moving API made for Web browsers.

nickva commented 1 year ago

https://gist.github.com/nickva/08f6100af20d23cde6dbdd65911c02e0

A very quick and dirty benchmark of basic startup and init differences between QuickJS and SM 91. It just times initializing an empty runtime and context, then freeing them and exiting the process.

Some example runs from the gist (see that for actual patches) and more run times:

SM91

% ./configure --dev --spidermonkey-version 91 && make
% TIMEFMT=$'Total (sec): \t%*E\nMax RSS(kb): \t%M\n'
% time ./couchjs

Init (usec):    755.000000
Free (usec):    772.000000
Total (sec):    0.016
Max RSS(kb):    3336

QuickJS

% ./configure --dev --spidermonkey-version quickjs && make
% TIMEFMT=$'Total (sec): \t%*E\nMax RSS(kb): \t%M\n'
% time ./couchjs

Init (usec):    372.000000
Free (usec):    102.000000
Total (sec):    0.008
Max RSS(kb):    980

Time-wise it looks about 2x as fast to initialize and takes about 3x less memory, so at least it shows some promise there.
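
As a quick sanity check on those ratios, here is a small sketch that recomputes them from the two sample runs above. The numbers are copied from the output shown; they are single runs, not averages, so the ratios are only indicative.

```python
# Numbers copied from the two sample runs above (single runs, not averages).
sm91 = {"init_usec": 755.0, "free_usec": 772.0, "total_sec": 0.016, "max_rss_kb": 3336}
quickjs = {"init_usec": 372.0, "free_usec": 102.0, "total_sec": 0.008, "max_rss_kb": 980}


def ratio(metric):
    """How many times larger the SM 91 value is than the QuickJS value."""
    return sm91[metric] / quickjs[metric]


for metric in sm91:
    print(f"{metric}: SM91/QuickJS = {ratio(metric):.2f}x")
```

Init comes out at roughly 2x and max RSS at roughly 3.4x, which matches the rough figures quoted above.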

nickva commented 11 months ago

Updated PR https://github.com/apache/couchdb/pull/4627 with a wip skeleton of a couch_scanner app.

couch_scanner is an application that knows how to scan databases/design_docs in the background, at a low rate, with periodic checkpointing. The idea, as discussed during the dev meeting a month or two ago, is to use it to compare quickjs engine output with sm output and report discrepancies/incompatibilities, as an extra safety check so users can verify they can switch to the new js engine. Other uses could be to accumulate some cluster-wide states: total index sizes, some ddoc features (any docs using lists or shows) etc.
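
The comparison idea can be sketched roughly as below. This is not the couch_scanner implementation (which lives in the PR and is written in Erlang); `run_sm` and `run_qjs` are hypothetical callables standing in for "evaluate this doc with SpiderMonkey" and "evaluate this doc with QuickJS".

```python
def compare_engines(docs, run_sm, run_qjs):
    """Return a list of discrepancies between two engine runners.

    run_sm and run_qjs are hypothetical callables: doc -> result (or raise).
    """
    discrepancies = []
    for doc in docs:
        results = []
        for run in (run_sm, run_qjs):
            try:
                results.append(("ok", run(doc)))
            except Exception as err:
                # An error in one engine but not the other is also a discrepancy.
                results.append(("error", type(err).__name__))
        if results[0] != results[1]:
            discrepancies.append(
                {"doc_id": doc.get("_id"), "sm": results[0], "qjs": results[1]}
            )
    return discrepancies
```

An empty list means the two runners agreed on every sampled doc; anything else is what would end up in the log report.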

nickva commented 10 months ago

To avoid managing a new data state directory (config option, checking for disk space, handling read-only or other errors), it would be a lot simpler to manage the checkpoints as simple _local docs in _dbs. There is precedent for using that exact mechanism for shard splitting job management.

The idea is to have a generic mechanism for traversing databases and ddocs. Periodically it updates its checkpoint in the _dbs/_local/scanner_checkpoint doc with the current db and ddoc (plus a job id, start time, initial settings, and a few other bits if needed).

Then, as the traversal happens, a call is made to each of the configured callback modules (scanner_quickjs_compat_check, scanner_size_stats, ...) with the db and ddoc in turn. After all the callbacks are called, the context for each individual module is updated. The context can be a simple map (JSON object). Each module can then do its own processing: write to some database, write to disk, log a report, etc.
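
A minimal sketch of what the checkpoint body and the resume logic might look like. The field names (`job_id`, `start_time`, `settings`, `db`, `ddoc`) are assumptions based on the description above, not the actual doc format from the PR.

```python
import time
import uuid

# Hypothetical doc id, following the _dbs/_local/scanner_checkpoint idea above.
CHECKPOINT_ID = "_local/scanner_checkpoint"


def new_checkpoint(settings):
    """Build the initial checkpoint body for a fresh scan job."""
    return {
        "_id": CHECKPOINT_ID,
        "job_id": uuid.uuid4().hex,
        "start_time": int(time.time()),
        "settings": settings,
        "db": None,    # last database handed to the callbacks
        "ddoc": None,  # last design doc handed to the callbacks
    }


def advance(checkpoint, db, ddoc):
    """Return a copy of the checkpoint that records the current position."""
    updated = dict(checkpoint)
    updated["db"] = db
    updated["ddoc"] = ddoc
    return updated


def remaining(items, checkpoint):
    """Skip (db, ddoc) pairs up to and including the checkpointed one.

    `items` must be in a stable, sorted order for resuming to be meaningful.
    """
    position = (checkpoint["db"], checkpoint["ddoc"])
    if position == (None, None):
        return list(items)
    return [item for item in items if item > position]
```

After a restart, the scanner would read the checkpoint and continue with `remaining(...)` instead of starting from the first database again.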

The configuration could look something like:

[scanner]
enable = true | false
schedule_period = once | every_week | every_day | ...

[scanner.$module] ...
enable = true | false
... $module specific settings ...

For instance:

[scanner.quickjs_compat_check]
enable = true | false
sample_docs = 100
check_reduce = true | false
log_report_level = warning
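
To make the layout concrete, here is a sketch that reads one filled-in instance of it and lists the enabled modules. Python's `configparser` is only a stand-in here (CouchDB reads its ini files with its own Erlang config code), and the values are illustrative.

```python
import configparser

# A concrete instance of the layout above; the values are illustrative only.
EXAMPLE = """
[scanner]
enable = true
schedule_period = every_week

[scanner.quickjs_compat_check]
enable = true
sample_docs = 100
check_reduce = false
log_report_level = warning
"""


def enabled_modules(text):
    """Return {module_name: settings} for every enabled [scanner.$module]."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    # The top-level switch turns the whole scanner off.
    if not parser.getboolean("scanner", "enable", fallback=False):
        return {}
    modules = {}
    for section in parser.sections():
        prefix, _, name = section.partition(".")
        if prefix != "scanner" or not name:
            continue
        if parser.getboolean(section, "enable", fallback=False):
            modules[name] = dict(parser[section])
    return modules


print(enabled_modules(EXAMPLE))
```

The point of the two-level layout is that `[scanner]` controls whether scanning happens at all, while each `[scanner.$module]` section can be switched on or off independently.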

nickva commented 10 months ago

The callback API for each scanner module might look like:

{ok, Ctx} = start_scan(#{session_id => Uuid, start_timestamp => UnixTs})
{ok, Ctx1} = start_db(Ctx, DbName)
{ok, Ctx1} = ddoc(Ctx, DbName, DDoc = #doc{})
{ok, Ctx1} = shard(Ctx, Db)
{ok, Ctx1} = end_db(Ctx, DbName)
ok = end_scan(Ctx)

This flow would be initialized and kept for each module individually. The scanner server process would hold a context that looks like:


 #state{modstates = #{Module1 => Ctx1, Module2 => Ctx2} ....}

The scan would be run on all the nodes. During scanning, only the dbs with the first shard copy on that node would be scanned. The API doesn't include a per-document callback. The idea is that each plugin may then choose to sample only some docs, process all the docs, or simply return {ok, Ctx} and move on.
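
The callback flow above can be sketched as a small driver loop. This is a Python illustration only, not the Erlang code from the PR: `CountingPlugin` is a hypothetical plugin, the `shard` callback is left out for brevity, and `end_scan` returns the final context here (rather than just ok) so the result can be inspected.

```python
class CountingPlugin:
    """Hypothetical plugin mirroring the callback names above; it just counts."""

    def start_scan(self, session):
        return {"session_id": session["session_id"], "dbs": 0, "ddocs": 0}

    def start_db(self, ctx, db_name):
        return {**ctx, "dbs": ctx["dbs"] + 1}

    def ddoc(self, ctx, db_name, ddoc):
        return {**ctx, "ddocs": ctx["ddocs"] + 1}

    def end_db(self, ctx, db_name):
        return ctx

    def end_scan(self, ctx):
        return ctx


def run_scan(plugins, dbs, session):
    """Drive every plugin through one scan, keeping one context per plugin."""
    # Mirrors the modstates map: plugin name -> that plugin's own context.
    states = {name: p.start_scan(session) for name, p in plugins.items()}
    for db_name, ddocs in dbs.items():
        for name, p in plugins.items():
            states[name] = p.start_db(states[name], db_name)
        for ddoc in ddocs:
            for name, p in plugins.items():
                states[name] = p.ddoc(states[name], db_name, ddoc)
        for name, p in plugins.items():
            states[name] = p.end_db(states[name], db_name)
    return {name: p.end_scan(states[name]) for name, p in plugins.items()}
```

Each plugin only ever sees and returns its own context, so one plugin cannot disturb another's state.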

A few events might stop or pause scanning:

joaohf commented 4 months ago

Hi @nickva

I'm trying this PR in the hope of replacing mozjs in the context of the meta-erlang Yocto layer.

While testing the qjs branch I got the following error:

| Compiling /build/tmp-qemux86-64-glibc-couchdb/work/core2-64-poky-linux/couchdb/3.3.3+git/git/src/couch/priv/couch_js/86/main.cpp
| /build/tmp-qemux86-64-glibc-couchdb/work/core2-64-poky-linux/couchdb/3.3.3+git/git/src/couch/priv/couch_js/86/main.cpp:24:10: fatal error: jsapi.h: No such file or directory
|    24 | #include <jsapi.h>
|       |          ^~~~~~~~~

In my case, I don't have the mozjs dependencies. Is that supposed to work? I mean, can I say that when couchdb is configured with the quickjs engine, couchdb does not need mozjs at all?

Thanks.

nickva commented 4 months ago

@joaohf Thanks for trying it out!

Currently the branch requires both engines to be built, as it's expected users would want to try quickjs but have the ability to switch back to spidermonkey. There is even a compatibility background scanner app to help evaluate view functions with both engines, compare the results, and report discrepancies in the log.

If everything goes well the idea is to eventually have a quickjs only build mode, but it's not there yet. I'll have to see if it's easy to add that, maybe as a --disable-spidermonkey option to ./configure where it won't even try to build spidermonkey to start with and just default to quickjs.

pgj commented 4 months ago

The error message above is surprising to me. In 9536b979, I added some checks to configure to verify that SpiderMonkey is present and usable at the assumed location (with the assumed version, which is 91 by default). This check should have complained about the absence of SM.

joaohf commented 4 months ago

Hi

@pgj that will work. But, as I configure couchdb with:

NOTE: Running ./configure --js-engine=quickjs --disable-docs --rebar /build/tmp-qemux86-64-glibc-couchdb/work/core2-64-poky-linux/couchdb/3.3.3+git/recipe-sysroot-native/usr/bin/rebar --rebar3 /build/tmp-qemux86-64-glibc-couchdb/work/core2-64-poky-linux/couchdb/3.3.3+git/recipe-sysroot-native/usr/bin/rebar3 --erlfmt /build/tmp-qemux86-64-glibc-couchdb/work/core2-64-poky-linux/couchdb/3.3.3+git/recipe-sysroot-native/usr/bin/erlfmt

SM is disabled when js-engine is quickjs: https://github.com/apache/couchdb/blob/qjs/configure#L321

@nickva, so could I state that SM will be necessary even for quickjs, since the user needs to make a configuration choice in order to enable/disable quickjs, and that there will be users who would like to --disable-spidermonkey in their builds?

My big issue with SM-91 is that it does not support python-3.12 as a build environment. As I'm integrating couchdb with the latest Yocto environment (which only supports python 3.12), I face a hard decision whether to keep couchdb and try to patch SM-91 to use python 3.12.

nickva commented 4 months ago

@nickva, so could I state that SM will be necessary even for quickjs, since the user needs to make a configuration choice in order to enable/disable quickjs, and that there will be users who would like to --disable-spidermonkey in their builds?

@joaohf I think your case is valid. Eventually it should work with SM completely unavailable, so I'll try to add a --disable-spidermonkey configuration flag which won't even look for or try to build spidermonkey. It's just that at first I was focused on allowing both, so users could evaluate the new JS engine and detect any compatibility issues. I'll try to add that mode after finishing the scanner part.

nickva commented 4 months ago

@joaohf there is now an option to completely disable Spidermonkey. For that, add the --disable-spidermonkey option to ./configure. Give that a try. I haven't tested it extensively, just locally on macOS so far.

joaohf commented 4 months ago

Hi @nickva

I did a first round of building your branch in a Yocto context. The build was ok after some changes (please see the WIP patch https://github.com/meta-erlang/meta-erlang/pull/293/files#diff-e118f4df7011f6445d77896486ad074c2b9c15bb0c84a31bfc33d2e82ff97125) around the cross-compilation environment. As my patch is WIP, I want to better isolate the changes before suggesting any changes.

Now, I'll test the build results.

Thanks.

nickva commented 4 months ago

@joaohf very interesting. I don't know much about the Yocto Project, so I may not be able to help much with the diff.

I did however apply this patch to make it a bit easier for you:

-    {"CFLAGS", "-flto -g -Wall -D_GNU_SOURCE -DCONFIG_LTO=y -O2 -Iquickjs"},
-    {"LDFLAGS", "-flto -lm quickjs/libquickjs.lto.a"},
+    {"CFLAGS", "$CFLAGS -flto -g -Wall -D_GNU_SOURCE -DCONFIG_LTO=y -O2 -Iquickjs"},
+    {"LDFLAGS", "$LDFLAGS -flto -lm quickjs/libquickjs.lto.a"},

It seems to work on macOS and Debian so far with it.

I can see the AR?=, CC?= patch for quickjs/Makefile being useful in general too, but I haven't applied that patch yet.

I tried this patch to build_js.escript, but it didn't work: build_js.escript runs one level above qjsc, so os:cmd doesn't find a bare qjsc. Is it added to the PATH somehow in Yocto?

-    os:cmd("quickjs/qjsc -c -N bytecode -o c_src/" ++ Tmp ++ " priv/" ++ Js),
+    os:cmd("qjsc -c -N bytecode -o c_src/" ++ Tmp ++ " priv/" ++ Js),

While at it, I also rebased QuickJS on the latest master.
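
One way to support both cases (the bundled quickjs/qjsc and a qjsc provided on PATH) is a lookup with a fallback. The sketch below shows the idea in Python; it is not the build_js.escript code, and `find_qjsc` is a hypothetical helper.

```python
import os
import shutil


def find_qjsc(bundled="quickjs/qjsc"):
    """Prefer the bundled compiler, else fall back to one found on PATH."""
    if os.path.isfile(bundled) and os.access(bundled, os.X_OK):
        return bundled
    return shutil.which("qjsc")  # None if nothing is found on PATH
```

The order is a design choice: a cross-compiling build, where the bundled binary targets another architecture, would probably want the PATH lookup first or an explicit override.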

joaohf commented 4 months ago

Hi,

The patch for src/couch_quickjs/rebar.config.script is really handy. And I think it makes sense.

Most Yocto Project work involves fixing cross-compile build scripts, like the quickjs/Makefile. Sometimes the upstream is not perfect and some patches are needed.

Just some yocto related details:

But for the couchdb and Yocto scenario I had to use a different strategy, which was to compile a native quickjs binary using the following recipe: https://github.com/meta-erlang/meta-erlang/blob/e1c631d2af795f989f5dca770d8703d923950d93/recipes-extended/quickjs/quickjs_git.bb. When I build couchdb, it is a cross compilation (for example to ARM), and without the native quickjs recipe I could not run quickjs if it was built by couchdb.

That is why I had also to patch the build_js.escript in order to remove the hardcoded quickjs path. In yocto build the quickjs will be available in PATH for couchdb build. I think the build_js.escript is very yocto specific :)

I'll keep you informed as soon as I finish my tests.

Thanks.

nickva commented 4 months ago

That is why I had also to patch the build_js.escript in order to remove the hardcoded quickjs path. In yocto build the quickjs will be available in PATH for couchdb build. I think the build_js.escript is very yocto specific :)

Ah! that makes perfect sense. Thank you for explaining and for sharing more info about the Yocto Project.

nickva commented 2 months ago

As discussed in one of the CouchDB dev meetings, the last two required features for https://github.com/apache/couchdb/pull/4627 were: 1) Windows support 2) A way to scan and find compatibility issues vs the current Spidermonkey

Both of those features are now implemented in the PR.