GregoryConrad / mimir

⚡ Supercharged Flutter/Dart Database
https://pub.dev/packages/mimir
MIT License

Web support (theoretical) #10

Open GregoryConrad opened 1 year ago

GregoryConrad commented 1 year ago

~Web support for mimir is currently not possible due to the lack of a publicly-available libc implementation for WASM made with JS & browser APIs. This concept of a browser-based libc sounds weird, because it is, but it isn't actually too bad; keep reading to see why.~

~A libc implementation via web APIs is theoretically possible (and has been done in some capacity!), as seen with https://webvm.io. WebVM was created with CheerpX and is written about in more detail here.~

~Considering an entire VM runs natively in the browser already using an in-progress libc implementation, and that VM itself isn't terribly slow, I see promise for the possibility of web support for this library (and many others) in the not-so distant future. (Again, this all relies on an open-source libc implementation that can be used by Rust.)~

libcs for the web are already available to differing degrees of completeness. Now it is just a matter of being able to use them.

GregoryConrad commented 1 year ago

Some collected thoughts

  1. WASM/WASI with a web WASI polyfill seems like the best option so far because it is closer to the web platform and will probably see more maintenance over time. And maybe, if other platforms support WASM/WASI properly in the future, I could just ship one .wasm file, which would be great (assuming it is performant enough on those platforms).
  2. Making a new Rust target from the Leaning Technologies libc
  3. wasm32-unknown-emscripten?

Conclusion

I am really hoping option 1 works out, and in theory it should. It will, however, be a bust if one of the depended-on crates relies on something in std that is not implemented in Rust's WASI std wrapper.
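Threading is one example of std functionality that a WASI build may not provide. The following is only a rough sketch of how calling code could degrade gracefully; `run_job` is a hypothetical helper, and it assumes that a target without thread support reports an error from `std::thread::Builder::spawn` instead of panicking.

```rust
use std::thread;

/// Hypothetical helper: runs `job` on a worker thread when the target can
/// spawn one, and inline on the calling thread otherwise.
/// A plain `fn` pointer is used because it is `Copy`, so the job is still
/// available for the fallback after the failed spawn attempt consumed a copy.
pub fn run_job<T: Send + 'static>(job: fn() -> T) -> T {
    match thread::Builder::new().spawn(job) {
        // Threads are available: wait for the worker and return its result.
        Ok(handle) => handle.join().expect("worker thread panicked"),
        // Assumption: a target without threads ends up here, so the job
        // simply runs on the current thread.
        Err(_) => job(),
    }
}
```

Calling `run_job(|| 2 + 2)` returns the same value on either path. This does nothing for C dependencies that include `pthread.h` directly; it only covers Rust code that can tolerate running single-threaded.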

GregoryConrad commented 1 year ago

A few immediate blockers with option 1:

  1. Rust crate page_size does not provide web support at the moment (particularly the granularity function). There are a few open PRs but none have been merged yet.
  2. pthread support is nonexistent in WASI libc right now. Need to see if there is a way to compile/run without actual threading at the moment.
   Compiling lmdb-rkv-sys v0.15.1 (https://github.com/meilisearch/lmdb-rs#5592bf5a)

The following warnings were emitted during compilation:

warning: /Users/gconrad/.cargo/git/checkouts/lmdb-rs-97af4d460cf53f67/5592bf5/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:177:10: fatal error: 'pthread.h' file not found
warning: #include <pthread.h>
warning:          ^~~~~~~~~~~
warning: 1 error generated.

error: failed to run custom build command for `lmdb-rkv-sys v0.15.1 (https://github.com/meilisearch/lmdb-rs#5592bf5a)`
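On the page_size blocker, here is a minimal sketch of what a WASM-aware page size query could look like. Nothing here is the page_size crate's actual code: `page_size`, `granularity`, and both constants are hypothetical names, and the native fallback value is a placeholder assumption. The one fixed fact is that a WebAssembly linear-memory page is 64 KiB.

```rust
/// Size of one WebAssembly linear-memory page (64 KiB).
pub const WASM_PAGE_SIZE: usize = 64 * 1024;

/// Placeholder for native targets in this sketch only; a real
/// implementation would ask the operating system instead.
pub const ASSUMED_NATIVE_PAGE_SIZE: usize = 4096;

/// Hypothetical page size query that also compiles for wasm32 targets.
pub fn page_size() -> usize {
    if cfg!(target_arch = "wasm32") {
        WASM_PAGE_SIZE
    } else {
        ASSUMED_NATIVE_PAGE_SIZE
    }
}

/// Hypothetical allocation granularity. Assumed equal to the page size,
/// since WASM memory grows in whole pages.
pub fn granularity() -> usize {
    page_size()
}
```

Whether a 64 KiB page size is acceptable to a memory-mapped store like LMDB is a separate question that this sketch does not answer.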
GregoryConrad commented 1 year ago

Also looks like WASI might get threading soon-ish™️, and Dart will also get Native Assets some months down the line. Might just be worth waiting on these for web support, instead of trying to work around them.

dinbtechit commented 1 year ago

Interesting documentation. It was very helpful.

> Dart will also get Native Assets some months down the line. Might just be worth waiting on these for web support, instead of trying to work around them.

Are you referring to this article Beyond Dart 3? Will this make porting native code into browsers easier?

 "Wasm enables Flutter Web apps to run as full native code in browsers. 
 This is a large undertaking, requires work, beyond updating the Dart compilers.
 It requires collaborating with the W3C and browser vendors..."

Just curious, have you considered going with Isar's approach https://github.com/isar/isar/tree/main/packages? I.e., using the web browser's IndexedDB for the time being?

GregoryConrad commented 1 year ago

> Are you referring to this article Beyond Dart 3? Will this make porting native code into browsers easier?

No, I’m referring to https://github.com/dart-lang/sdk/issues/50565

If Flutter provides an easy way for plugins to use .wasm files with Native Assets, then I plan to go with that approach. The Flutter approach would also need to support a browser WASI polyfill; otherwise, I would need to handle it all manually.

> Just curious, have you considered going with Isar's approach https://github.com/isar/isar/tree/main/packages? I.e., using the web browser's IndexedDB for the time being?

mimir wraps around milli, which is the search engine library that powers Meilisearch. milli uses LMDB as a key-value store under the hood. I could try to make a version of milli that uses a different key-value store, but I’d rather not, as it would be time-consuming and pretty difficult to do correctly.

Most FRB libraries can probably get away with simply compiling to WASI and using a WASI polyfill for web support (which would use something like IndexedDB under the hood in the polyfill), but I can’t do that directly in mimir (at the moment) because of its unique position (due to its dependency on LMDB).

GregoryConrad commented 1 year ago

Slight Update

LMDB

~I have gotten LMDB to compile with https://github.com/GregoryConrad/mimir/pull/97/commits/32661b835970874fb127a2d551498b3ef08aa81f~ by using experimental WASI threading support, some emulated libc functions, and forcing the use of POSIX semaphores (which I actually introduced originally for iOS/macOS). I have zero clue if the LMDB compiled to WASI will work or not, but at least it is compiling now. Baby steps. Once WASI gets actual threading support, I'll see if I can remove the THREAD_MODEL=posix change to use the built-in threading.

Edit: I didn't actually get LMDB to compile (thought I did based on Terminal output, but it was still in the process of compiling), but I got close. Need to tweak a few LMDB compilation flags.

Edit 2: LMDB is failing to compile due to missing POSIX file lock & signal support in WASI libc. Since file locking was recently added to WASI (https://github.com/WebAssembly/wasi-filesystem/issues/2), I'm guessing it'll be added to WASI libc sometime in the near future. However, the issue of some missing POSIX signal functions is still present.

LMDB Compilation Errors

```
The following warnings were emitted during compilation:

warning: /Users/gconrad/.cargo/git/checkouts/lmdb-rs-97af4d460cf53f67/501aa34/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:2972:11: error: use of undeclared identifier 'F_SETLK'; did you mean 'FD_SET'?
warning: Pidset = F_SETLK, Pidcheck = F_GETLK
warning: ^~~~~~~
warning: FD_SET
warning: /Users/gconrad/Documents/mimir/platform-build/wasi-libc/sysroot/include/__fd_set.h:42:22: note: 'FD_SET' declared here
warning: static __inline void FD_SET(int __fd, fd_set *__set) {
warning: ^
warning: /Users/gconrad/.cargo/git/checkouts/lmdb-rs-97af4d460cf53f67/501aa34/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:2972:11: error: integer constant expression must have integer type, not 'void (int, fd_set *)'
warning: Pidset = F_SETLK, Pidcheck = F_GETLK
warning: ^~~~~~~
warning: /Users/gconrad/.cargo/git/checkouts/lmdb-rs-97af4d460cf53f67/501aa34/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:2972:31: error: use of undeclared identifier 'F_GETLK'
warning: Pidset = F_SETLK, Pidcheck = F_GETLK
warning: ^
warning: /Users/gconrad/.cargo/git/checkouts/lmdb-rs-97af4d460cf53f67/501aa34/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:3005:22: error: use of undeclared identifier 'F_WRLCK'
warning: lock_info.l_type = F_WRLCK;
warning: ^
warning: /Users/gconrad/.cargo/git/checkouts/lmdb-rs-97af4d460cf53f67/501aa34/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:3010:14: error: use of undeclared identifier 'F_GETLK'
warning: if (op == F_GETLK && lock_info.l_type != F_UNLCK)
warning: ^
warning: /Users/gconrad/.cargo/git/checkouts/lmdb-rs-97af4d460cf53f67/501aa34/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:3010:45: error: use of undeclared identifier 'F_UNLCK'
warning: if (op == F_GETLK && lock_info.l_type != F_UNLCK)
warning: ^
warning: /Users/gconrad/.cargo/git/checkouts/lmdb-rs-97af4d460cf53f67/501aa34/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:5154:22: error: use of undeclared identifier 'F_RDLCK'
warning: lock_info.l_type = F_RDLCK;
warning: ^
warning: /Users/gconrad/.cargo/git/checkouts/lmdb-rs-97af4d460cf53f67/501aa34/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:5158:35: error: use of undeclared identifier 'F_SETLK'; did you mean 'FD_SET'?
warning: while ((rc = fcntl(env->me_lfd, F_SETLK, &lock_info)) &&
warning: ^~~~~~~
warning: FD_SET
warning: /Users/gconrad/Documents/mimir/platform-build/wasi-libc/sysroot/include/__fd_set.h:42:22: note: 'FD_SET' declared here
warning: static __inline void FD_SET(int __fd, fd_set *__set) {
warning: ^
warning: /Users/gconrad/.cargo/git/checkouts/lmdb-rs-97af4d460cf53f67/501aa34/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:5189:21: error: use of undeclared identifier 'F_WRLCK'
warning: lock_info.l_type = F_WRLCK;
warning: ^
warning: /Users/gconrad/.cargo/git/checkouts/lmdb-rs-97af4d460cf53f67/501aa34/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:5193:34: error: use of undeclared identifier 'F_SETLK'; did you mean 'FD_SET'?
warning: while ((rc = fcntl(env->me_lfd, F_SETLK, &lock_info)) &&
warning: ^~~~~~~
warning: FD_SET
warning: /Users/gconrad/Documents/mimir/platform-build/wasi-libc/sysroot/include/__fd_set.h:42:22: note: 'FD_SET' declared here
warning: static __inline void FD_SET(int __fd, fd_set *__set) {
warning: ^
warning: /Users/gconrad/.cargo/git/checkouts/lmdb-rs-97af4d460cf53f67/501aa34/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:5202:22: error: use of undeclared identifier 'F_RDLCK'
warning: lock_info.l_type = F_RDLCK;
warning: ^
warning: /Users/gconrad/.cargo/git/checkouts/lmdb-rs-97af4d460cf53f67/501aa34/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:5203:35: error: use of undeclared identifier 'F_SETLKW'
warning: while ((rc = fcntl(env->me_lfd, F_SETLKW, &lock_info)) &&
warning: ^
warning: /Users/gconrad/.cargo/git/checkouts/lmdb-rs-97af4d460cf53f67/501aa34/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:10145:2: warning: call to undeclared function 'sigemptyset'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
warning: sigemptyset(&set);
warning: ^
warning: /Users/gconrad/.cargo/git/checkouts/lmdb-rs-97af4d460cf53f67/501aa34/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:10146:2: warning: call to undeclared function 'sigaddset'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
warning: sigaddset(&set, SIGPIPE);
warning: ^
warning: /Users/gconrad/.cargo/git/checkouts/lmdb-rs-97af4d460cf53f67/501aa34/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:10147:12: warning: call to undeclared function 'pthread_sigmask'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
warning: if ((rc = pthread_sigmask(SIG_BLOCK, &set, NULL)) != 0)
warning: ^
warning: /Users/gconrad/.cargo/git/checkouts/lmdb-rs-97af4d460cf53f67/501aa34/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:10147:28: error: use of undeclared identifier 'SIG_BLOCK'
warning: if ((rc = pthread_sigmask(SIG_BLOCK, &set, NULL)) != 0)
warning: ^
warning: /Users/gconrad/.cargo/git/checkouts/lmdb-rs-97af4d460cf53f67/501aa34/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:10172:6: warning: call to undeclared function 'sigwait'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
warning: sigwait(&set, &tmp);
warning: ^
warning: /Users/gconrad/.cargo/git/checkouts/lmdb-rs-97af4d460cf53f67/501aa34/lmdb-sys/lmdb/libraries/liblmdb/mdb.c:11159:33: warning: unused parameter 'env' [-Wunused-parameter]
warning: mdb_env_get_maxkeysize(MDB_env *env)
warning: ^
warning: 5 warnings and 13 errors generated.

error: failed to run custom build command for `lmdb-rkv-sys v0.15.1 (https://github.com/meilisearch/lmdb-rs#501aa34a)`
```

page_size

Now the only blocking issue, as far as I can see, is page_size failing to compile to WASI. See https://github.com/Elzair/page_size_rs/pull/3

page_size is a dependency of heed, so maybe I could look into removing the dependency over there, since page_size looks very sparsely maintained.

GregoryConrad commented 1 year ago

Seeing https://github.com/WebAssembly/WASI/issues/166, I'm inclined to think that the best way forward would actually be trying an alternative backend in heed that is WASM/WASI friendly. Trying to see if mdbx works.

Update: compiling mdbx fails because compilation cannot find assert.h in wasi-libc. Not sure what the fix for this is, considering I can see an assert.h in sysroot/include/. There may be other issues than just this file not found, but this is definitely a blocking one.

GregoryConrad commented 1 year ago

For picking up where I left off: investigate mdbx-sys’ build.rs. I have a feeling it might be doing something wrong, causing assert.h to not be found.

GregoryConrad commented 1 year ago

mdbx is out of the picture because the team over at meili would rather go for a full Rust-only approach (rather than a different C library); specifically, either sanakirja or redb. Sanakirja is a bit more complex and has some limitations (like limited key/value size), and fails spectacularly when trying to compile to WASI due to some unmaintained dependencies. redb looks more promising (despite being in beta still), so I plan to spend some time to see how WASI compatible it is; see https://github.com/cberner/redb/issues/507

GregoryConrad commented 1 year ago

Based on the 3 options I originally gave above:

  1. This is still the best approach in my eyes, but wasi-libc lacks much of the functionality that any DBMS requires. Thus, the solution here is to avoid any Rust or C database and just use IndexedDB through heed, which unfortunately is what I was trying to avoid. I read a bit about SQLite on WASM (which uses IndexedDB under the hood), but I would really rather go straight to IndexedDB at this point. Aside: It really is a shame that WASI has been worked on for years and still feels only half-baked. They should've just started out by making it POSIX compliant, and then nearly anything could've targeted it.
  2. This approach is too much work, and it would make more sense to just use emscripten. I also doubt it would be compatible with wasm-bindgen, which is used by FRB.
  3. The emscripten Rust target is not compatible with wasm-bindgen and seems poorly maintained.

Now it is just a matter of first making sure that milli's deps can actually run on WASI before going through the trouble of writing a shim based on LMDB's APIs that actually uses IndexedDB.
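To make the shim idea a bit more concrete, here is a minimal sketch of a storage trait that an LMDB-shaped backend and a browser-backed one could both implement. The trait `KvBackend` and the type `MemoryBackend` are hypothetical and are not part of heed, milli, or LMDB. The in-memory map only stands in for a real IndexedDB backend, which would also have to deal with transactions and asynchronous browser APIs.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal interface a storage shim might expose.
pub trait KvBackend {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>>;
    fn put(&mut self, key: &[u8], value: &[u8]);
    /// Returns true if the key existed and was removed.
    fn delete(&mut self, key: &[u8]) -> bool;
    /// Keys starting with `prefix`, in sorted order (LMDB keeps keys ordered,
    /// so an ordered scan is part of what a replacement has to offer).
    fn keys_with_prefix(&self, prefix: &[u8]) -> Vec<Vec<u8>>;
}

/// In-memory stand-in used here in place of a real IndexedDB backend.
#[derive(Default)]
pub struct MemoryBackend {
    entries: BTreeMap<Vec<u8>, Vec<u8>>,
}

impl KvBackend for MemoryBackend {
    fn get(&self, key: &[u8]) -> Option<Vec<u8>> {
        self.entries.get(key).cloned()
    }

    fn put(&mut self, key: &[u8], value: &[u8]) {
        self.entries.insert(key.to_vec(), value.to_vec());
    }

    fn delete(&mut self, key: &[u8]) -> bool {
        self.entries.remove(key).is_some()
    }

    fn keys_with_prefix(&self, prefix: &[u8]) -> Vec<Vec<u8>> {
        // Start at the first key >= prefix and stop once the prefix no longer matches.
        self.entries
            .range(prefix.to_vec()..)
            .take_while(|(key, _)| key.starts_with(prefix))
            .map(|(key, _)| key.clone())
            .collect()
    }
}
```

Code written against the trait would not need to know which store sits underneath, which is the property the shim is after.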

GregoryConrad commented 1 year ago

See https://github.com/meilisearch/heed/issues/162. For future self, can probably use https://github.com/devashishdxt/rexie as a reference implementation on top of web-sys.

GregoryConrad commented 1 year ago

Instead of IndexedDB, I will try out redb with WASI, as that would be preferred.

If wasmer js is problematic, can also try out https://github.com/bjorn3/browser_wasi_shim

GregoryConrad commented 1 year ago

This may be very helpful for future implementation: https://wasmer.io/posts/announcing-wasix

mdmm13 commented 2 months ago

Hey @GregoryConrad

Great work on mimir, looks very promising, esp. the relevance part. We've been on Isar since its start, so maybe a few insights might help here:

  1. Isar uses MDBX in the background, which is the reason its performance was historically so strong (unrelated to web, but might help with #231). There are well-maintained Rust bindings.
  2. Isar is super stable on web via WASM, uses sqlite in the background. Might be worth a look. The author of Isar shared his thoughts on the future of web here.
  3. Flutter WASM is now stable

Hope this helps.

GregoryConrad commented 2 months ago

Hi @mdmm13 👋

About your comments:

  1. Mimir internally wraps around Meilisearch, which itself uses LMDB (which I believe is the original project that MDBX was forked from). My desire for redb is that it should work on both web and non-web platforms, while also removing a whole class of issues related to memory-mapped I/O.
  2. The issue is that I rely on Meilisearch, and consequently am using whatever they use. Switching the entirety of Meilisearch to redb is not feasible, so instead I'd need to make a shim for redb in heed so that it could be used in place of LMDB while keeping the Meilisearch codebase completely intact. (However, they have some other dependencies that use mmap, which would likely also need workarounds.)
  3. That was never much of a blocker for this issue; Native Assets was, and still is. It also doesn't compare to the amount of work required to switch everything to redb, which is a ton, and is why I haven't done it yet (and likely won't anytime soon, given the time investment and no promise of success).