Open kesavkolla opened 9 months ago
Hey, yes OPFS is absolutely on the roadmap. IndexedDB was the quickest route to this early Alpha, but it's not the target.
Yeah, IndexedDB is OK but not good for performance. OPFS gives that performance boost. It also has sync access, so it makes a lot of sense for PostgreSQL to use it. We wouldn't have that async layer to deal with.
Hey, yep exactly. It's the key target for persistence.
There are a few technicalities: although OPFS offers sync file handles, opening a file is still async. There are a few tricks we can use to get around this, such as using asyncify (which we currently use, but it is a resource overhead) or keeping a pool of open files.
Lots to explore.
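To make the "pool of open files" idea concrete, here is a minimal sketch, not pglite's implementation. `SyncHandle`, `Opener` and `AccessHandlePool` are hypothetical names invented for illustration. The opener is injected, so the async cost is paid once up front; in a browser worker it would presumably wrap `navigator.storage.getDirectory()`, `getFileHandle(name, { create: true })` and `createSyncAccessHandle()`. Persisting the path-to-slot mapping and truncating reused slots are left out.

```typescript
// Minimal shape of a synchronous file handle. In a browser worker this role
// would be played by FileSystemSyncAccessHandle (assumption: only close() is
// needed for this sketch).
interface SyncHandle {
  close(): void;
}

// Async opener supplied by the caller; hypothetical, not a pglite API.
type Opener<H extends SyncHandle> = (slotName: string) => Promise<H>;

// Hypothetical pool: opens every handle asynchronously once, then binds
// logical paths to pre-opened handles synchronously.
class AccessHandlePool<H extends SyncHandle> {
  private free: H[];
  private inUse = new Map<string, H>();

  private constructor(handles: H[]) {
    this.free = handles;
  }

  // The only async step: open `size` slot files up front.
  static async create<H extends SyncHandle>(
    open: Opener<H>,
    size: number,
  ): Promise<AccessHandlePool<H>> {
    const names = Array.from({ length: size }, (_, i) => `slot-${i}`);
    const handles = await Promise.all(names.map((name) => open(name)));
    return new AccessHandlePool<H>(handles);
  }

  // Synchronous "open": reuse the existing binding or take a free slot.
  acquire(path: string): H {
    const existing = this.inUse.get(path);
    if (existing) return existing;
    const handle = this.free.pop();
    if (!handle) throw new Error(`pool exhausted while opening ${path}`);
    this.inUse.set(path, handle);
    return handle;
  }

  // Return the slot bound to `path` to the free list.
  release(path: string): void {
    const handle = this.inUse.get(path);
    if (!handle) return;
    this.inUse.delete(path);
    this.free.push(handle);
  }

  get available(): number {
    return this.free.length;
  }
}
```

Since Postgres may touch hundreds of files, a real pool would likely need to grow on demand, and growing is itself an async step, which is part of what makes this tricky.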
Probably the work done by Roy Hashimoto over on https://github.com/rhashimoto/wa-sqlite/ could get someone who knows what they are doing most of the way there!
While it obviously has some drawbacks, given the now wide compatibility... I think spinning up a web worker and treating the worker process like a browser-internal db server is pretty elegant. This approach seems to work well with wa-sqlite, and (with some admittedly hacky workarounds) means you can get multiple tabs talking to a single web worker. It also means you can easily use the sync API, and spinning up and waiting for the pool of handles just gets done once (for most circumstances, probably).
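A rough sketch of the "worker as a db server" shape, for illustration only. Everything here (`Port`, `QueryRequest`, `QueryResponse`, `serve`, `DbClient`, `createPortPair`) is a hypothetical name, not part of pglite or wa-sqlite. The port pair is an in-process stand-in for a real worker message channel, so the snippet runs outside a browser too.

```typescript
// Minimal message-port shape, loosely modeled on Worker/MessagePort messaging.
interface Port {
  postMessage(msg: unknown): void;
  onmessage: ((ev: { data: unknown }) => void) | null;
}

type QueryRequest = { id: number; sql: string };
type QueryResponse = { id: number; rows?: unknown[]; error?: string };

// Server side: would run inside the worker that owns the database files,
// so `run` can use synchronous file access freely.
function serve(port: Port, run: (sql: string) => unknown[]): void {
  port.onmessage = (ev) => {
    const req = ev.data as QueryRequest;
    try {
      port.postMessage({ id: req.id, rows: run(req.sql) });
    } catch (e) {
      port.postMessage({ id: req.id, error: String(e) });
    }
  };
}

// Client side: would run in each tab; matches responses to requests by id.
class DbClient {
  private nextId = 1;
  private pending = new Map<
    number,
    { resolve: (rows: unknown[]) => void; reject: (e: Error) => void }
  >();
  private port: Port;

  constructor(port: Port) {
    this.port = port;
    port.onmessage = (ev) => {
      const res = ev.data as QueryResponse;
      const waiter = this.pending.get(res.id);
      if (!waiter) return;
      this.pending.delete(res.id);
      if (res.error !== undefined) waiter.reject(new Error(res.error));
      else waiter.resolve(res.rows ?? []);
    };
  }

  query(sql: string): Promise<unknown[]> {
    const id = this.nextId++;
    return new Promise((resolve, reject) => {
      this.pending.set(id, { resolve, reject });
      this.port.postMessage({ id, sql });
    });
  }
}

// In-process stand-in for a message channel: delivers messages asynchronously.
function createPortPair(): [Port, Port] {
  const a: Port = {
    onmessage: null,
    postMessage: (msg) => {
      Promise.resolve().then(() => b.onmessage?.({ data: msg }));
    },
  };
  const b: Port = {
    onmessage: null,
    postMessage: (msg) => {
      Promise.resolve().then(() => a.onmessage?.({ data: msg }));
    },
  };
  return [a, b];
}
```

With this shape, the async boundary sits at the message port, and everything behind `run` can stay synchronous.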
Unless I'm mistaken, wasmfs is maybe not the best target for opfs sync, at least according to https://sqlite.org/wasm/doc/trunk/persistence.md#vfs-opfs-sahpool and the wa-sqlite discussions. Given it looks like pglite is necessarily going to be single-process in accessing the files anyway, and there is already a worker option, maybe a better target (in addition to the existing works-everywhere idb option) would be something mirroring the sqlite vfs-opfs-sahpool or wa-sqlite AccessHandlePool?
Hey @AntonOfTheWoods, I expect we are going to explore both routes in time. WASMFS is probably the shorter route to a first version, but may have drawbacks with CORS header requirements. An access handle pool would work, but would probably have to be a new Emscripten file system; Postgres doesn't have quite the same VFS concept as SQLite, so we can't do it quite the same way. Postgres also uses a lot more files than SQLite, so it's not a matter of having a handful of access handles; it's potentially hundreds.
It would be great to be able to bring your own fs instead of only using the predefined ones.
abstract class FileSystem {
  abstract readFile(path: string): Promise<string>;
  abstract writeFile(path: string, contents: string): Promise<void>;
  // other fs methods
}
import { PGLite } from "@electric-sql/pglite";
import { MemoryFS, NodeFS, IndexedDbFS } from "@electric-sql/pglite/fs";
const pg = new PGLite(new MemoryFS());
In the case of bun, its own file system claims to be faster than node's, so that would be great.
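As an illustration of what a user-supplied implementation of the proposed abstract class might look like, here is a small in-memory sketch. This is a hypothetical shape, not an API that pglite exposes. The base class is renamed `CustomFileSystem` here only to avoid clashing with the DOM's built-in `FileSystem` type, and `InMemoryFileSystem` is an invented name.

```typescript
// Proposed extension point from the comment above (renamed; hypothetical).
abstract class CustomFileSystem {
  abstract readFile(path: string): Promise<string>;
  abstract writeFile(path: string, contents: string): Promise<void>;
}

// A user-provided backend: keeps file contents in a Map keyed by path.
class InMemoryFileSystem extends CustomFileSystem {
  private files = new Map<string, string>();

  async readFile(path: string): Promise<string> {
    const contents = this.files.get(path);
    if (contents === undefined) throw new Error(`ENOENT: ${path}`);
    return contents;
  }

  async writeFile(path: string, contents: string): Promise<void> {
    this.files.set(path, contents);
  }
}
```

A Bun- or OPFS-backed class could implement the same two methods, though a real interface would probably need byte-oriented reads and writes rather than strings.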
@Neo-Ciber94 absolutely, that's something we are considering. All the FSs are built on top of the Emscripten file systems at the moment, and there aren't really any unofficial ones out there. We would need to find the right level of abstraction to enable custom VFSs, but it's certainly something that makes a lot of sense to do!
For browsers it would be better to add support for OPFS, so that we are not bound by the memory limits and it is also faster than IndexedDB. In fact, there is no serialization of data to be done if we use the filesystem that is available. OPFS is supported in all major browsers. Emscripten also supports WASMFS, which can be used with OPFS.