electric-sql / pglite

Lightweight WASM Postgres with real-time, reactive bindings.
https://pglite.dev
Apache License 2.0

About memory usage #406

Open Bloomingg opened 4 weeks ago

Bloomingg commented 4 weeks ago

I am currently facing a scenario where I perform extensive database operations in the browser and store a large amount of data. Previously, I used @jlongster/sql.js for this purpose. However, I’ve noticed that memory usage increases with the database size. Despite using IndexedDB for storage, it seems that a copy of the data is still kept in memory. I am urgently seeking an alternative library to address this memory issue. Could you tell me if pglite might have the same problem? I would appreciate your insights.
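For reference, this is roughly how PGlite is instantiated with IndexedDB-backed persistence per its documented API; the database name `my-app-db` and the `items` table are made up for illustration:

```ts
import { PGlite } from '@electric-sql/pglite'

// Persist to IndexedDB rather than keeping the database purely in memory.
// 'idb://my-app-db' is a hypothetical database name.
const pg = new PGlite('idb://my-app-db')

await pg.exec(`
  CREATE TABLE IF NOT EXISTS items (
    id SERIAL PRIMARY KEY,
    payload TEXT
  );
`)

const result = await pg.query('SELECT count(*) AS n FROM items')
console.log(result.rows)
```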

pmp-p commented 4 weeks ago

What is the actual problem?

Database access is all about building fast-access indexes over tabular data: the more indexes you have, the more memory is used and the faster queries get. Index sizes and locations depend heavily on how your tables are constructed and on the database engine configuration (where we aimed the defaults at low memory usage).
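To make that concrete, here is a sketch of how you could measure the index share of a table's footprint from inside PGlite, using standard Postgres size functions; the `items` table and its index are hypothetical:

```ts
import { PGlite } from '@electric-sql/pglite'

const pg = new PGlite() // in-memory instance, for illustration only

await pg.exec(`
  CREATE TABLE items (id SERIAL PRIMARY KEY, payload TEXT);
  CREATE INDEX items_payload_idx ON items (payload);
`)

// pg_table_size / pg_indexes_size are standard Postgres catalog functions;
// comparing them shows how much of a relation's footprint is index data.
const sizes = await pg.query(`
  SELECT
    pg_size_pretty(pg_table_size('items'))   AS table_size,
    pg_size_pretty(pg_indexes_size('items')) AS index_size
`)
console.log(sizes.rows)
```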

thruflo commented 4 weeks ago

I guess one aspect here is whether PGlite loads the whole database (indexes and all) into memory. Which I believe it does right now? As opposed to somehow keeping data on disk until needed to serve a query.

Bloomingg commented 4 weeks ago

> I guess one aspect here is whether PGlite loads the whole database (indexes and all) into memory. Which I believe it does right now? As opposed to somehow keeping data on disk until needed to serve a query.

Yes, exactly. I want to know whether PGlite loads the entire database into memory, as this may slow down the entire application when the data volume is large.

Bloomingg commented 4 weeks ago

> What is the actual problem?
>
> Database access is all about building fast-access indexes over tabular data: the more indexes you have, the more memory is used and the faster queries get. Index sizes and locations depend heavily on how your tables are constructed and on the database engine configuration (where we aimed the defaults at low memory usage).

I apologize if I didn't articulate my question clearly. In short, I want to know whether PGlite will occupy a large amount of memory when I operate on a database holding a large amount of data (e.g. 200 MB+).

pmp-p commented 3 weeks ago

Sorry, I wasn't clear either. In the end it all depends on your web host configuration and your table indexes, not the data size. Here's why.

Case 1) If you don't have the "ideal" configuration, everything goes into memory (and will continue to, up to 2 GiB per tab; mobile browsers may restrict that further). This is what you are experiencing right now with any db: we are all artificially limited by browser vendors, and it is not PGlite-specific. Overcoming these limitations has a severe performance cost (a 100x or greater slowdown).
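If you want to observe this yourself, one option is the browser's memory measurement API; a sketch, noting that `performance.measureUserAgentSpecificMemory()` is currently Chromium-only and requires a cross-origin-isolated page:

```ts
// performance.measureUserAgentSpecificMemory() is only available in
// cross-origin-isolated contexts and only in some Chromium browsers,
// so feature-detect before calling it.
async function logTabMemory(): Promise<void> {
  const perf = performance as any
  if (crossOriginIsolated && typeof perf.measureUserAgentSpecificMemory === 'function') {
    const estimate = await perf.measureUserAgentSpecificMemory()
    console.log(`approx. tab memory: ${(estimate.bytes / 2 ** 20).toFixed(1)} MiB`)
  } else {
    console.log('memory measurement API not available in this context')
  }
}
```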

Case 2) In the "ideal" configuration, SharedArrayBuffer + Atomics + a block device (WIP): it will not be the data taking up memory, only the indexes, per Postgres's configuration rules. Those rules set minimum values for a number of parameters, which add up depending on how your tables are constructed.

So it depends. To be realistic, the Postgres docs say at least 8-96 MB:

"Although the minimum required memory for running Postgres is as little as 8MB, there are noticable improvements in runtimes for the regression tests when expanding memory up to 96MB [...]. The rule is you can never have too much memory."

Apart from the fact that the block device is (very much) WIP, you will still have to configure SAB/atomics, and to be practical I'd say a usage range of 256-512 MiB is to be expected at first.

Tbh, getting into the 128-256 MiB zone will take a lot of work/testing/time, because the Postgres docs talk about memory used by the database itself, but in the browser you also have the Postgres code[^1], error messages, and the C library/JavaScript needed to run all of that, which also take up memory.

[^1]: Some C++ or Rust Postgres extensions may take up to 128 MiB on their own.