chunk size for storage engine

fireproof-storage / fireproof

Realtime database, runs anywhere. Install Fireproof in your front-end app or edge function, and sync data via any backend.

https://fireproof.storage


chunk size for storage engine #73

Closed jchris closed 2 weeks ago

jchris commented 5 months ago

While load testing PartyKit storage I found a maximum value size (131072 bytes, per the error below), which means the encrypted blockstore needs to split blocks larger than a threshold in some configurations.

[pk:inf] PUT /parties/fireproof/test7 500 Internal Server Error (2ms)
[pk:inf] PUT /parties/fireproof/test7 201 Created (3ms)
[pk:inf] OPTIONS /parties/fireproof/test7 200 OK (3ms)
✘ [ERROR] onRequest error RangeError: Values cannot be larger than 131072 bytes. A value of size 142980 was provided.

      at Server2.onRequest (file:///Users/jchris/Documents/GitHub/fireproof/packages/connect-partykit/test/app/src/server.ts:57:36)
      at async PartyDurable.fetch (file:///Users/jchris/Documents/GitHub/fireproof/packages/connect-partykit/test/app/src/server.ts:981:18)
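To make the size constraint concrete, here is a minimal sketch of splitting one oversized value into pieces that fit under the limit reported in the error. The names `MAX_VALUE_BYTES`, `splitValue`, and `joinChunks` are hypothetical and are not part of Fireproof or PartyKit. The sketch splits raw bytes only and does not say how the pieces would be keyed in storage.

```typescript
// Hypothetical helpers for illustration; not Fireproof or PartyKit APIs.
// The limit is the one reported by the RangeError in the log above.
const MAX_VALUE_BYTES = 131072;

// Split a value into consecutive pieces of at most maxBytes each.
function splitValue(value: Uint8Array, maxBytes: number = MAX_VALUE_BYTES): Uint8Array[] {
  if (!Number.isInteger(maxBytes) || maxBytes <= 0) {
    throw new RangeError("maxBytes must be a positive integer");
  }
  const chunks: Uint8Array[] = [];
  for (let offset = 0; offset < value.byteLength; offset += maxBytes) {
    // subarray returns a view over the same buffer, so nothing is copied here.
    chunks.push(value.subarray(offset, offset + maxBytes));
  }
  return chunks;
}

// Reassemble the pieces in order into one contiguous value.
function joinChunks(chunks: Uint8Array[]): Uint8Array {
  const total = chunks.reduce((sum, chunk) => sum + chunk.byteLength, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
```

With the 142980-byte value from the log, `splitValue` yields two pieces: one of 131072 bytes and one of 11908 bytes.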

jchris commented 5 months ago

In this approach, result.removals can be used to mark slabs for compaction.

E.g. old compaction slabs that correspond to not-recently-modified data will stay below the chunk size, and we only need to recompact them when one of their CIDs gets touched by a removal.
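A rough sketch of that marking step, assuming each slab can list the CIDs it contains and that removals arrive as a list of CID strings. The `Slab` shape and `slabsToRecompact` are hypothetical; the real shape of `result.removals` may differ.

```typescript
// Hypothetical shape for illustration; Fireproof's real types may differ.
interface Slab {
  id: string;
  cids: Set<string>;
}

// Return only the slabs that contain at least one removed CID.
// Slabs that no removal touches are left alone.
function slabsToRecompact(slabs: Slab[], removals: string[]): Slab[] {
  const removed = new Set(removals);
  return slabs.filter((slab) => Array.from(slab.cids).some((cid) => removed.has(cid)));
}
```

Slabs holding cold data never match a removal, so they are skipped on every compaction pass.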

jchris commented 5 months ago

If we want to implement a max CAR size, we can do the splitting here and save more than one CAR per transaction:

https://github.com/fireproof-storage/fireproof/blob/04efb3355e6af03b29d3aac5a1d285c50f159769/packages/encrypted-blockstore/src/loader.ts#L224
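One way the splitting could look, as a sketch only: greedily pack a transaction's blocks into groups whose payload stays under a maximum, then write one CAR per group. `Block`, `groupBlocksBySize`, and `maxCarBytes` are hypothetical names, not the loader's actual API. The sketch counts block bytes only and ignores the CAR header and per-block framing, so a real limit would need some headroom.

```typescript
// Hypothetical shape for illustration; not the loader's actual types.
interface Block {
  cid: string;
  bytes: Uint8Array;
}

// Greedily pack blocks, in order, into groups whose total payload
// does not exceed maxCarBytes. Each group would become one CAR file.
function groupBlocksBySize(blocks: Block[], maxCarBytes: number): Block[][] {
  const groups: Block[][] = [];
  let current: Block[] = [];
  let currentBytes = 0;
  for (const block of blocks) {
    const size = block.bytes.byteLength;
    if (size > maxCarBytes) {
      // A single block over the limit cannot be fixed by grouping alone.
      throw new RangeError(`block ${block.cid} is ${size} bytes, over the limit of ${maxCarBytes}`);
    }
    if (current.length > 0 && currentBytes + size > maxCarBytes) {
      groups.push(current);
      current = [];
      currentBytes = 0;
    }
    current.push(block);
    currentBytes += size;
  }
  if (current.length > 0) groups.push(current);
  return groups;
}
```

Packing in order keeps the blocks of a transaction contiguous, at the cost of some unused space compared with a tighter bin-packing strategy.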

jchris commented 5 months ago

The hard part: if your split compaction file count is bigger than your autoCompact threshold, you need a different approach.

The unfortunate part is that this alone doesn't make the compaction process streaming. In theory we could write out files as the compactor goes and flush memory.

jchris commented 4 months ago

I think this can be implemented as a minimal diff by changing the car log format from the current:

[cid3, cid2, cid1, cid0]

where cid0 is a compactor output with the full data set.

If instead of one file, sometimes the writer outputs multiple files, we can capture that info like:

[[cid3], [cid2a, cid2b], [cid1], [cid0a, cid0b, cid0c]]

This will minimize the semantic blast radius.
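The two log shapes can be sketched as types, with a helper that reads either one. `CarGroup`, `CarLog`, `upgradeCarLog`, and `flattenCarLog` are hypothetical names for illustration, and CIDs are shown as plain strings here, which is an assumption.

```typescript
// Hypothetical types for illustration; CIDs are plain strings in this sketch.
type CarGroup = string[]; // all CAR files written by one transaction or compaction
type CarLog = CarGroup[]; // newest entry first, as in the examples above

// Accept either the flat format or the grouped format and return the
// grouped one. A flat entry becomes a group of one.
function upgradeCarLog(log: Array<string | string[]>): CarLog {
  return log.map((entry) => (typeof entry === "string" ? [entry] : entry));
}

// List every CAR cid in log order, for code that only needs the files.
function flattenCarLog(log: CarLog): string[] {
  return log.reduce<string[]>((all, group) => all.concat(group), []);
}
```

Treating a flat entry as a one-element group means existing logs stay readable, and only code that walks the log entry by entry needs to change.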

jchris commented 3 months ago

WeChat Mini Programs have a 1 MB block size limit and a 10 MB total storage limit. This is also relevant for this feature.

jchris commented 3 months ago

from Grimoire:

When considering integration with WeChat Mini Programs, it's crucial to be aware of their storage constraints, specifically the 1MB limit for each key-value pair and a total storage limit of 10MB. Adjusting the chunk size for the encrypted blockstore to comply with these limits will be essential for seamless functionality within WeChat's environment. For detailed guidelines on WeChat's storage limitations, please refer to the official WeChat Mini Program Documentation.

jchris commented 2 weeks ago

Closed by #119, thanks @valorant-dhruv