staltz closed this pull request 7 months ago
New and removed dependencies detected (Socket for GitHub report):
🚮 Removed packages: npm/@babel/parser@7.22.16, npm/@eslint/eslintrc@2.1.4, npm/@eslint/js@8.57.0, npm/@humanwhocodes/config-array@0.11.14, npm/@humanwhocodes/object-schema@2.0.2, npm/@szmarczak/http-timer@4.0.6, npm/@types/json-schema@7.0.12, npm/@typescript-eslint/parser@6.21.0, npm/@typescript-eslint/types@6.21.0, npm/@typescript-eslint/typescript-estree@6.21.0, npm/ast-module-types@2.7.1, npm/bits-to-bytes@1.3.0, npm/bson@5.4.0, npm/cacheable-lookup@5.0.4, npm/chai@4.4.1, npm/cliui@6.0.0, npm/eslint@8.57.0, npm/follow-redirects@1.15.3, npm/get-func-name@2.0.2, npm/glob@7.2.3, npm/graphql@16.8.1, npm/ip@1.1.9
## Context
We're trying to reduce memory allocations to avoid OOM crashes and such.
## Problem
Serialization/deserialization of bloom filters was creating too many intermediate arrays and other data structures. While working on that, I uncovered two other problems with bloom filters, addressed below.
## Solution
Serialization/deserialization is now as direct as possible, without creating unnecessary intermediate structures.
- `POINodeBloomFilter`: serialization now uses `bytes-base64`, which is more efficient than our own implementation (which was creating intermediate arrays).
- `POINodeCountingBloomFilter`: serialization uses `toString(radix)` with the largest radix that JavaScript natively supports (for simplicity we want 1 string character per bloom filter counter), then uses `lz-string` (a compression library) to compress the resulting string. The compression is actually amazing: before this PR, serialized strings were 479k characters long; now they are just 1k characters long!

For the problem of using BloomFilters interchangeably with CountingBloomFilters, I reviewed the code carefully and I think this is how they should go:
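The two serialization strategies above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: the function names are hypothetical, and the final `lz-string` compression step for the counting filter is only mentioned in a comment rather than performed.

```javascript
// 1) Plain bloom filter: the bit array (a Uint8Array) is turned directly
// into a base64 string via Node's Buffer, with no intermediate arrays.
function serializeBloomFilter(bits) {
  return Buffer.from(bits.buffer, bits.byteOffset, bits.byteLength)
    .toString('base64');
}

function deserializeBloomFilter(str) {
  return new Uint8Array(Buffer.from(str, 'base64'));
}

// 2) Counting bloom filter: each counter maps to exactly one string
// character via toString(36) -- radix 36 is the largest that JavaScript's
// Number.prototype.toString accepts, so counters are assumed to stay
// below 36. The PR then compresses the resulting string with lz-string;
// that step is omitted here.
function serializeCounters(counters) {
  let out = '';
  for (const c of counters) out += c.toString(36); // assumes 0 <= c < 36
  return out;
}

function deserializeCounters(str) {
  const counters = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) counters[i] = parseInt(str[i], 36);
  return counters;
}
```

Because every counter occupies exactly one character, the serialized string is highly repetitive, which is what makes the subsequent `lz-string` compression so effective (479k characters down to ~1k).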
## Tests
Passes all unit tests.
## Benchmarks
**Before this PR**

- 491_294_720 bytes
- 975_933_440 bytes :scream:

**After**

- 507_576_320 bytes
- 510_214_144 bytes
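The figures above read like Node.js heap sizes. A minimal way to take such before/after snapshots (an assumption on my part: the PR's actual benchmark harness is not shown, and `heapSnapshot` is a hypothetical helper built on Node's `process.memoryUsage`):

```javascript
// Hypothetical helper: report current V8 heap statistics in bytes.
function heapSnapshot() {
  const { heapUsed, heapTotal } = process.memoryUsage();
  return { heapUsed, heapTotal };
}

const before = heapSnapshot();
// ... run the serialization/deserialization workload here ...
const after = heapSnapshot();
console.log(`heapTotal before: ${before.heapTotal} bytes`);
console.log(`heapTotal after:  ${after.heapTotal} bytes`);
```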