staltz closed this pull request 7 months ago
New and removed dependencies detected (Socket for GitHub report):
🚮 Removed packages: npm/@babel/parser@7.22.16, npm/@eslint/eslintrc@2.1.4, npm/@eslint/js@8.57.0, npm/@humanwhocodes/config-array@0.11.14, npm/@humanwhocodes/object-schema@2.0.2, npm/@szmarczak/http-timer@4.0.6, npm/@types/json-schema@7.0.12, npm/@typescript-eslint/parser@6.21.0, npm/@typescript-eslint/types@6.21.0, npm/@typescript-eslint/typescript-estree@6.21.0, npm/ast-module-types@2.7.1, npm/bits-to-bytes@1.3.0, npm/bson@5.4.0, npm/cacheable-lookup@5.0.4, npm/chai@4.4.1, npm/cliui@6.0.0, npm/eslint@8.57.0, npm/follow-redirects@1.15.3, npm/get-func-name@2.0.2, npm/glob@7.2.3, npm/graphql@16.8.1, npm/ip@1.1.9
## Context
We're trying to reduce memory allocations to avoid OOM crashes and such.
## Problem
Serialization/deserialization of bloom filters was creating too many intermediate arrays and other data structures. While working on that, I uncovered two other problems with bloom filters, addressed below.
## Solution
Serialization/deserialization is now as direct as possible, without creating unnecessary intermediate structures.
- `POINodeBloomFilter`: serialization now uses `bytes-base64`, which is more efficient than our own implementation (which was creating intermediate arrays).
- `POINodeCountingBloomFilter`: serialization uses `toString(radix)` with the largest radix that JavaScript natively supports (for simplicity we want 1 string character per bloom filter counter), then uses `lz-string` (a compression library) to compress the resulting string. The compression is actually amazing: before this PR, serialized strings were 479k characters long; now they are just 1k characters long!

For the problem of using BloomFilters interchangeably with CountingBloomFilters, I reviewed the code carefully and I think this is how they should go:
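The two serialization strategies above can be sketched roughly as follows. This is an illustrative sketch, not the PR's actual code: the function names are hypothetical, and the final `lz-string` compression step for the counting filter is only mentioned in a comment rather than performed.

```javascript
// 1) Plain bloom filter: the bit array (a Uint8Array) is turned directly
// into a base64 string via Node's Buffer, with no intermediate arrays.
function serializeBloomFilter(bits) {
  return Buffer.from(bits.buffer, bits.byteOffset, bits.byteLength)
    .toString('base64');
}

function deserializeBloomFilter(str) {
  return new Uint8Array(Buffer.from(str, 'base64'));
}

// 2) Counting bloom filter: each counter maps to exactly one string
// character via toString(36) -- radix 36 is the largest that JavaScript's
// Number.prototype.toString accepts, so counters are assumed to stay
// below 36. The PR then compresses the resulting string with lz-string;
// that step is omitted here.
function serializeCounters(counters) {
  let out = '';
  for (const c of counters) out += c.toString(36); // assumes 0 <= c < 36
  return out;
}

function deserializeCounters(str) {
  const counters = new Uint8Array(str.length);
  for (let i = 0; i < str.length; i++) counters[i] = parseInt(str[i], 36);
  return counters;
}
```

Because every counter occupies exactly one character, the serialized string is highly repetitive, which is what makes the subsequent `lz-string` compression so effective (479k characters down to ~1k).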
## Tests
Passes all unit tests.
## Benchmarks
**Before this PR**

- 491_294_720 bytes
- 975_933_440 bytes :scream:

**After**

- 507_576_320 bytes
- 510_214_144 bytes
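The figures above read like Node.js heap sizes. A minimal way to take such before/after snapshots (an assumption on my part: the PR's actual benchmark harness is not shown, and `heapSnapshot` is a hypothetical helper built on Node's `process.memoryUsage`):

```javascript
// Hypothetical helper: report current V8 heap statistics in bytes.
function heapSnapshot() {
  const { heapUsed, heapTotal } = process.memoryUsage();
  return { heapUsed, heapTotal };
}

const before = heapSnapshot();
// ... run the serialization/deserialization workload here ...
const after = heapSnapshot();
console.log(`heapTotal before: ${before.heapTotal} bytes`);
console.log(`heapTotal after:  ${after.heapTotal} bytes`);
```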