Callidon / bloom-filters

JS implementation of probabilistic data structures: Bloom Filter (and its derived), HyperLogLog, Count-Min Sketch, Top-K and MinHash
https://callidon.github.io/bloom-filters/
MIT License
359 stars, 41 forks

Browser friendliness? #70

Open achingbrain opened 4 months ago

achingbrain commented 4 months ago

I'd really like to use this module in browsers, but the bundle size ends up being very large: just importing the CuckooFilter adds over 50KB of minified code to the bundle.

Using esbuild to bundle just this script:

import { CuckooFilter } from 'bloom-filters'

const filter = new CuckooFilter(1, 2, 3, 4)
console.info(filter.has('hello world'))

Creates a 216KB bundle (63KB minified), and I have to shim some Node.js internals (e.g. the Buffer class) for it to work.

The CuckooFilter implementation itself is 8.4KB un-minified so there's a lot of unused code here.

Would you be open to some PRs that make this module more browser friendly?

Some low-hanging fruit I can see immediately:

  1. Switch tsc to output ESM for better tree shaking
  2. Replace use of node Buffers with built-in Uint8Arrays
  3. Remove, replace or optimise use of lodash
  4. Remove the long dependency and use built-in BigInts

The first is a breaking change but will yield the biggest benefit. The breaking change is that consumers will no longer be able to require the module; they must import it via static or dynamic imports.

The second is breaking where Buffers are used as return values, which, from what I can see, only appears to be the case in the Invertible Bloom Lookup Table implementation.
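To give an idea of what replacing Buffers could look like, here is a minimal sketch of browser-safe stand-ins for a few common Buffer operations, built only on Uint8Array, TextEncoder and TextDecoder. The helper names (stringToBytes, bytesToString, bytesToHex, bytesEqual, xorBytes) are hypothetical and are not taken from the bloom-filters code base; which operations the library actually needs is an assumption on my part.

```typescript
// Hypothetical helpers (not part of bloom-filters): Uint8Array-based
// stand-ins for a few Buffer operations, usable in browsers and Node.js.

const encoder = new TextEncoder()
const decoder = new TextDecoder()

// Roughly Buffer.from(value, 'utf8')
function stringToBytes(value: string): Uint8Array {
  return encoder.encode(value)
}

// Roughly buf.toString('utf8')
function bytesToString(bytes: Uint8Array): string {
  return decoder.decode(bytes)
}

// Roughly buf.toString('hex')
function bytesToHex(bytes: Uint8Array): string {
  return Array.from(bytes, b => b.toString(16).padStart(2, '0')).join('')
}

// Roughly a.equals(b)
function bytesEqual(a: Uint8Array, b: Uint8Array): boolean {
  if (a.length !== b.length) return false
  for (let i = 0; i < a.length; i++) {
    if (a[i] !== b[i]) return false
  }
  return true
}

// Byte-wise XOR; the shorter input is treated as zero-padded
function xorBytes(a: Uint8Array, b: Uint8Array): Uint8Array {
  const out = new Uint8Array(Math.max(a.length, b.length))
  for (let i = 0; i < out.length; i++) {
    out[i] = (a[i] ?? 0) ^ (b[i] ?? 0)
  }
  return out
}
```

Since Node's Buffer is itself a subclass of Uint8Array, helpers like these would also accept existing Buffers, which might soften the migration for Node.js consumers.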

The rest are just internal changes so are non-breaking.
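For the long replacement, something along these lines might be enough. This is only a sketch under the assumption that the library needs unsigned 64-bit values built from two 32-bit halves, wrapping arithmetic, and a reduction to a bucket index; the function names (fromBits, add64, mul64, toIndex) are made up for illustration and do not mirror the current internals.

```typescript
// Hypothetical helpers (not part of bloom-filters): unsigned 64-bit
// arithmetic on built-in BigInts instead of the long package.

// Build an unsigned 64-bit value from two 32-bit halves.
// `>>> 0` reinterprets each half as an unsigned 32-bit integer first.
function fromBits(low: number, high: number): bigint {
  return (BigInt(high >>> 0) << 32n) | BigInt(low >>> 0)
}

// Addition that wraps around at 2^64
function add64(a: bigint, b: bigint): bigint {
  return BigInt.asUintN(64, a + b)
}

// Multiplication that wraps around at 2^64
function mul64(a: bigint, b: bigint): bigint {
  return BigInt.asUintN(64, a * b)
}

// Reduce a 64-bit hash to a bucket index. Converting to Number is safe
// here because the result is always smaller than `size`.
function toIndex(hash: bigint, size: number): number {
  return Number(hash % BigInt(size))
}
```

The one thing to watch out for is converting a full 64-bit BigInt to a Number before the modulo, since anything above Number.MAX_SAFE_INTEGER silently loses precision.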

achingbrain commented 4 months ago

One spanner in the works here is that this module uses an older, incompatible version of JavaScript decorators so updating TypeScript is quite painful.

The decorators just add saveAsJSON/fromJSON methods to the filters. Is it necessary to use decorators for this? Could they just be regular methods instead?
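To illustrate what I mean by regular methods, here is a minimal sketch using a made-up TinyFilter class. The class, its JSON shape and the mark/isMarked methods are purely hypothetical; only the saveAsJSON/fromJSON names come from the existing API, and the real export format may well look different.

```typescript
// Hypothetical example (not the bloom-filters implementation): JSON
// import/export written as plain methods, with no decorators involved.

interface TinyFilterJSON {
  type: 'TinyFilter'
  size: number
  bits: number[]
}

class TinyFilter {
  private readonly bits: Uint8Array

  constructor(size: number) {
    this.bits = new Uint8Array(size)
  }

  mark(index: number): void {
    this.bits[index % this.bits.length] = 1
  }

  isMarked(index: number): boolean {
    return this.bits[index % this.bits.length] === 1
  }

  // Plain instance method: returns a JSON-serialisable snapshot
  saveAsJSON(): TinyFilterJSON {
    return { type: 'TinyFilter', size: this.bits.length, bits: Array.from(this.bits) }
  }

  // Plain static method: validates the snapshot and rebuilds the filter
  static fromJSON(json: TinyFilterJSON): TinyFilter {
    if (json.type !== 'TinyFilter' || json.bits.length !== json.size) {
      throw new Error('Cannot create a TinyFilter from this JSON export')
    }
    const filter = new TinyFilter(json.size)
    filter.bits.set(json.bits)
    return filter
  }
}
```

It is a bit more typing per class than a decorator, but it compiles the same way on any TypeScript version and bundlers can see exactly what each class uses.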

folkvir commented 4 months ago

Hello 🖖 I'm working on a 4.0.0-alpha.0, available here: https://github.com/Callidon/bloom-filters/tree/next/4.0.0. What's new in this version?

I would also like to remove the long and buffer dependencies, so if you want to help, you are welcome to create a PR from this new branch.

folkvir commented 4 months ago

Let me get to a stable state before going further! I need to fix the tests: with the new xxhash package I introduced quite a few bugs in the handling of BigInts. Fixing them will also prepare the ground for removing the long package.

folkvir commented 4 months ago

Update: buffer and long are not used anymore. The draft PR https://github.com/Callidon/bloom-filters/pull/71 is still in progress but is working as expected. Take a look, I will be happy to get feedback, especially on the use of the WASM, which takes around 347KB 😬