Avoid table scans on registry

bryanjos commented 4 months ago

Related to #1329, we noticed that looking in the registry would take a long time, making the unbounded concurrency issue worse as processes would wait for the lookups here to finish. This removes the table scan and replaces it with a MapSet instead.

benwilson512 commented 4 months ago

Hey @bryanjos can you post a benchee script or similar somewhere that compares this? The select should be using only the table key, so in theory the performance of both what .select and your iterative approach should be both nLog(m). That does assume :ets is smart enough though, maybe it isn't.

It'd be good to see some specific examples to demonstrate this.

bryanjos commented 4 months ago

@benwilson512 I'll try to put that together.

A teammate made this change so I might not have the full picture. I think the issue stems from select using the match spec it's using combined with the fact that the registry has duplicate keys.

There's a not at the bottom of the docs for select that I think summarizes the issue Note that for large registries with many partitions this will be costly as it builds the result by concatenating all the partitions.

bryanjos commented 4 months ago

@benwilson512 I put together a benchee script and recorded the output in this gist. The proposed change is definitely faster, but does use more memory

koudelka commented 4 months ago

in practice, i believe lookup should be O(log(n)), and select should be O(n), both multiplied by the number of registry partitions. it's basically a full table scan vs an index tree traversal lookup. the truly pathological worst case for a lookup on a duplicate_bag would be when all tuples in the table have the same key, which is when it'd become O(n).

it'd be a neat PR into the erlang codebase to (compile-time?) warn that the match spec [{:==, :"$1", id}] is bad mojo.

for what it's worth, this was a service killer for us, it brought down entire nodes, whatever is going on under the covers with select seems to entirely lock up the VM at a not-unreasonable level of concurrency, it wouldn't even service disterl for rpc health checks. i'm guessing there's something in ets that doesn't yield back to the scheduler (yay bifs). :(

benwilson512 commented 4 months ago

hey @koudelka sorry to hear that it caused such issues. I'm happy to move forward with the PR. There is still a linear part to this PR, because have to call lookup n times, once for each returned doc ID. However probably the reason this is still faster is that n is still wildly less than the total size of the registry, and when we do select we I guess have to scan the entire registry.

I have a few "code golf" items I'll remark on and then we can head towards a merge.

bryanjos commented 4 months ago

@benwilson512 Looks like the made a small distance in the right direction. I'll update the PR to use a list as the accumulator and pipe into Map.new

Operating System: macOS
CPU Information: Apple M1 Pro
Number of Available Cores: 10
Available memory: 16 GB
Elixir 1.17.0
Erlang 27.0
JIT enabled: true

Benchmark suite executing with the following configuration:
warmup: 2 s
time: 10 s
memory time: 2 s
reduction time: 0 ns
parallel: 1
inputs: none specified
Estimated total run time: 42 s

Benchmarking Using MapSet and lookup ...
Benchmarking Using list accumulator ...
Benchmarking Using select ...
Calculating statistics...
Formatting results...

Name                              ips        average  deviation         median         99th %
Using list accumulator           9.25      108.11 ms     ±6.78%      104.94 ms      149.45 ms
Using MapSet and lookup          9.05      110.49 ms     ±3.31%      109.03 ms      126.19 ms
Using select                   0.0125    79835.89 ms     ±0.00%    79835.89 ms    79835.89 ms

Comparison:
Using list accumulator           9.25
Using MapSet and lookup          9.05 - 1.02x slower +2.38 ms
Using select                   0.0125 - 738.49x slower +79727.79 ms

Memory usage statistics:

Name                            average  deviation         median         99th %
Using list accumulator        152.54 MB     ±0.01%      152.53 MB      152.60 MB
Using MapSet and lookup       158.87 MB     ±0.00%      158.87 MB      158.87 MB
Using select                   31.18 MB     ±0.00%       31.18 MB       31.18 MB

Comparison:
Using list accumulator        152.53 MB
Using MapSet and lookup       158.87 MB - 1.04x memory usage +6.33 MB
Using select                   31.18 MB - 0.20x memory usage -121.36031 MB

benwilson512 commented 4 months ago

I am very puzzled at the 5x difference in memory performance between what we have today and the proposed changes. Let me see if I can poke at this and figure out what the cause is.

bryanjos commented 3 months ago

I'm thinking it's probably the use of the data structures and size of the examples. There probably is something that could tweak that could be made to balance out space and time

absinthe-graphql / absinthe

Avoid table scans on registry #1330