Performance degradation over time with ETS backend

ruslandoga commented 10 months ago

Describe the bug I've been benchmarking hot paths in https://github.com/plausible/analytics and noticed that ExHammer check here becomes slower over time, from 11us in the beginning of k6 benchmark (it hits the same site every time, so :hammer_ets_buckets has a single key throughout the benchmark) to over 100ms in the end. And in some requests (when user-agents are cached) that check_rate call becomes the largest slice in the request flamechart

** Provide the following details

Elixir version (elixir -v): 1.15.7
Erlang version (erl -v): 26.1.2
Operating system: mac sonoma

ruslandoga commented 10 months ago

I've also tried to profile ETS backend and noticed that the workers Hammer.Backend.ETS don't actually perform any work, so the poolboy seems like pure overhead. When profiling poolboy process during k6 run, I'm getting the following from eprof

****** Process <0.462.0>    -- 100.00 % of profiled time ***
FUNCTION                             CALLS        %    TIME  [uS / CALLS]
--------                             -----  -------    ----  [----------]
gen_server:try_handle_info/3          1596     0.12     465  [      0.29]
queue:filter/2                        1590     0.16     657  [      0.41]
queue:filter_f/2                      3322     0.17     664  [      0.20]
lists:split/2                         3187     0.17     692  [      0.22]
poolboy:'-handle_info/2-fun-0-'/2     6425     0.20     800  [      0.12]
poolboy:handle_info/2                 1596     0.23     934  [      0.59]
lists:reverse/2                       6374     0.27    1065  [      0.17]
lists:split/3                        11967     0.32    1291  [      0.11]
queue:filter_r/2                      6283     0.45    1806  [      0.29]
queue:in/2                           16622     0.58    2323  [      0.14]
gen_server:reply/2                   44368     1.17    4687  [      0.11]
gen_server:try_dispatch/3            45958     1.21    4841  [      0.11]
queue:out/1                          44368     1.46    5834  [      0.13]
gen_server:handle_common_reply/8     62580     1.60    6420  [      0.10]
gen_server:try_handle_call/4         44372     1.86    7459  [      0.17]
ets:match/2                           1596     1.99    7965  [      4.99]
gen_server:try_handle_cast/3         44362     2.15    8615  [      0.19]
poolboy:handle_checkin/2             44368     2.82   11300  [      0.25]
ets:lookup/2                         44362     3.21   12871  [      0.29]
poolboy:handle_call/3                44372     3.26   13034  [      0.29]
ets:delete/2                         44368     3.50   14031  [      0.32]
gen_server:decode_msg/9              90330     3.60   14419  [      0.16]
gen_server:handle_msg/6              90330     4.27   17103  [      0.19]
poolboy:handle_cast/2                44362     4.70   18809  [      0.42]
ets:insert/2                         44368     5.95   23828  [      0.54]
erlang:monitor/2                     44372     7.51   30054  [      0.68]
erlang:demonitor/1                   44362    14.33   57358  [      1.29]
gen:reply/2                          44368    16.02   64132  [      1.45]
gen_server:loop/7                    90330    16.72   66924  [      0.74]
---------------------------------  -------  -------  ------  [----------]
Total:                             1016858  100.00%  400381  [      0.39]

Note that time spent in :ets calls is a small fraction of total time spent, which I assume is spent in :poolboy / :gen internals.

ruslandoga commented 10 months ago

Would there be any interest in optimizing the ETS backend? I'd be happy to work on that. The first step would be (I think) to add benchmarks and then compare the performance with and without poolboy to check if my guesses are right.

ruslandoga commented 10 months ago

The increasing wait time might point to a growing checkout queue in poolboy.

```elixir iex(1)> Supervisor.which_children Hammer.Supervisor [{:hammer_backend_single_pool, #PID<0.462.0>, :worker, [:poolboy]}] # :queue is empty ( {[], []}) iex(2)> :sys.get_state(pid(0,462,0)) {:state, #PID<0.463.0>, [#PID<0.469.0>, #PID<0.468.0>, #PID<0.467.0>, #PID<0.464.0>], {[], []}, #Reference<0.192931417.1724776450.239958>, 4, 0, 0, :lifo} # start benchmark iex(4)> :sys.get_state(pid(0,462,0)) {:state, #PID<0.463.0>, [#PID<0.469.0>, #PID<0.468.0>, #PID<0.467.0>, #PID<0.464.0>], {[], []}, #Reference<0.192931417.1724776450.239958>, 4, 0, 0, :lifo} iex(5)> :sys.get_state(pid(0,462,0)) {:state, #PID<0.463.0>, [#PID<0.464.0>, #PID<0.467.0>], {[], []}, #Reference<0.192931417.1724776450.239958>, 4, 0, 0, :lifo} iex(6)> :sys.get_state(pid(0,462,0)) {:state, #PID<0.463.0>, [#PID<0.469.0>, #PID<0.464.0>], {[], []}, #Reference<0.192931417.1724776450.239958>, 4, 0, 0, :lifo} iex(7)> :sys.get_state(pid(0,462,0)) {:state, #PID<0.463.0>, [], {[ {{#PID<0.20229.0>, [:alias | #Reference<0.0.2589315.192931417.1724973064.35897>]}, #Reference<0.192931417.1724907528.35896>, #Reference<0.192931417.1724907522.62923>}, {{#PID<0.20236.0>, [:alias | #Reference<0.0.2590211.192931417.1724973060.51696>]}, #Reference<0.192931417.1724907524.51695>, #Reference<0.192931417.1724907522.62922>} ], [ {{#PID<0.20235.0>, [:alias | #Reference<0.0.2590083.192931417.1724973063.38309>]}, #Reference<0.192931417.1724907527.38308>, #Reference<0.192931417.1724907522.62906>}, {{#PID<0.20238.0>, [:alias | #Reference<0.0.2590467.192931417.1724973064.35881>]}, #Reference<0.192931417.1724907528.35880>, #Reference<0.192931417.1724907522.62907>}, {{#PID<0.20237.0>, [:alias | #Reference<0.0.2590339.192931417.1724973061.47663>]}, #Reference<0.192931417.1724907525.47662>, #Reference<0.192931417.1724907522.62921>} ]}, #Reference<0.192931417.1724776450.239958>, 4, 0, 0, :lifo} iex(8)> :sys.get_state(pid(0,462,0)) {:state, #PID<0.463.0>, [], {[ {{#PID<0.30693.0>, [:alias | #Reference<0.0.3928707.192931417.1724973061.85287>]}, #Reference<0.192931417.1724907525.85286>, #Reference<0.192931417.1724907527.76715>}, {{#PID<0.30691.0>, [:alias | #Reference<0.0.3928451.192931417.1724973057.98127>]}, #Reference<0.192931417.1724907521.98126>, #Reference<0.192931417.1724907527.76712>}, {{#PID<0.30692.0>, [:alias | #Reference<0.0.3928579.192931417.1724973062.86576>]}, #Reference<0.192931417.1724907526.86575>, #Reference<0.192931417.1724907527.76711>}, {{#PID<0.30690.0>, [:alias | #Reference<0.0.3928323.192931417.1724973061.85267>]}, #Reference<0.192931417.1724907525.85266>, #Reference<0.192931417.1724907527.76708>}, {{#PID<0.30664.0>, [:alias | #Reference<0.0.3924995.192931417.1724973063.76705>]}, #Reference<0.192931417.1724907527.76704>, #Reference<0.192931417.1724907527.76707>}, {{#PID<0.30688.0>, [:alias | #Reference<0.0.3928067.192931417.1724973060.89037>]}, #Reference<0.192931417.1724907524.89036>, #Reference<0.192931417.1724907527.76706>}, {{#PID<0.30687.0>, [:alias | #Reference<0.0.3927939.192931417.1724973061.85254>]}, #Reference<0.192931417.1724907525.85253>, #Reference<0.192931417.1724907528.73877>}, {{#PID<0.30689.0>, [:alias | #Reference<0.0.3928195.192931417.1724973062.86550>]}, #Reference<0.192931417.1724907526.86549>, #Reference<0.192931417.1724907521.98095>}, {{#PID<0.30686.0>, [:alias | #Reference<0.0.3927811.192931417.1724973061.85235>]}, #Reference<0.192931417.1724907525.85234>, #Reference<0.192931417.1724907521.98094>}, {{#PID<0.30685.0>, [:alias | #Reference<0.0.3927683.192931417.1724973062.86522>]}, #Reference<0.192931417.1724907526.86521>, #Reference<0.192931417.1724907523.85084>} ], [ {{#PID<0.30672.0>, [:alias | #Reference<0.0.3926019.192931417.1724973059.85063>]}, #Reference<0.192931417.1724907523.85062>, #Reference<0.192931417.1724907523.85083>} ]}, #Reference<0.192931417.1724776450.239958>, 4, 0, 0, :lifo} iex(9)> :sys.get_state(pid(0,462,0)) {:state, #PID<0.463.0>, [], {[ {{#PID<0.40217.0>, [:alias | #Reference<0.0.953491.192931417.1724973060.124990>]}, #Reference<0.192931417.1724907524.124989>, #Reference<0.192931417.1724907524.125000>}, {{#PID<0.40218.0>, [:alias | #Reference<0.0.953619.192931417.1724973060.124974>]}, #Reference<0.192931417.1724907524.124973>, #Reference<0.192931417.1724907524.124997>}, {{#PID<0.40216.0>, [:alias | #Reference<0.0.953363.192931417.1724973057.131528>]}, #Reference<0.192931417.1724907521.131527>, #Reference<0.192931417.1724907524.124996>}, {{#PID<0.40215.0>, [:alias | #Reference<0.0.953235.192931417.1724973059.119283>]}, #Reference<0.192931417.1724907523.119282>, #Reference<0.192931417.1724907524.124995>}, {{#PID<0.40202.0>, [:alias | #Reference<0.0.951571.192931417.1724973061.121546>]}, #Reference<0.192931417.1724907525.121545>, #Reference<0.192931417.1724907524.124994>}, {{#PID<0.40200.0>, [:alias | #Reference<0.0.951315.192931417.1724973061.121533>]}, #Reference<0.192931417.1724907525.121532>, #Reference<0.192931417.1724907524.124993>}, {{#PID<0.40204.0>, [:alias | #Reference<0.0.951827.192931417.1724973060.124951>]}, #Reference<0.192931417.1724907524.124950>, #Reference<0.192931417.1724907524.124992>}, {{#PID<0.40214.0>, [:alias | #Reference<0.0.953107.192931417.1724973062.120287>]}, #Reference<0.192931417.1724907526.120286>, #Reference<0.192931417.1724907528.108633>}, {{#PID<0.40213.0>, [:alias | #Reference<0.0.952979.192931417.1724973059.119251>]}, #Reference<0.192931417.1724907523.119250>, #Reference<0.192931417.1724907528.108632>}, {{#PID<0.40212.0>, [:alias | #Reference<0.0.952851.192931417.1724973057.131489>]}, #Reference<0.192931417.1724907521.131488>, #Reference<0.192931417.1724907528.108631>}, {{#PID<0.40211.0>, [:alias | #Reference<0.0.952723.192931417.1724973063.111480>]}, #Reference<0.192931417.1724907527.111479>, #Reference<0.192931417.1724907528.108621>}, {{#PID<0.40210.0>, [:alias | #Reference<0.0.952595.192931417.1724973064.108613>]}, #Reference<0.192931417.1724907528.108612>, #Reference<0.192931417.1724907528.108614>}, {{#PID<0.40209.0>, [:alias | #Reference<0.0.952467.192931417.1724973057.131474>]}, #Reference<0.192931417.1724907521.131473>, #Reference<0.192931417.1724907521.131476>} ], [ {{#PID<0.40193.0>, [:alias | #Reference<0.0.950419.192931417.1724973058.133850>]}, #Reference<0.192931417.1724907522.133849>, #Reference<0.192931417.1724907521.131423>}, {{#PID<0.40201.0>, [:alias | #Reference<0.0.951443.192931417.1724973064.108585>]}, #Reference<0.192931417.1724907528.108584>, #Reference<0.192931417.1724907521.131444>}, {{#PID<0.40203.0>, [:alias | #Reference<0.0.951699.192931417.1724973057.131443>]}, #Reference<0.192931417.1724907521.131442>, #Reference<0.192931417.1724907521.131445>}, {{#PID<0.40205.0>, [:alias | #Reference<0.0.951955.192931417.1724973063.111448>]}, #Reference<0.192931417.1724907527.111447>, #Reference<0.192931417.1724907521.131460>}, {{#PID<0.40207.0>, [:alias | #Reference<0.0.952211.192931417.1724973060.124916>]}, #Reference<0.192931417.1724907524.124915>, #Reference<0.192931417.1724907521.131461>}, {{#PID<0.40208.0>, [:alias | #Reference<0.0.952339.192931417.1724973063.111464>]}, #Reference<0.192931417.1724907527.111463>, #Reference<0.192931417.1724907521.131475>} ]}, #Reference<0.192931417.1724776450.239958>, 4, 0, 0, :lifo} iex(10)> :sys.get_state(pid(0,462,0)) {:state, #PID<0.463.0>, [], {[ {{#PID<0.53151.0>, [:alias | #Reference<0.0.2609043.192931417.1724973062.167102>]}, #Reference<0.192931417.1724907526.167101>, #Reference<0.192931417.1724907522.180080>}, {{#PID<0.53150.0>, [:alias | #Reference<0.0.2608915.192931417.1724973058.180073>]}, #Reference<0.192931417.1724907522.180072>, #Reference<0.192931417.1724907522.180079>}, {{#PID<0.53147.0>, [:alias | #Reference<0.0.2608531.192931417.1724973062.167086>]}, #Reference<0.192931417.1724907526.167085>, #Reference<0.192931417.1724907522.180078>}, {{#PID<0.53130.0>, [:alias | #Reference<0.0.2606355.192931417.1724973058.180059>]}, #Reference<0.192931417.1724907522.180058>, #Reference<0.192931417.1724907522.180077>}, {{#PID<0.53146.0>, [:alias | #Reference<0.0.2608403.192931417.1724973062.167060>]}, #Reference<0.192931417.1724907526.167059>, #Reference<0.192931417.1724907522.180076>}, {{#PID<0.53149.0>, [:alias | #Reference<0.0.2608787.192931417.1724973064.157754>]}, #Reference<0.192931417.1724907528.157753>, #Reference<0.192931417.1724907522.180075>}, {{#PID<0.53148.0>, [:alias | #Reference<0.0.2608659.192931417.1724973057.178486>]}, #Reference<0.192931417.1724907521.178485>, #Reference<0.192931417.1724907522.180074>}, {{#PID<0.53145.0>, [:alias | #Reference<0.0.2608275.192931417.1724973063.159113>]}, #Reference<0.192931417.1724907527.159112>, #Reference<0.192931417.1724907527.159132>}, {{#PID<0.53144.0>, [:alias | #Reference<0.0.2608147.192931417.1724973064.157735>]}, #Reference<0.192931417.1724907528.157734>, #Reference<0.192931417.1724907527.159131>}, {{#PID<0.53143.0>, [:alias | #Reference<0.0.2608019.192931417.1724973057.178467>]}, #Reference<0.192931417.1724907521.178466>, #Reference<0.192931417.1724907527.159130>}, {{#PID<0.53140.0>, [:alias | #Reference<0.0.2607635.192931417.1724973062.167033>]}, #Reference<0.192931417.1724907526.167032>, #Reference<0.192931417.1724907527.159129>}, {{#PID<0.53138.0>, [:alias | #Reference<0.0.2607379.192931417.1724973063.159094>]}, #Reference<0.192931417.1724907527.159093>, #Reference<0.192931417.1724907527.159128>}, {{#PID<0.53133.0>, [:alias | #Reference<0.0.2606739.192931417.1724973062.167020>]}, #Reference<0.192931417.1724907526.167019>, #Reference<0.192931417.1724907527.159127>}, {{#PID<0.53137.0>, [:alias | #Reference<0.0.2607251.192931417.1724973063.159081>]}, #Reference<0.192931417.1724907527.159080>, #Reference<0.192931417.1724907527.159126>}, {{#PID<0.53142.0>, [:alias | #Reference<0.0.2607891.192931417.1724973057.178448>]}, #Reference<0.192931417.1724907521.178447>, #Reference<0.192931417.1724907527.159125>}, {{#PID<0.53139.0>, [:alias | #Reference<0.0.2607507.192931417.1724973064.157715>]}, #Reference<0.192931417.1724907528.157714>, #Reference<0.192931417.1724907527.159124>}, {{#PID<0.53141.0>, [:alias | #Reference<0.0.2607763.192931417.1724973060.172514>]}, #Reference<0.192931417.1724907524.172513>, #Reference<0.192931417.1724907527.159123>}, {{#PID<0.53136.0>, [:alias | #Reference<0.0.2607123.192931417.1724973063.159068>]}, #Reference<0.192931417.1724907527.159067>, #Reference<0.192931417.1724907527.159122>}, {{#PID<0.53135.0>, [:alias | #Reference<0.0.2606995.192931417.1724973063.159055>]}, #Reference<0.192931417.1724907527.159054>, #Reference<0.192931417.1724907527.159121>}, {{#PID<0.53134.0>, [:alias | #Reference<0.0.2606867.192931417.1724973063.159018>]}, #Reference<0.192931417.1724907527.159017>, #Reference<0.192931417.1724907527.159118>}, {{#PID<0.53132.0>, [:alias | #Reference<0.0.2606611.192931417.1724973063.158998>]}, #Reference<0.192931417.1724907527.158997>, #Reference<0.192931417.1724907527.159116>}, {{#PID<0.53131.0>, [:alias | #Reference<0.0.2606483.192931417.1724973057.178415>]}, #Reference<0.192931417.1724907521.178414>, #Reference<0.192931417.1724907527.159114>}, {{#PID<0.53129.0>, [:alias | #Reference<0.0.2606227.192931417.1724973061.166048>]}, #Reference<0.192931417.1724907525.166047>, #Reference<0.192931417.1724907525.166050>}, {{#PID<0.53128.0>, [:alias | #Reference<0.0.2606099.192931417.1724973062.166969>]}, #Reference<0.192931417.1724907526.166968>, #Reference<0.192931417.1724907525.166049>}, {{#PID<0.53127.0>, [:alias | #Reference<0.0.2605971.192931417.1724973061.166034>]}, #Reference<0.192931417.1724907525.166033>, #Reference<0.192931417.1724907525.166035>}, {{#PID<0.53125.0>, [:alias | #Reference<0.0.2605715.192931417.1724973057.178391>]}, #Reference<0.192931417.1724907521.178390>, #Reference<0.192931417.1724907525.166009>}, {{#PID<0.53126.0>, [:alias | #Reference<0.0.2605843.192931417.1724973057.178378>]}, #Reference<0.192931417.1724907521.178377>, #Reference<0.192931417.1724907525.166008>}, {{#PID<0.53121.0>, [:alias | #Reference<0.0.2605203.192931417.1724973057.178359>]}, #Reference<0.192931417.1724907521.178358>, #Reference<0.192931417.1724907525.166007>}, {{#PID<0.53120.0>, [:alias | #Reference<0.0.2605075.192931417.1724973057.178346>]}, #Reference<0.192931417.1724907521.178345>, #Reference<0.192931417.1724907525.166006>}, {{#PID<0.53124.0>, [:alias | #Reference<0.0.2605587.192931417.1724973059.165475>]}, #Reference<0.192931417.1724907523.165474>, #Reference<0.192931417.1724907525.166005>}, {{#PID<0.53123.0>, [:alias | #Reference<0.0.2605459.192931417.1724973059.165462>]}, #Reference<0.192931417.1724907523.165461>, #Reference<0.192931417.1724907526.166938>}, {{#PID<0.53122.0>, [:alias | #Reference<0.0.2605331.192931417.1724973062.166932>]}, #Reference<0.192931417.1724907526.166931>, #Reference<0.192931417.1724907526.166933>}, {{#PID<0.53119.0>, [:alias | #Reference<0.0.2604947.192931417.1724973059.165428>]}, #Reference<0.192931417.1724907523.165427>, #Reference<0.192931417.1724907526.166914>}, {{#PID<0.53118.0>, [:alias | #Reference<0.0.2604819.192931417.1724973058.180000>]}, #Reference<0.192931417.1724907522.179999>, #Reference<0.192931417.1724907526.166908>}, {{#PID<0.53117.0>, [:alias | #Reference<0.0.2604691.192931417.1724973064.157672>]}, #Reference<0.192931417.1724907528.157671>, #Reference<0.192931417.1724907526.166907>}, {{#PID<0.53116.0>, [:alias | #Reference<0.0.2604563.192931417.1724973062.166900>]}, #Reference<0.192931417.1724907526.166899>, #Reference<0.192931417.1724907526.166901>}, {{#PID<0.53115.0>, [:alias | #Reference<0.0.2604435.192931417.1724973062.166879>]}, #Reference<0.192931417.1724907526.166878>, #Reference<0.192931417.1724907526.166881>}, {{#PID<0.53113.0>, [:alias | #Reference<0.0.2604179.192931417.1724973062.166858>]}, #Reference<0.192931417.1724907526.166857>, #Reference<0.192931417.1724907526.166860>}, {{#PID<0.53114.0>, [:alias | #Reference<0.0.2604307.192931417.1724973058.179975>]}, #Reference<0.192931417.1724907522.179974>, #Reference<0.192931417.1724907526.166859>}, {{#PID<0.53112.0>, [:alias | #Reference<0.0.2604051.192931417.1724973063.158948>]}, #Reference<0.192931417.1724907527.158947>, #Reference<0.192931417.1724907527.158949>}, {{#PID<0.53111.0>, [:alias | #Reference<0.0.2603923.192931417.1724973059.165382>]}, #Reference<0.192931417.1724907523.165381>, #Reference<0.192931417.1724907523.165384>}, {{#PID<0.53110.0>, [...]}, #Reference<0.192931417.1724907527.158928>, #Reference<0.192931417.1724907526.166834>}, {{#PID<0.53109.0>, ...}, #Reference<0.192931417.1724907522.179945>, ...} ], [ {{#PID<0.53099.0>, [:alias | #Reference<0.0.2602387.192931417.1724973063.158896>]}, #Reference<0.192931417.1724907527.158895>, #Reference<0.192931417.1724907528.157625>}, {{#PID<0.53096.0>, [:alias | #Reference<0.0.2602003.192931417.1724973058.179914>]}, #Reference<0.192931417.1724907522.179913>, #Reference<0.192931417.1724907528.157626>}, {{#PID<0.53102.0>, [:alias | #Reference<0.0.2602771.192931417.1724973060.172418>]}, #Reference<0.192931417.1724907524.172417>, #Reference<0.192931417.1724907528.157627>}, {{#PID<0.53098.0>, [:alias | #Reference<0.0.2602259.192931417.1724973064.157624>]}, #Reference<0.192931417.1724907528.157623>, #Reference<0.192931417.1724907528.157628>}, {{#PID<0.53107.0>, [:alias | #Reference<0.0.2603411.192931417.1724973063.158915>]}, #Reference<0.192931417.1724907527.158914>, #Reference<0.192931417.1724907528.157635>}, {{#PID<0.53108.0>, [:alias | #Reference<0.0.2603539.192931417.1724973058.179933>]}, #Reference<0.192931417.1724907522.179932>, #Reference<0.192931417.1724907527.158916>} ]}, #Reference<0.192931417.1724776450.239958>, 4, 0, 0, :lifo} # queue keeps growing ```

ruslandoga commented 10 months ago

I've been able to reproduce the problem outside of Plausible with this script

1..1_000_000
|> Task.async_stream(
  fn i ->
    if rem(i, 666) == 0 do # we need sampling
      {t, r} =
        :timer.tc(fn ->
          Hammer.check_rate("ingest:site:loadtest.site", 60_000, 1_000_000)
        end)

      # since IO.inspect is too slow and allows Poolboy to clear the queue
      IO.inspect(t, label: i)
      r
    else
      # and the "actual overload" happens in this "fast" branch
      Hammer.check_rate("ingest:site:loadtest.site", 60_000, 1_000_000)
    end
  end,
  ordered: false,
  max_concurrency: 500
)
|> Stream.run()

You might need to play with max_concurrency: 500 and rem(i, 666) numbers to make Poolboy overload gradually like with k6 benchmark. On my system, these recreate the Plausible/k6 situation:

MIX_ENV=bench mix run bench/ets.exs

```console $ MIX_ENV=bench mix run bench/ets.exs 666: 10640 1332: 9843 1998: 10329 2664: 9764 3330: 9601 3996: 9598 4662: 9668 5328: 9592 5994: 9869 6660: 9605 7326: 9553 7992: 10193 8658: 9490 9324: 9851 9990: 9688 10656: 9733 11322: 9929 11988: 9842 12654: 9832 13320: 9521 13986: 9608 14652: 9387 15318: 9618 15984: 9402 16650: 9579 17316: 9761 17982: 9547 18648: 9439 19314: 9503 19980: 9675 20646: 9821 21312: 9391 21978: 9632 22644: 9607 23310: 9634 23976: 9400 24642: 9635 25308: 9517 25974: 10034 26640: 9520 27306: 9399 27972: 9095 28638: 9272 29304: 9435 29970: 9335 30636: 9372 31302: 9611 31968: 9550 32634: 9505 33300: 9482 33966: 9739 34632: 9615 35298: 9393 35964: 9445 36630: 9696 37296: 9831 37962: 9673 38628: 9651 39294: 9485 39960: 9099 40626: 9274 41292: 9237 41958: 10153 42624: 9428 43290: 9677 43956: 10819 44622: 9662 45288: 9622 45954: 9602 46620: 10600 47286: 9745 47952: 10368 48618: 9316 49284: 9982 49950: 10506 50616: 10758 51282: 10616 51948: 9924 52614: 10896 53280: 11025 53946: 11228 54612: 11000 55278: 10154 55944: 10853 56610: 10635 57276: 10953 57942: 11043 58608: 11334 59274: 10712 59940: 11052 60606: 10729 61272: 11144 61938: 11044 62604: 11352 63270: 11367 63936: 11474 64602: 11299 65268: 11428 65934: 11314 66600: 11238 67266: 11725 67932: 13145 68598: 13287 69264: 15812 69930: 15726 70596: 18159 71262: 24556 71928: 21276 72594: 22620 73260: 22422 73926: 26000 74592: 18282 75258: 17352 75924: 18440 76590: 18035 77256: 18177 77922: 14600 78588: 14917 79254: 14584 79920: 12567 80586: 12771 81252: 12515 81918: 12365 82584: 12992 83250: 12484 83916: 12385 84582: 12167 85248: 12868 85914: 12587 86580: 12619 87246: 12797 87912: 12460 88578: 14704 89244: 14084 89910: 13837 90576: 12096 91242: 13144 91908: 12912 92574: 12784 93240: 14351 93906: 14362 94572: 13985 95238: 16149 95904: 12922 96570: 13199 97236: 14150 97902: 12705 98568: 12364 99234: 12875 99900: 12469 100566: 12425 101232: 12790 101898: 12651 102564: 12480 103230: 13137 103896: 12540 104562: 12674 105228: 12750 105894: 12849 106560: 13044 107226: 12368 107892: 12484 108558: 12637 109224: 12599 109890: 12431 110556: 12481 111222: 12363 111888: 12622 112554: 12736 113220: 11264 113886: 12778 114552: 11255 115218: 10674 115884: 12293 116550: 11177 117216: 10788 117882: 10859 118548: 11043 119214: 10980 119880: 10863 120546: 10707 121212: 10598 121878: 10976 122544: 10641 123210: 11008 123876: 11317 124542: 11247 125208: 11058 125874: 12143 126540: 11282 127206: 10902 127872: 11642 128538: 10766 129204: 11731 ^C ```

Note that this can't be reproduced in Benchee since it waits for each request to complete which allows Poolboy to keep the queue empty.

epinault commented 10 months ago

Hello!

thanks for the detail report. I took over the project and there is bunch of work I want to try to find time to work on eventually. I have not run into this issue yet but yea the perf is either pool boy or ETS because we bypass the Genserver in the latest version and directly talk to ETS.

That being Said, Pool boy is configurable. Have you tried tweaking it and make it bigger just to see how it behaves?

Removing poolboy might be tricky because we do need for solution like REDIS and other backend so having a pool in front is helpful. We could consider removing the pool and see what happens ? One thing is that.

ETS is not shared accross VM so not sure how many people use it because you will not be able to do rate limiting with multiple pods correctly (or it will have to be approximate and set a value that is pod number dependent)

I ll see if I can spend some time in the coming weeks but if you have an MR or some idea, feel free to submit it as well.

ruslandoga commented 10 months ago

👋

That being Said, Pool boy is configurable. Have you tried tweaking it and make it bigger just to see how it behaves?

That would just delay the inevitable, I think. As https://github.com/ExHammer/hammer/issues/71#issuecomment-1822702957 shows the overload would happen anyway once Poolboy stops keeping up, the pool size largely doesn't matter unless it matches the number of clients.

Removing poolboy might be tricky because we do need for solution like REDIS and other backend so having a pool in front is helpful. We could consider removing the pool and see what happens ? One thing is that.

The pooling strategy could be moved to the adapter, the Ecto way.

ETS is not shared accross VM so not sure how many people use it

According to hex.pm's recent downloads, 119339 (~one third) if them are probably for the ETS backend.

you will not be able to do rate limiting with multiple pods correctly

It could be actually worked around with sticky sessions or Phx PubSub

I ll see if I can spend some time in the coming weeks but if you have an MR or some idea, feel free to submit it as well.

I'm happy to work on this since it seems to directly affect Plausible, I just wanted to check with the project maintainers first :)

ruslandoga commented 10 months ago

I've started exploring various approaches and the current fastest one is PlugAttack: https://github.com/ruslandoga/rate_limit Atomics seem promising as well. Maybe :hammer can provide two backends by default, ETS (based off of PlugAttack) and Atomics.

epinault commented 10 months ago

I am not too familiar with Atomics yet but sounds like a good idea.

I am not against moving the pooling strategy, we will need to create a major version correctly for it and all of the backend then

epinault commented 7 months ago

@ruslandoga are you still working on splitting the MR? there are changes I was you had that I was going to tackle but if you are not then I ll start working on them

ruslandoga commented 7 months ago

Hi @epinault ! I'm currently not working on this. I do plan to return to it this month though.

ruslandoga commented 4 months ago

Sorry for disappearing and not keeping my promise :) I'm going to continue working on my PRs ~~later this week.~~ ~~in August.~~ later this year!

ExHammer / hammer

Performance degradation over time with ETS backend #71