elfo-rs / elfo

An asynchronous distributed actor framework in Rust with robust observability
217 stars 12 forks

Draft: feat: prioritise UpdateConfig messages #88

Closed laplab closed 1 year ago

laplab commented 1 year ago

Benchmarks on my Mac are a little bit all over the place. I will try to get more consistent results on a Linux machine.

Comparison on Mac

```
$ critcmp master patch
group                                        master                                  patch
-----                                        ------                                  -----
only_command/one_to_one/one_instance/1       1.00    315.2±128.71ns  3.0 MElem/sec   1.37    432.3±235.58ns   2.2 MElem/sec
only_command/one_to_one/one_instance/10      1.00    3.0±1.74µs      3.2 MElem/sec   1.14    3.4±1.84µs       2.8 MElem/sec
only_command/one_to_one/one_instance/11      1.00    2.9±1.37µs      3.7 MElem/sec   1.38    3.9±2.04µs       2.7 MElem/sec
only_command/one_to_one/one_instance/12      1.65    6.7±1.92µs      1760.5 KElem/sec 1.00   4.0±2.21µs       2.8 MElem/sec
only_command/one_to_one/one_instance/2       1.00    780.4±297.81ns  2.4 MElem/sec   1.10    857.3±434.40ns   2.2 MElem/sec
only_command/one_to_one/one_instance/3       1.00    838.4±252.47ns  3.4 MElem/sec   1.24    1039.4±408.89ns  2.8 MElem/sec
only_command/one_to_one/one_instance/4       1.08    1448.4±391.31ns 2.6 MElem/sec   1.00    1344.5±678.08ns  2.8 MElem/sec
only_command/one_to_one/one_instance/5       1.18    2.0±0.60µs      2.4 MElem/sec   1.00    1711.0±626.59ns  2.8 MElem/sec
only_command/one_to_one/one_instance/6       1.04    2.0±0.87µs      2.8 MElem/sec   1.00    1966.1±820.88ns  2.9 MElem/sec
only_command/one_to_one/one_instance/7       1.24    2.4±1.33µs      2.7 MElem/sec   1.00    1962.0±976.26ns  3.4 MElem/sec
only_command/one_to_one/one_instance/8       1.00    2.4±1.12µs      3.1 MElem/sec   1.08    2.6±1.34µs       2.9 MElem/sec
only_command/one_to_one/one_instance/9       1.00    2.8±1.34µs      3.1 MElem/sec   1.09    3.0±2.16µs       2.9 MElem/sec
only_command/round_robin/one_instance/1      1.00    400.9±205.79ns  2.4 MElem/sec   1.03    414.9±240.49ns   2.3 MElem/sec
only_command/round_robin/one_instance/10     1.11    8.2±2.51µs      1187.5 KElem/sec 1.00   7.4±1.82µs       1321.3 KElem/sec
only_command/round_robin/one_instance/11     1.12    8.7±3.22µs      1229.2 KElem/sec 1.00   7.8±2.24µs       1373.7 KElem/sec
only_command/round_robin/one_instance/12     1.05    8.5±2.52µs      1379.5 KElem/sec 1.00   8.1±2.38µs       1454.2 KElem/sec
only_command/round_robin/one_instance/2      1.08    1124.4±619.92ns 1737.0 KElem/sec 1.00   1037.4±508.90ns  1882.8 KElem/sec
only_command/round_robin/one_instance/3      1.00    1310.8±501.58ns 2.2 MElem/sec   1.57    2.1±1.50µs       1427.2 KElem/sec
only_command/round_robin/one_instance/4      1.00    2.5±0.95µs      1593.9 KElem/sec 1.05   2.6±0.88µs       1521.4 KElem/sec
only_command/round_robin/one_instance/5      1.00    3.1±1.16µs      1577.2 KElem/sec 1.06   3.3±1.31µs       1487.0 KElem/sec
only_command/round_robin/one_instance/6      1.09    4.6±1.48µs      1283.1 KElem/sec 1.00   4.2±1.35µs       1397.1 KElem/sec
only_command/round_robin/one_instance/7      1.08    5.5±1.53µs      1252.6 KElem/sec 1.00   5.1±1.76µs       1347.6 KElem/sec
only_command/round_robin/one_instance/8      1.08    6.4±2.07µs      1222.8 KElem/sec 1.00   5.9±1.37µs       1324.5 KElem/sec
only_command/round_robin/one_instance/9      1.05    7.3±1.99µs      1208.1 KElem/sec 1.00   6.9±1.65µs       1265.8 KElem/sec
```
laplab commented 1 year ago

For context, this is a version with a separate high-priority message channel. It has likely/unlikely hints placed in (hopefully) the right places and includes a slightly biased reimplementation of tokio::select!.
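The bias described here can be sketched with two synchronous channels, where the receive loop always drains the high-priority lane before the regular one. This is only an illustrative stand-in (the actual patch is async and reimplements tokio::select!); `recv_biased` and the channel names are hypothetical:

```rust
use std::sync::mpsc;

// Always poll the high-priority lane first, mirroring a biased select:
// an UpdateConfig never waits behind regular traffic.
fn recv_biased<T>(high: &mpsc::Receiver<T>, low: &mpsc::Receiver<T>) -> Option<T> {
    high.try_recv().ok().or_else(|| low.try_recv().ok())
}

fn main() {
    let (high_tx, high_rx) = mpsc::channel();
    let (low_tx, low_rx) = mpsc::channel();
    low_tx.send("regular message").unwrap();
    high_tx.send("UpdateConfig").unwrap();
    // Despite being sent second, the high-priority message is taken first.
    assert_eq!(recv_biased(&high_rx, &low_rx), Some("UpdateConfig"));
    assert_eq!(recv_biased(&high_rx, &low_rx), Some("regular message"));
}
```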

On Linux, the results are a little more decisive: 9 out of 24 benchmarks improved and 15 regressed. I expected all of the benchmarks to regress, so it is a little strange to see 9 of them improve.

12 out of the 15 regressions are less than 10%. I am hesitant to investigate them more closely, since this could be noise.

The 3 remaining regressions are:

I think it also makes sense to quickly implement the idea of a single Option slot for the UpdateConfig message instead of a separate channel. It looks like the most efficient way to implement the whole thing in terms of overhead, so it can be an interesting baseline to compare against.
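A minimal sketch of that Option-based variant, using only std types as stand-ins for elfo's mailbox machinery (the `Mailbox`/`Envelope` names, and the assumption that a newer config supersedes an undelivered older one, are mine, not the actual implementation):

```rust
use std::collections::VecDeque;

// Hypothetical message type for illustration.
#[derive(Debug, PartialEq)]
enum Envelope {
    UpdateConfig(u32),
    Regular(u32),
}

// A mailbox with a single high-priority slot instead of a second channel.
struct Mailbox {
    pending_config: Option<Envelope>,
    queue: VecDeque<Envelope>,
}

impl Mailbox {
    fn new() -> Self {
        Self { pending_config: None, queue: VecDeque::new() }
    }

    fn send(&mut self, msg: Envelope) {
        match msg {
            // Assumption: a newer config replaces an undelivered older one.
            cfg @ Envelope::UpdateConfig(_) => self.pending_config = Some(cfg),
            other => self.queue.push_back(other),
        }
    }

    fn recv(&mut self) -> Option<Envelope> {
        // The slot is checked before the queue, so config updates are
        // always delivered ahead of regular messages.
        self.pending_config.take().or_else(|| self.queue.pop_front())
    }
}

fn main() {
    let mut mb = Mailbox::new();
    mb.send(Envelope::Regular(7));
    mb.send(Envelope::UpdateConfig(1));
    mb.send(Envelope::UpdateConfig(2)); // supersedes the first update
    assert_eq!(mb.recv(), Some(Envelope::UpdateConfig(2)));
    assert_eq!(mb.recv(), Some(Envelope::Regular(7)));
}
```

The appeal of this layout is that the hot path pays only an `Option::take` check per receive, rather than polling a second channel.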

Comparison on Linux

```
$ critcmp master patch
group                                        master                                  patch
-----                                        ------                                  -----
only_command/one_to_one/one_instance/1       1.00    270.4±27.97ns   3.5 MElem/sec   1.07    289.3±44.30ns    3.3 MElem/sec
only_command/one_to_one/one_instance/10      1.00    2.0±0.69µs      4.7 MElem/sec   1.25    2.5±0.76µs       3.8 MElem/sec
only_command/one_to_one/one_instance/11      1.28    2.8±0.78µs      3.8 MElem/sec   1.00    2.2±0.70µs       4.9 MElem/sec
only_command/one_to_one/one_instance/12      1.00    2.6±0.91µs      4.4 MElem/sec   1.02    2.7±0.86µs       4.3 MElem/sec
only_command/one_to_one/one_instance/2       1.00    728.9±249.20ns  2.6 MElem/sec   1.04    755.6±526.09ns   2.5 MElem/sec
only_command/one_to_one/one_instance/3       1.00    874.8±349.32ns  3.3 MElem/sec   1.25    1090.8±341.83ns  2.6 MElem/sec
only_command/one_to_one/one_instance/4       1.00    1328.2±565.91ns 2.9 MElem/sec   1.07    1423.8±359.60ns  2.7 MElem/sec
only_command/one_to_one/one_instance/5       1.16    1664.7±347.05ns 2.9 MElem/sec   1.00    1435.3±501.49ns  3.3 MElem/sec
only_command/one_to_one/one_instance/6       1.11    1865.5±582.87ns 3.1 MElem/sec   1.00    1686.1±474.22ns  3.4 MElem/sec
only_command/one_to_one/one_instance/7       1.00    1815.5±526.91ns 3.7 MElem/sec   1.07    1943.1±569.04ns  3.4 MElem/sec
only_command/one_to_one/one_instance/8       1.16    2.1±0.61µs      3.6 MElem/sec   1.00    1809.4±591.96ns  4.2 MElem/sec
only_command/one_to_one/one_instance/9       1.10    2.3±0.69µs      3.7 MElem/sec   1.00    2.1±0.72µs       4.1 MElem/sec
only_command/round_robin/one_instance/1      1.06    275.9±38.54ns   3.5 MElem/sec   1.00    260.5±27.95ns    3.7 MElem/sec
only_command/round_robin/one_instance/10     1.00    4.1±1.07µs      2.3 MElem/sec   1.06    4.3±1.53µs       2.2 MElem/sec
only_command/round_robin/one_instance/11     1.00    4.3±1.08µs      2.4 MElem/sec   1.09    4.7±1.23µs       2.2 MElem/sec
only_command/round_robin/one_instance/12     1.06    5.4±1.10µs      2.1 MElem/sec   1.00    5.1±1.20µs       2.2 MElem/sec
only_command/round_robin/one_instance/2      1.01    900.4±295.52ns  2.1 MElem/sec   1.00    894.1±256.61ns   2.1 MElem/sec
only_command/round_robin/one_instance/3      1.00    1324.4±441.49ns 2.2 MElem/sec   1.12    1483.5±474.68ns  1974.8 KElem/sec
only_command/round_robin/one_instance/4      1.00    1728.4±577.65ns 2.2 MElem/sec   1.07    1856.1±539.29ns  2.1 MElem/sec
only_command/round_robin/one_instance/5      1.00    2.4±0.49µs      2011.0 KElem/sec 1.01   2.4±0.65µs       1998.3 KElem/sec
only_command/round_robin/one_instance/6      1.05    2.8±0.58µs      2.1 MElem/sec   1.00    2.6±0.80µs       2.2 MElem/sec
only_command/round_robin/one_instance/7      1.00    3.4±0.71µs      2035.2 KElem/sec 1.03   3.5±0.77µs       1970.3 KElem/sec
only_command/round_robin/one_instance/8      1.00    3.8±0.72µs      2.0 MElem/sec   1.02    3.9±0.86µs       2028.7 KElem/sec
only_command/round_robin/one_instance/9      1.00    3.7±0.85µs      2.3 MElem/sec   1.09    4.0±0.82µs       2.1 MElem/sec
```
loyd commented 1 year ago

> I think it also makes sense to quickly implement the idea of a single Option slot for the UpdateConfig message instead of a separate channel. It looks like the most efficient way to implement the whole thing in terms of overhead, so it can be an interesting baseline to compare against.

It also requires Notify or, to avoid depending on tokio in this place, https://docs.rs/futures-intrusive/latest/futures_intrusive/sync/struct.GenericManualResetEvent.html.

laplab commented 1 year ago

After some offline discussion I have implemented prioritisation in the RingBuf, which is used by GenericChannel under the hood. This is a very proof-of-concept version: it does not handle sending multiple priority messages at once and sometimes makes the benchmark panic with 'called before close()', elfo-core/src/mailbox.rs:154:42, which I have not been able to debug just yet.
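The approach can be sketched with std's VecDeque standing in for elfo's RingBuf: priority messages go to the front of the same buffer, regular ones to the back. Like the proof-of-concept described above, this naive version gives no ordering guarantee between several priority messages pushed in a row (each new one jumps ahead of the previous):

```rust
use std::collections::VecDeque;

// Illustrative stand-in for priority handling inside a single ring buffer:
// no second channel, just a choice of which end to push to.
fn push<T>(buf: &mut VecDeque<T>, msg: T, priority: bool) {
    if priority {
        buf.push_front(msg); // delivered before everything already queued
    } else {
        buf.push_back(msg);
    }
}

fn main() {
    let mut buf = VecDeque::new();
    push(&mut buf, "a", false);
    push(&mut buf, "b", false);
    push(&mut buf, "cfg", true);
    // The priority message is popped first, then FIFO order resumes.
    assert_eq!(buf.pop_front(), Some("cfg"));
    assert_eq!(buf.pop_front(), Some("a"));
    assert_eq!(buf.pop_front(), Some("b"));
}
```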

It seems that there are no regressions with more than a 10% difference. In fact, the biggest regression is 8% and the others are below 5%.

Comparison on Linux

```
group                                        master                                  ring-buf-with-high-priority-handling
-----                                        ------                                  ------------------------------------
only_command/one_to_one/one_instance/1       1.00    259.2±42.43ns   3.7 MElem/sec   1.04    269.1±42.38ns    3.5 MElem/sec
only_command/one_to_one/one_instance/10      1.03    2.1±0.26µs      4.5 MElem/sec   1.00    2.0±0.20µs       4.7 MElem/sec
only_command/one_to_one/one_instance/11      1.02    2.3±0.21µs      4.7 MElem/sec   1.00    2.2±0.23µs       4.7 MElem/sec
only_command/one_to_one/one_instance/12      1.04    2.5±0.37µs      4.5 MElem/sec   1.00    2.4±0.31µs       4.7 MElem/sec
only_command/one_to_one/one_instance/2       1.00    508.6±27.99ns   3.8 MElem/sec   1.00    506.7±63.41ns    3.8 MElem/sec
only_command/one_to_one/one_instance/3       1.00    677.8±49.40ns   4.2 MElem/sec   1.01    682.4±60.73ns    4.2 MElem/sec
only_command/one_to_one/one_instance/4       1.00    840.2±37.88ns   4.5 MElem/sec   1.08    904.1±104.62ns   4.2 MElem/sec
only_command/one_to_one/one_instance/5       1.00    1075.5±106.85ns 4.4 MElem/sec   1.02    1093.6±127.95ns  4.4 MElem/sec
only_command/one_to_one/one_instance/6       1.01    1265.7±113.99ns 4.5 MElem/sec   1.00    1249.6±124.89ns  4.6 MElem/sec
only_command/one_to_one/one_instance/7       1.00    1433.3±122.53ns 4.7 MElem/sec   1.00    1431.0±140.89ns  4.7 MElem/sec
only_command/one_to_one/one_instance/8       1.00    1604.9±131.59ns 4.8 MElem/sec   1.00    1612.1±118.77ns  4.7 MElem/sec
only_command/one_to_one/one_instance/9       1.00    1820.4±140.71ns 4.7 MElem/sec   1.01    1839.5±205.77ns  4.7 MElem/sec
only_command/round_robin/one_instance/1      1.00    235.2±43.39ns   4.1 MElem/sec   1.02    239.1±43.84ns    4.0 MElem/sec
only_command/round_robin/one_instance/10     1.00    2.6±0.30µs      3.6 MElem/sec   1.04    2.7±0.34µs       3.5 MElem/sec
only_command/round_robin/one_instance/11     1.01    3.0±0.33µs      3.5 MElem/sec   1.00    2.9±0.33µs       3.6 MElem/sec
only_command/round_robin/one_instance/12     1.01    3.2±0.36µs      3.6 MElem/sec   1.00    3.1±0.32µs       3.6 MElem/sec
only_command/round_robin/one_instance/2      1.03    762.7±109.27ns  2.5 MElem/sec   1.00    741.2±108.86ns   2.6 MElem/sec
only_command/round_robin/one_instance/3      1.04    1026.5±229.12ns 2.8 MElem/sec   1.00    991.6±151.16ns   2.9 MElem/sec
only_command/round_robin/one_instance/4      1.01    1214.6±180.42ns 3.1 MElem/sec   1.00    1198.5±146.86ns  3.2 MElem/sec
only_command/round_robin/one_instance/5      1.04    1575.4±271.95ns 3.0 MElem/sec   1.00    1513.4±208.60ns  3.2 MElem/sec
only_command/round_robin/one_instance/6      1.00    1754.5±219.91ns 3.3 MElem/sec   1.01    1777.5±227.89ns  3.2 MElem/sec
only_command/round_robin/one_instance/7      1.00    1989.4±225.63ns 3.4 MElem/sec   1.04    2.1±0.31µs       3.2 MElem/sec
only_command/round_robin/one_instance/8      1.01    2.3±0.37µs      3.3 MElem/sec   1.00    2.3±0.53µs       3.3 MElem/sec
only_command/round_robin/one_instance/9      1.01    2.5±0.37µs      3.4 MElem/sec   1.00    2.5±0.28µs       3.5 MElem/sec
```
laplab commented 1 year ago

Closing this, since we decided that we are not quite ready to introduce message prioritisation just yet. However, we will be switching from request() to pure send() for UpdateConfig messages, which should provide a significant boost in config updates (the original intention behind this PR).