pranaypratyush opened this issue 1 year ago
I made naive benchmarks to compare performance with Bastion over here: https://github.com/pranaypratyush/actor_bench_test
Currently, I am getting the following
Benchmarking Bastion/actor_creation: Collecting 100 samples in estimated 5.0053 s (975k iterations)
Bastion/actor_creation time: [5.0795 µs 5.2497 µs 5.5823 µs]
change: [+0.1855% +2.8083% +5.4573%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
Benchmarking Bastion/send_message: Collecting 100 samples in estimated 5.0068 s (2.7M iterations)
Bastion/send_message time: [1.8115 µs 1.8151 µs 1.8199 µs]
change: [+5.2196% +5.9013% +6.7632%] (p = 0.00 < 0.05)
Performance has regressed.
Found 6 outliers among 100 measurements (6.00%)
2 (2.00%) high mild
4 (4.00%) high severe
Running benches/coerce_benches.rs (target/release/deps/coerce_benches-0a223657393e94b9)
Benchmarking actor_send_1000: Collecting 100 samples in estimated 5.1943 s (1700 iterations)
actor_send_1000 time: [2.9886 ms 3.0266 ms 3.0693 ms]
change: [-50.216% -49.388% -48.563%] (p = 0.00 < 0.05)
Performance has improved.
Found 18 outliers among 100 measurements (18.00%)
6 (6.00%) high mild
12 (12.00%) high severe
Benchmarking actor_notify_1000: Collecting 100 samples in estimated 5.2457 s (30k iterations)
actor_notify_1000 time: [229.26 µs 249.90 µs 266.92 µs]
change: [-19.567% -12.271% -4.6701%] (p = 0.00 < 0.05)
Performance has improved.
Benchmarking create_1000_actors: Collecting 100 samples in estimated 5.1056 s (1000 iterations)
create_1000_actors time: [5.0030 ms 5.1304 ms 5.2710 ms]
change: [+2.2920% +4.8187% +7.9480%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7 outliers among 100 measurements (7.00%)
2 (2.00%) high mild
5 (5.00%) high severe
Please note that Bastion is doing 1 iteration whereas Coerce is doing 1000. Bastion uses far more memory while doing much less work and doesn't spread the load across all cores, whereas Coerce looks memory-efficient in comparison (I'm not sure by how much, or whether that comparison even makes sense) and spreads the load fairly evenly. Also note that this benchmark is probably flawed.
Hi @pranaypratyush,
The benchmarks included in this repository are in no way indicative of real-world performance, and were added only as a quick and dirty way to detect performance regressions with the Coerce library itself.
The framework (and the actor model as a whole) shines when you have many actors working concurrently, rather than just one actor sending and receiving sequentially. I'll look at adding some better performance benchmarks soon that will give a clearer picture of how Coerce performs in the real world.
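To make that concrete, here is a rough sketch of the kind of workload I mean (just an illustration, not the benchmarks I plan to add). It reuses the BenchmarkActor/Msg types and the actor-creation call from your bench repo, and it assumes a LocalActorRef can be moved into spawned Tokio tasks, that IntoActorId is implemented for String, and that futures::future::join_all is available.

// Sketch only: spawn many actors and let them process messages concurrently,
// instead of awaiting one actor's round trips sequentially.
async fn send_to_many_actors_concurrently(n_actors: usize, msgs_per_actor: usize) {
    let system = ActorSystem::new();

    // Create n independent actors.
    let mut actors = Vec::with_capacity(n_actors);
    for i in 0..n_actors {
        let actor = system
            .new_actor(format!("actor-{}", i).into_actor_id(), BenchmarkActor, Anonymous)
            .await
            .expect("unable to create actor");
        actors.push(actor);
    }

    // One task per actor, so sends to different mailboxes overlap across cores.
    let tasks: Vec<_> = actors
        .into_iter()
        .map(|actor| {
            tokio::spawn(async move {
                for _ in 0..msgs_per_actor {
                    actor.send(Msg).await.unwrap();
                }
            })
        })
        .collect();

    for task in futures::future::join_all(tasks).await {
        task.unwrap();
    }
}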
Thanks a lot!
Sorry, didn't mean to close the issue!
Yes, I am aware that the simple benchmarks you added are too naive to represent anything useful on their own and are merely there to help you catch obvious performance regressions. My benchmarks are naive as well, but I'll keep working on them; it helps me learn. I just added the following to my repo:
fn actor_send_receive_on_current_thread_1000_benchmark(c: &mut Criterion) {
    // let runtime = rt();
    c.bench_function("actor_send_receive_on_current_thread_1000", |b| {
        b.iter(|| async {
            let local = tokio::task::LocalSet::new();
            let send_receive_1000 = async move {
                let actor = actor().await;
                for _ in 0..1000 {
                    actor.send(Msg).await.unwrap();
                }
            };
            local.spawn_local(send_receive_1000);
            local.await;
        });
    });
}
async fn actor() -> LocalActorRef<BenchmarkActor> {
    let system = ActorSystem::new();
    system
        .new_actor("actor".into_actor_id(), BenchmarkActor, Anonymous)
        .await
        .expect("unable to create actor")
}
And I get this for that bench:
actor_send_receive_on_current_thread_1000
time: [3.3284 ns 3.3297 ns 3.3312 ns]
change: [-97.464% -97.461% -97.458%] (p = 0.00 < 0.05)
Maybe we could add some thread-local support to Coerce? Or perhaps some better examples of how to systematically use it for hot paths in a real project?
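For reference, here is a sketch of what I mean by keeping everything on one thread (using the actor()/Msg items above; just an illustration, not something I've measured). One caveat I noticed: Criterion's synchronous iter doesn't poll a returned future on its own, which likely explains the nanosecond-scale number above, so in this sketch the async work is driven explicitly with block_on on a current-thread runtime.

fn actor_send_1000_current_thread_benchmark(c: &mut Criterion) {
    // Single-threaded runtime: the actor and the caller share one thread.
    let rt = tokio::runtime::Builder::new_current_thread()
        .enable_all()
        .build()
        .unwrap();

    // Create the actor once, outside the measured loop.
    let actor = rt.block_on(actor());

    c.bench_function("actor_send_1000_current_thread", |b| {
        b.iter(|| {
            // block_on drives this future and, since the runtime is
            // current-thread, the actor's task as well.
            rt.block_on(async {
                for _ in 0..1000 {
                    actor.send(Msg).await.unwrap();
                }
            })
        })
    });
}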
send_zst/1 time: [2.3800 µs 2.4019 µs 2.4330 µs]
change: [+1.1173% +3.3656% +6.4253%] (p = 0.00 < 0.05)
Performance has regressed.
Found 10 outliers among 100 measurements (10.00%)
4 (4.00%) high mild
6 (6.00%) high severe
send_zst/10 time: [3.3274 µs 3.3344 µs 3.3418 µs]
change: [-26.455% -20.847% -14.633%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) high mild
1 (1.00%) high severe
send_zst/100 time: [56.708 µs 56.788 µs 56.872 µs]
change: [-3.1032% -2.9136% -2.7247%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100 measurements (1.00%)
1 (1.00%) low mild
send_zst/1000 time: [476.38 µs 476.82 µs 477.27 µs]
change: [-12.620% -12.406% -12.209%] (p = 0.00 < 0.05)
Performance has improved.
Found 2 outliers among 100 measurements (2.00%)
1 (1.00%) low mild
1 (1.00%) high mild
Above are benchmarks from xtra. It also happens to be ridiculously memory-efficient.
I ran the benchmarks provided in the coerce crate on my 5950x and got this. Quite surprised that it takes so much time. I am trying to build a social network where each post can be an orderbook, so there will be a lot of orderbooks. I liked Coerce's API compared to something like Bastion's, but this benchmark surprised me. Are these going to be representative latencies in the final web server, or is this just happening because we are awaiting one send after another on a multi-threaded runtime, and Tokio is wasting too much time in the scheduler doing nothing useful?
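For example, would pipelining the sends instead of awaiting every reply in turn change the picture? Roughly like this sketch (using the actor()/Msg items above; just an illustration I haven't measured, and it assumes each send enqueues its message on first poll and then awaits its own reply):

async fn send_1000_pipelined() {
    let actor = actor().await;

    // Create all 1000 send futures up front and poll them together, so the
    // round trips overlap instead of ping-ponging one message at a time.
    let sends: Vec<_> = (0..1000).map(|_| actor.send(Msg)).collect();
    for result in futures::future::join_all(sends).await {
        result.unwrap();
    }
}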