Performance regression after updating to 2.5.0

sdil commented 2 months ago

Bug description

I'm building a demo web app using goakt. It spawns over 600 vehicle actors during its runtime.

With goakt version 2.4.0, I can get over 35k rps when running load test for the request to get the vehicle position. The memory usage is over 17GB.

> hey -c 100 "http://localhost:8080/vehicle?id=0012.02247" -z 30s
Summary:
  Success rate: 1.0000
  Total:    30.0036 secs
  Slowest:  0.4775 secs
  Fastest:  0.0000 secs
  Average:  0.0028 secs
  Requests/sec: 35629.9656

After updating to version 2.5.0, the number of rps dropped to only 3.5k. The memory usage is only <40MB (which is superb).

> hey -c 50 "http://localhost:8080/vehicle?id=0018.00222" -z 30s
Summary:
  Success rate: 1.0000
  Total:    30.0029 secs
  Slowest:  0.3449 secs
  Fastest:  0.0001 secs
  Average:  0.0139 secs
  Requests/sec: 3588.8494

How to reproduce it?

Clone my repo https://github.com/sdil/goakt-realtimemap
Run the server with go run . on first terminal
Run load test on /vehicle?id=<actor name> endpoint. I'm using https://github.com/rakyll/hey so the full command looks like this: hey -c 50 "http://localhost:8080/vehicle?id=<actor name>" -z 30s. You can get the actor name from the log in the terminal
Observe the throughput.

Change the goakt version to 2.4.0 and repeat the same process.

Expected behavior

The performance to remain around the same.

Screenshots

N/A

Library Version:

Go-Akt version: 2.5.0
Go version: go1.22.5 darwin/arm64

Additional context

When running with 1 actor, the performance is kinda similar. But when I spawn 10 actors, it started degrading significantly.

Tochemey commented 2 months ago

@sdil Thanks for the bug. I will take a look in the course of the week.

Tochemey commented 2 months ago

@sdil Do I need additional tools before running your sample?

sdil commented 2 months ago

I was using https://github.com/rakyll/hey to run load test but you can technically use any tools to do the same.

I'll update the reproduction step in the description

sdil commented 2 months ago

@Tochemey I've updated the step on how reproduce the bug

Tochemey commented 2 months ago

@Tochemey I've updated the step on how reproduce the bug

When I run go run . as stated I got the following log. Is it something that is expected?

cc @sdil

sdil commented 2 months ago

Yes that is expected. The program listen mqtt messages, creates actor and update its state based on the messages

Tochemey commented 2 months ago

@sdil I have been able to reproduce it. I will take a look at what may be the issue in the weekend. Thanks

Tochemey commented 2 months ago

@sdil I have added some small test to check actors message handling throughput as shown here:

=== RUN   TestBenchmark_BenchTell
starting benchmark for (30s): num workers:(50)
total actors spawned: (2000)
total workers: (50), total messages sent: (92645866), total messages received: (92645866) - duration: (30s)
messages per second: (3088195)
--- PASS: TestBenchmark_BenchTell (52.64s)
PASS

=== RUN   TestBenchmark_BenchAsk
starting benchmark for (30s): num workers:(50)
total actors spawned: (2000)
total workers: (50), total messages sent: (7489596), total messages received: (7489596) - duration: (30s)
messages per second: (249653)
--- PASS: TestBenchmark_BenchAsk (52.63s)
PASS

The test can be found here https://github.com/Tochemey/goakt/blob/issue-perf/bench/benchmark_test.go#L165

As you can see Askis not really performant compare to Tell

Tochemey commented 2 months ago

I will enhance this benchmark test with additional test cases. I am sure depending upon machine specs and other factors the result may vary.

sdil commented 2 months ago

Ahh the Tell operation is fantastic. I didn't realize that Ask is 1/10 of the throughput of Tell. That's a really great benchmark test.

Just curious, what is your machine spec?

Tochemey commented 2 months ago

Ahh the Tell operation is fantastic. I didn't realize that Ask is 1/10 of the throughput of Tell. That's a really great benchmark test.

Just curious, what is your machine spec?

13-inch M1 2020, 16GB mem, macOS

Tochemey commented 2 months ago

@sdil Apologies for the delay in figuring out what may be causing the performance drop. I welcome any idea to enhance performance and the library in general.

sdil commented 2 months ago

Did you consider MPSC queue as suggested by @halturin? It seems like the same queue implemented in Akka

Ref: https://concurrencyfreaks.blogspot.com/2014/04/multi-producer-single-consumer-queue.html

Tochemey commented 2 months ago

Did you consider MPSC queue as suggested by @halturin? It seems like the same queue implemented in Akka

Ref: https://concurrencyfreaks.blogspot.com/2014/04/multi-producer-single-consumer-queue.html

Thanks. I am going to give it a try :)

Tochemey commented 2 months ago

@sdil using the mpsc queue seems increase some throughput:

=== RUN   TestBenchmark_BenchTell
starting benchmark for (30s): num workers:(50)
total actors spawned: (2000)
total workers: (50), total messages sent: (105993035), total messages received: (105993035) - duration: (30s)
messages per second: (3533101)
--- PASS: TestBenchmark_BenchTell (52.73s)
PASS

halturin commented 2 months ago

@Tochemey in ergo, mpsc gives me 6M msg/sec between two local actors https://github.com/ergo-services/benchmarks (this perf depends on the HW for sure)

Tochemey commented 2 months ago

@Tochemey in ergo, mpsc gives me 6M msg/sec between two local actors https://github.com/ergo-services/benchmarks (this perf depends on the HW for sure)

my benchmarks are only for sending messages to an actor not even actor to actor :). Pretty sure something is wrong in my implementation.

halturin commented 2 months ago

@Tochemey i believe you can reach the same numbers easily

Tochemey commented 2 months ago

@Tochemey i believe you can reach the same numbers easily

I hope so :)

Tochemey commented 2 months ago

Also let me take a look at ergo, maybe I may abandon Go-Akt and redirect people to ergo as long as it does not require licensing :) because that is the goal of Go-Akt. cc @halturin

halturin commented 2 months ago

see v300 then :) its about to release. preparing the docs now. planned for coming September.

https://github.com/ergo-services/examples/tree/v300 - examples. see "demo" there (it has "observer" app embedded, open http://localhost:9911 to see the node/network/processes etc)

Tochemey commented 2 months ago

@halturin I appreciate the support to help my library to be better. I really do. I will be glad to contribute to ergo. I have few questions to ask:

Does ergo support actor redeployment in case network partition or cluster topology changes?
How does clustering work?
What are the current cluster provider engine integration?(k8, NATS, etc...)

halturin commented 2 months ago

What are the current cluster provider engine integration?(k8, NATS, etc...)

its in the roadmap now.

didnt get the first two.

maybe I may abandon Go-Akt and redirect people to ergo as long as it does not require licensing :) because that is the goal of Go-Akt.

please, don't. we might have different vision of "where to go/how to do".

sdil commented 2 months ago

Personally, I liked go-akt syntaxes because it feels closer to MS Orleans. I really liked that I can do scheduling, passivation, etc. I was abit struggled with proto messaging, but that's because I never used it. Now I understand the type checking it provides.

Ergo looks nice too but I haven't give it a try yet. Ergo looks closer to Erlang & OTP but I struggled with Elixir in the past because of its syntax and paradigm.

Tochemey commented 2 months ago

@sdil I am trying to see if I can match the performance of ergo :)

Tochemey commented 2 months ago

@sdil I have released 2.5.1 to include the mspc queue and few fixes. At the moment, Go-Akt is not yet close to ergo. Let us leave the issue open while I dig more in some the implementation and see where we can gain some more performance.

halturin commented 2 months ago

I noticed you used generics for your mpsc implementation. I believe, I tried this way and moved back to the 'any" type due to performance issues. But, honestly I had no enough time to dig into it.

sdil commented 2 months ago

Thank you for the release @Tochemey great work! I'm pretty sure we can find some other optimizations in the future 😀

Tochemey commented 2 months ago

Please let us leave the issue open.

Tochemey commented 2 months ago

I noticed you used generics for your mpsc implementation. I believe, I tried this way and moved back to the 'any" type due to performance issues. But, honestly I had no enough time to dig into it.

Interesting. Let me take your advice and see the outcome. Thanks for pointing out. @halturin I found this article https://planetscale.com/blog/generics-can-make-your-go-code-slower

Tochemey commented 2 months ago

@halturin I will be glad to have a chat with you to help fix some of the bottleneck that is hindering performance.

sdil commented 2 months ago

@Tochemey i was following a series of your development in these PRs

I remember i saw a huge performance gain from these PRs when I tested locally. just curious, why are they not merged?

Tochemey commented 2 months ago

@sdil I closed those PRs because some of the changes have been added into other PRs which has been merged. However, I have not fully tested the new changes I made. :)

Tochemey commented 2 months ago

@sdil you can reference the latest main commit and see. At the moment I am doing some cleaning and refactoring on the code base and checking where I can gain some performance.

Tochemey commented 2 months ago

@sdil I have released a new version https://github.com/Tochemey/goakt/releases/tag/v2.6.0

sdil commented 2 months ago

Ahh i saw that this morning. I'll give it a try later

sdil commented 2 months ago

@Tochemey I'm still seeing bad load test result here

fyaacob@M1622 ~> hey -c 50 "http://localhost:8080/vehicle?id=0022.00945" -z 30s
Summary:
  Success rate: 1.0000
  Total:    30.0016 secs
  Slowest:  1.7371 secs
  Fastest:  0.0005 secs
  Average:  0.4966 secs
  Requests/sec: 100.0948

  Total data:   1.62 MiB
  Size/request: 566 B
  Size/sec: 55.34 KiB

Plus the CPU is taking 900% now.

This is commit for updating the library https://github.com/sdil/goakt-realtimemap/commit/84143d7fd8168497a1753467306362573d6c5a29

Let me know if messed up with anything 🙏

Tochemey commented 2 months ago

@sdil I will have through your implementation and see what is going on.

Tochemey commented 2 months ago

@sdil I will have through your implementation and see what is going on.

This bench code here on my machine returns this output:

=== RUN   TestBenchmark_BenchAsk
Starting benchmark for (30s): num workers:(100)
Total workers: (100), total messages sent: (29025818), total messages received: (29025818) - duration: (30s)
Messages per second: (967527)
--- PASS: TestBenchmark_BenchAsk (32.53s)
PASS

It spawns 100 workers to send Ask to a given actor.

Tochemey commented 2 months ago

and this bench code here on my machine returns this output:

goos: darwin
goarch: arm64
pkg: github.com/tochemey/goakt/v2/bench
cpu: Apple M1
BenchmarkActor
BenchmarkActor/ask
BenchmarkActor/ask-8             1099460           984.2 ns/op     1035262 ops/s         577 B/op         10 allocs/op
PASS

Tochemey commented 2 months ago

@sdil I hope these give you some idea. I will look into your implementation when I get some spare time.

Tochemey commented 2 months ago

@sdil at first glance I can see some loop here constantly querying for actors. I don't know what you want to achieve though but this can be an issue.

sdil commented 2 months ago

I'm running the bench test on my machine and getting this

> make bench
cd bench && rm -f bench.txt && go test -run=^# -bench . -benchmem -count=30 -timeout=0 | tee bench.txt
goos: darwin
goarch: arm64
pkg: github.com/tochemey/goakt/v2/bench
cpu: Apple M1 Pro
BenchmarkActor/tell-10              3304     608753 ns/op    3865833 ops/s      312 B/op        6 allocs/op
BenchmarkActor/tell-10              3435     585545 ns/op    3740811 ops/s      301 B/op        6 allocs/op
BenchmarkActor/tell-10              3436     585312 ns/op    4126291 ops/s      308 B/op        6 allocs/op
BenchmarkActor/tell-10              3439     584775 ns/op    4174597 ops/s      301 B/op        6 allocs/op

I think something is wrong with my machine. I restarted my machine and it's still showing the same result

Tochemey commented 2 months ago

@sdil I will be glad to sync with you if you are free. :). I can send you an invite on linkedin

sdil commented 2 months ago

@Tochemey yup I'm free right now. this is my linkedin https://www.linkedin.com/in/fadhilyaacob/

Tochemey commented 2 months ago

@Tochemey yup I'm free right now. this is my linkedin https://www.linkedin.com/in/fadhilyaacob/

@sdil send me a message because it seems I need to activate premium feature before sending you a message. This is mine https://github.com/sdil/goakt-realtimemap/blob/main/main.go#L101

sdil commented 2 months ago

that's not a linkedin URL 😛

Found it from your github profile

sdil commented 2 months ago

I can't message you either.

Tochemey commented 2 months ago

@sdil these are the benchmark results on my machine:

cd bench && rm -f bench.txt && go test -run=^# -bench . -benchmem -cpu=16 -count=10 | tee bench.txt
goos: darwin
goarch: arm64
pkg: github.com/tochemey/goakt/v2/bench
cpu: Apple M1
BenchmarkActor/tell-16              3314        606899 ns/op       3419724 ops/s         308 B/op          6 allocs/op
BenchmarkActor/tell-16              3421        587953 ns/op       3464890 ops/s         303 B/op          6 allocs/op
BenchmarkActor/tell-16              3458        581630 ns/op       3568473 ops/s         302 B/op          6 allocs/op
BenchmarkActor/tell-16              3454        582282 ns/op       3370989 ops/s         302 B/op          6 allocs/op
BenchmarkActor/tell-16              3444        583975 ns/op       3359315 ops/s         312 B/op          6 allocs/op
BenchmarkActor/tell-16              3450        582916 ns/op       3659670 ops/s         301 B/op          6 allocs/op
BenchmarkActor/tell-16              3457        581736 ns/op       3532655 ops/s         302 B/op          6 allocs/op
BenchmarkActor/tell-16              3446        583620 ns/op       3503963 ops/s         302 B/op          6 allocs/op
BenchmarkActor/tell-16              3444        583934 ns/op       3551735 ops/s         302 B/op          6 allocs/op
BenchmarkActor/tell-16              3459        581408 ns/op       3568124 ops/s         302 B/op          6 allocs/op
BenchmarkActor/ask-16            1130342           966.6 ns/op     1053943 ops/s         582 B/op         10 allocs/op
BenchmarkActor/ask-16            1161424           967.1 ns/op     1052861 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-16            1164417           967.5 ns/op     1052337 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-16            1125428           967.2 ns/op     1053302 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-16            1146259           976.0 ns/op     1043344 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-16            1162370           969.2 ns/op     1050534 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-16            1158091           975.4 ns/op     1043822 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-16            1154194           981.9 ns/op     1036775 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-16            1166288           968.6 ns/op     1051095 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-16            1154174           981.5 ns/op     1037217 ops/s         576 B/op         10 allocs/op
PASS
ok      github.com/tochemey/goakt/v2/bench  194.740s

cd bench && rm -f bench.txt && go test -run=^# -bench . -benchmem -cpu=64 -count=10 | tee bench.txt
goos: darwin
goarch: arm64
pkg: github.com/tochemey/goakt/v2/bench
cpu: Apple M1
BenchmarkActor/tell-64              3298        609897 ns/op       2955308 ops/s         354 B/op          6 allocs/op
BenchmarkActor/tell-64              3402        591229 ns/op       2697057 ops/s         329 B/op          6 allocs/op
BenchmarkActor/tell-64              3412        589500 ns/op       2814118 ops/s         320 B/op          6 allocs/op
BenchmarkActor/tell-64              3404        590840 ns/op       3612948 ops/s         310 B/op          6 allocs/op
BenchmarkActor/tell-64            191126         10791 ns/op       3664785 ops/s         296 B/op          6 allocs/op
BenchmarkActor/tell-64              3415        589012 ns/op       3017118 ops/s         323 B/op          6 allocs/op
BenchmarkActor/tell-64              3357        599198 ns/op       2940115 ops/s         310 B/op          6 allocs/op
BenchmarkActor/tell-64              3432        586095 ns/op       2924168 ops/s         322 B/op          6 allocs/op
BenchmarkActor/tell-64            190279         10906 ns/op       2930726 ops/s         296 B/op          6 allocs/op
BenchmarkActor/tell-64              3414        589249 ns/op       3017012 ops/s         336 B/op          6 allocs/op
BenchmarkActor/ask-64            1076030          1024 ns/op        994416 ops/s         581 B/op         10 allocs/op
BenchmarkActor/ask-64            1098844          1011 ns/op       1007668 ops/s         577 B/op         10 allocs/op
BenchmarkActor/ask-64            1179016          1008 ns/op       1008829 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-64            1108654          1015 ns/op       1002926 ops/s         577 B/op         10 allocs/op
BenchmarkActor/ask-64            1114039          1027 ns/op        991018 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-64            1109829          1003 ns/op       1015062 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-64            1105442          1011 ns/op       1007678 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-64            1130664          1011 ns/op       1006370 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-64            1105929          1016 ns/op       1002137 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-64            1103197          1012 ns/op       1006636 ops/s         576 B/op         10 allocs/op
PASS
ok      github.com/tochemey/goakt/v2/bench  199.049s

cd bench && rm -f bench.txt && go test -run=^# -bench . -benchmem -cpu=10 -count=10 | tee bench.txt                                                                    took  3m 20s at  06:18:32 pm
goos: darwin
goarch: arm64
pkg: github.com/tochemey/goakt/v2/bench
cpu: Apple M1
BenchmarkActor/tell-10            185508         11084 ns/op       4026571 ops/s         296 B/op          6 allocs/op
BenchmarkActor/tell-10              3444        583985 ns/op       3550365 ops/s         301 B/op          6 allocs/op
BenchmarkActor/tell-10              3445        583769 ns/op       3601989 ops/s         301 B/op          6 allocs/op
BenchmarkActor/tell-10              3454        582288 ns/op       3627675 ops/s         301 B/op          6 allocs/op
BenchmarkActor/tell-10              3450        582920 ns/op       3549536 ops/s         302 B/op          6 allocs/op
BenchmarkActor/tell-10              3462        580915 ns/op       3526956 ops/s         301 B/op          6 allocs/op
BenchmarkActor/tell-10              3448        583290 ns/op       3574290 ops/s         306 B/op          6 allocs/op
BenchmarkActor/tell-10              3458        581576 ns/op       3521833 ops/s         300 B/op          6 allocs/op
BenchmarkActor/tell-10              3447        583429 ns/op       3584557 ops/s         301 B/op          6 allocs/op
BenchmarkActor/tell-10              3456        581910 ns/op       3632954 ops/s         301 B/op          6 allocs/op
BenchmarkActor/ask-10            1162461           957.4 ns/op     1063679 ops/s         578 B/op         10 allocs/op
BenchmarkActor/ask-10            1178132           968.8 ns/op     1050724 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-10            1186813           969.9 ns/op     1049383 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-10            1166163           962.7 ns/op     1057649 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-10            1164457           956.7 ns/op     1064486 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-10            1151874           969.3 ns/op     1050524 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-10            1138566           963.5 ns/op     1057205 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-10            1174144           959.1 ns/op     1061610 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-10            1148130           962.3 ns/op     1058367 ops/s         576 B/op         10 allocs/op
BenchmarkActor/ask-10            1183231           967.6 ns/op     1051966 ops/s         576 B/op         10 allocs/op
PASS
ok      github.com/tochemey/goakt/v2/bench  196.840s

Tochemey commented 2 months ago

@sdil I will spend some time and see how best to tweak the code.

Tochemey / goakt

Performance regression after updating to 2.5.0 #412