Closed sdil closed 1 month ago
@sdil Thanks for the bug. I will take a look in the course of the week.
@sdil Do I need additional tools before running your sample?
I was using https://github.com/rakyll/hey to run load test but you can technically use any tools to do the same.
I'll update the reproduction step in the description
@Tochemey I've updated the step on how reproduce the bug
@Tochemey I've updated the step on how reproduce the bug
When I run go run .
as stated I got the following log. Is it something that is expected?
cc @sdil
Yes that is expected. The program listen mqtt messages, creates actor and update its state based on the messages
@sdil I have been able to reproduce it. I will take a look at what may be the issue in the weekend. Thanks
@sdil I have added some small test to check actors message handling throughput as shown here:
=== RUN TestBenchmark_BenchTell
starting benchmark for (30s): num workers:(50)
total actors spawned: (2000)
total workers: (50), total messages sent: (92645866), total messages received: (92645866) - duration: (30s)
messages per second: (3088195)
--- PASS: TestBenchmark_BenchTell (52.64s)
PASS
=== RUN TestBenchmark_BenchAsk
starting benchmark for (30s): num workers:(50)
total actors spawned: (2000)
total workers: (50), total messages sent: (7489596), total messages received: (7489596) - duration: (30s)
messages per second: (249653)
--- PASS: TestBenchmark_BenchAsk (52.63s)
PASS
The test can be found here https://github.com/Tochemey/goakt/blob/issue-perf/bench/benchmark_test.go#L165
As you can see Ask
is not really performant compare to Tell
I will enhance this benchmark test with additional test cases. I am sure depending upon machine specs and other factors the result may vary.
Ahh the Tell operation is fantastic. I didn't realize that Ask is 1/10 of the throughput of Tell. That's a really great benchmark test.
Just curious, what is your machine spec?
Ahh the Tell operation is fantastic. I didn't realize that Ask is 1/10 of the throughput of Tell. That's a really great benchmark test.
Just curious, what is your machine spec?
13-inch M1 2020, 16GB mem, macOS
@sdil Apologies for the delay in figuring out what may be causing the performance drop. I welcome any idea to enhance performance and the library in general.
Did you consider MPSC queue as suggested by @halturin? It seems like the same queue implemented in Akka
Ref: https://concurrencyfreaks.blogspot.com/2014/04/multi-producer-single-consumer-queue.html
Did you consider MPSC queue as suggested by @halturin? It seems like the same queue implemented in Akka
Ref: https://concurrencyfreaks.blogspot.com/2014/04/multi-producer-single-consumer-queue.html
Thanks. I am going to give it a try :)
@sdil using the mpsc queue seems increase some throughput:
=== RUN TestBenchmark_BenchTell
starting benchmark for (30s): num workers:(50)
total actors spawned: (2000)
total workers: (50), total messages sent: (105993035), total messages received: (105993035) - duration: (30s)
messages per second: (3533101)
--- PASS: TestBenchmark_BenchTell (52.73s)
PASS
@Tochemey in ergo, mpsc gives me 6M msg/sec between two local actors https://github.com/ergo-services/benchmarks (this perf depends on the HW for sure)
@Tochemey in ergo, mpsc gives me 6M msg/sec between two local actors https://github.com/ergo-services/benchmarks (this perf depends on the HW for sure)
my benchmarks are only for sending messages to an actor not even actor to actor :). Pretty sure something is wrong in my implementation.
@Tochemey i believe you can reach the same numbers easily
@Tochemey i believe you can reach the same numbers easily
I hope so :)
Also let me take a look at ergo, maybe I may abandon Go-Akt and redirect people to ergo as long as it does not require licensing :) because that is the goal of Go-Akt. cc @halturin
see v300 then :) its about to release. preparing the docs now. planned for coming September.
https://github.com/ergo-services/examples/tree/v300 - examples. see "demo" there (it has "observer" app embedded, open http://localhost:9911 to see the node/network/processes etc)
@halturin I appreciate the support to help my library to be better. I really do. I will be glad to contribute to ergo. I have few questions to ask:
What are the current cluster provider engine integration?(k8, NATS, etc...)
its in the roadmap now.
didnt get the first two.
maybe I may abandon Go-Akt and redirect people to ergo as long as it does not require licensing :) because that is the goal of Go-Akt.
please, don't. we might have different vision of "where to go/how to do".
Personally, I liked go-akt syntaxes because it feels closer to MS Orleans. I really liked that I can do scheduling, passivation, etc. I was abit struggled with proto messaging, but that's because I never used it. Now I understand the type checking it provides.
Ergo looks nice too but I haven't give it a try yet. Ergo looks closer to Erlang & OTP but I struggled with Elixir in the past because of its syntax and paradigm.
@sdil I am trying to see if I can match the performance of ergo :)
@sdil I have released 2.5.1 to include the mspc queue and few fixes. At the moment, Go-Akt is not yet close to ergo. Let us leave the issue open while I dig more in some the implementation and see where we can gain some more performance.
I noticed you used generics for your mpsc implementation. I believe, I tried this way and moved back to the 'any" type due to performance issues. But, honestly I had no enough time to dig into it.
Thank you for the release @Tochemey great work! I'm pretty sure we can find some other optimizations in the future π
Please let us leave the issue open.
I noticed you used generics for your mpsc implementation. I believe, I tried this way and moved back to the 'any" type due to performance issues. But, honestly I had no enough time to dig into it.
Interesting. Let me take your advice and see the outcome. Thanks for pointing out. @halturin I found this article https://planetscale.com/blog/generics-can-make-your-go-code-slower
@halturin I will be glad to have a chat with you to help fix some of the bottleneck that is hindering performance.
@Tochemey i was following a series of your development in these PRs
I remember i saw a huge performance gain from these PRs when I tested locally. just curious, why are they not merged?
@sdil I closed those PRs because some of the changes have been added into other PRs which has been merged. However, I have not fully tested the new changes I made. :)
@sdil you can reference the latest main commit and see. At the moment I am doing some cleaning and refactoring on the code base and checking where I can gain some performance.
@sdil I have released a new version https://github.com/Tochemey/goakt/releases/tag/v2.6.0
Ahh i saw that this morning. I'll give it a try later
@Tochemey I'm still seeing bad load test result here
fyaacob@M1622 ~> hey -c 50 "http://localhost:8080/vehicle?id=0022.00945" -z 30s
Summary:
Success rate: 1.0000
Total: 30.0016 secs
Slowest: 1.7371 secs
Fastest: 0.0005 secs
Average: 0.4966 secs
Requests/sec: 100.0948
Total data: 1.62 MiB
Size/request: 566 B
Size/sec: 55.34 KiB
Plus the CPU is taking 900% now.
This is commit for updating the library https://github.com/sdil/goakt-realtimemap/commit/84143d7fd8168497a1753467306362573d6c5a29
Let me know if messed up with anything π
@sdil I will have through your implementation and see what is going on.
@sdil I will have through your implementation and see what is going on.
This bench code here on my machine returns this output:
=== RUN TestBenchmark_BenchAsk
Starting benchmark for (30s): num workers:(100)
Total workers: (100), total messages sent: (29025818), total messages received: (29025818) - duration: (30s)
Messages per second: (967527)
--- PASS: TestBenchmark_BenchAsk (32.53s)
PASS
It spawns 100 workers to send Ask to a given actor.
and this bench code here on my machine returns this output:
goos: darwin
goarch: arm64
pkg: github.com/tochemey/goakt/v2/bench
cpu: Apple M1
BenchmarkActor
BenchmarkActor/ask
BenchmarkActor/ask-8 1099460 984.2 ns/op 1035262 ops/s 577 B/op 10 allocs/op
PASS
@sdil I hope these give you some idea. I will look into your implementation when I get some spare time.
@sdil at first glance I can see some loop here constantly querying for actors. I don't know what you want to achieve though but this can be an issue.
I'm running the bench test on my machine and getting this
> make bench
cd bench && rm -f bench.txt && go test -run=^# -bench . -benchmem -count=30 -timeout=0 | tee bench.txt
goos: darwin
goarch: arm64
pkg: github.com/tochemey/goakt/v2/bench
cpu: Apple M1 Pro
BenchmarkActor/tell-10 3304 608753 ns/op 3865833 ops/s 312 B/op 6 allocs/op
BenchmarkActor/tell-10 3435 585545 ns/op 3740811 ops/s 301 B/op 6 allocs/op
BenchmarkActor/tell-10 3436 585312 ns/op 4126291 ops/s 308 B/op 6 allocs/op
BenchmarkActor/tell-10 3439 584775 ns/op 4174597 ops/s 301 B/op 6 allocs/op
I think something is wrong with my machine. I restarted my machine and it's still showing the same result
@sdil I will be glad to sync with you if you are free. :). I can send you an invite on linkedin
@Tochemey yup I'm free right now. this is my linkedin https://www.linkedin.com/in/fadhilyaacob/
@Tochemey yup I'm free right now. this is my linkedin https://www.linkedin.com/in/fadhilyaacob/
@sdil send me a message because it seems I need to activate premium feature before sending you a message. This is mine https://github.com/sdil/goakt-realtimemap/blob/main/main.go#L101
that's not a linkedin URL π
Found it from your github profile
I can't message you either.
@sdil these are the benchmark results on my machine:
cd bench && rm -f bench.txt && go test -run=^# -bench . -benchmem -cpu=16 -count=10 | tee bench.txt
goos: darwin
goarch: arm64
pkg: github.com/tochemey/goakt/v2/bench
cpu: Apple M1
BenchmarkActor/tell-16 3314 606899 ns/op 3419724 ops/s 308 B/op 6 allocs/op
BenchmarkActor/tell-16 3421 587953 ns/op 3464890 ops/s 303 B/op 6 allocs/op
BenchmarkActor/tell-16 3458 581630 ns/op 3568473 ops/s 302 B/op 6 allocs/op
BenchmarkActor/tell-16 3454 582282 ns/op 3370989 ops/s 302 B/op 6 allocs/op
BenchmarkActor/tell-16 3444 583975 ns/op 3359315 ops/s 312 B/op 6 allocs/op
BenchmarkActor/tell-16 3450 582916 ns/op 3659670 ops/s 301 B/op 6 allocs/op
BenchmarkActor/tell-16 3457 581736 ns/op 3532655 ops/s 302 B/op 6 allocs/op
BenchmarkActor/tell-16 3446 583620 ns/op 3503963 ops/s 302 B/op 6 allocs/op
BenchmarkActor/tell-16 3444 583934 ns/op 3551735 ops/s 302 B/op 6 allocs/op
BenchmarkActor/tell-16 3459 581408 ns/op 3568124 ops/s 302 B/op 6 allocs/op
BenchmarkActor/ask-16 1130342 966.6 ns/op 1053943 ops/s 582 B/op 10 allocs/op
BenchmarkActor/ask-16 1161424 967.1 ns/op 1052861 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-16 1164417 967.5 ns/op 1052337 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-16 1125428 967.2 ns/op 1053302 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-16 1146259 976.0 ns/op 1043344 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-16 1162370 969.2 ns/op 1050534 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-16 1158091 975.4 ns/op 1043822 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-16 1154194 981.9 ns/op 1036775 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-16 1166288 968.6 ns/op 1051095 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-16 1154174 981.5 ns/op 1037217 ops/s 576 B/op 10 allocs/op
PASS
ok github.com/tochemey/goakt/v2/bench 194.740s
cd bench && rm -f bench.txt && go test -run=^# -bench . -benchmem -cpu=64 -count=10 | tee bench.txt
goos: darwin
goarch: arm64
pkg: github.com/tochemey/goakt/v2/bench
cpu: Apple M1
BenchmarkActor/tell-64 3298 609897 ns/op 2955308 ops/s 354 B/op 6 allocs/op
BenchmarkActor/tell-64 3402 591229 ns/op 2697057 ops/s 329 B/op 6 allocs/op
BenchmarkActor/tell-64 3412 589500 ns/op 2814118 ops/s 320 B/op 6 allocs/op
BenchmarkActor/tell-64 3404 590840 ns/op 3612948 ops/s 310 B/op 6 allocs/op
BenchmarkActor/tell-64 191126 10791 ns/op 3664785 ops/s 296 B/op 6 allocs/op
BenchmarkActor/tell-64 3415 589012 ns/op 3017118 ops/s 323 B/op 6 allocs/op
BenchmarkActor/tell-64 3357 599198 ns/op 2940115 ops/s 310 B/op 6 allocs/op
BenchmarkActor/tell-64 3432 586095 ns/op 2924168 ops/s 322 B/op 6 allocs/op
BenchmarkActor/tell-64 190279 10906 ns/op 2930726 ops/s 296 B/op 6 allocs/op
BenchmarkActor/tell-64 3414 589249 ns/op 3017012 ops/s 336 B/op 6 allocs/op
BenchmarkActor/ask-64 1076030 1024 ns/op 994416 ops/s 581 B/op 10 allocs/op
BenchmarkActor/ask-64 1098844 1011 ns/op 1007668 ops/s 577 B/op 10 allocs/op
BenchmarkActor/ask-64 1179016 1008 ns/op 1008829 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-64 1108654 1015 ns/op 1002926 ops/s 577 B/op 10 allocs/op
BenchmarkActor/ask-64 1114039 1027 ns/op 991018 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-64 1109829 1003 ns/op 1015062 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-64 1105442 1011 ns/op 1007678 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-64 1130664 1011 ns/op 1006370 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-64 1105929 1016 ns/op 1002137 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-64 1103197 1012 ns/op 1006636 ops/s 576 B/op 10 allocs/op
PASS
ok github.com/tochemey/goakt/v2/bench 199.049s
cd bench && rm -f bench.txt && go test -run=^# -bench . -benchmem -cpu=10 -count=10 | tee bench.txt took ο 3m 20s at ο 06:18:32 pm
goos: darwin
goarch: arm64
pkg: github.com/tochemey/goakt/v2/bench
cpu: Apple M1
BenchmarkActor/tell-10 185508 11084 ns/op 4026571 ops/s 296 B/op 6 allocs/op
BenchmarkActor/tell-10 3444 583985 ns/op 3550365 ops/s 301 B/op 6 allocs/op
BenchmarkActor/tell-10 3445 583769 ns/op 3601989 ops/s 301 B/op 6 allocs/op
BenchmarkActor/tell-10 3454 582288 ns/op 3627675 ops/s 301 B/op 6 allocs/op
BenchmarkActor/tell-10 3450 582920 ns/op 3549536 ops/s 302 B/op 6 allocs/op
BenchmarkActor/tell-10 3462 580915 ns/op 3526956 ops/s 301 B/op 6 allocs/op
BenchmarkActor/tell-10 3448 583290 ns/op 3574290 ops/s 306 B/op 6 allocs/op
BenchmarkActor/tell-10 3458 581576 ns/op 3521833 ops/s 300 B/op 6 allocs/op
BenchmarkActor/tell-10 3447 583429 ns/op 3584557 ops/s 301 B/op 6 allocs/op
BenchmarkActor/tell-10 3456 581910 ns/op 3632954 ops/s 301 B/op 6 allocs/op
BenchmarkActor/ask-10 1162461 957.4 ns/op 1063679 ops/s 578 B/op 10 allocs/op
BenchmarkActor/ask-10 1178132 968.8 ns/op 1050724 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-10 1186813 969.9 ns/op 1049383 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-10 1166163 962.7 ns/op 1057649 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-10 1164457 956.7 ns/op 1064486 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-10 1151874 969.3 ns/op 1050524 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-10 1138566 963.5 ns/op 1057205 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-10 1174144 959.1 ns/op 1061610 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-10 1148130 962.3 ns/op 1058367 ops/s 576 B/op 10 allocs/op
BenchmarkActor/ask-10 1183231 967.6 ns/op 1051966 ops/s 576 B/op 10 allocs/op
PASS
ok github.com/tochemey/goakt/v2/bench 196.840s
@sdil I will spend some time and see how best to tweak the code.
Bug description
I'm building a demo web app using goakt. It spawns over 600 vehicle actors during its runtime.
With goakt version 2.4.0, I can get over 35k rps when running load test for the request to get the vehicle position. The memory usage is over 17GB.
After updating to version 2.5.0, the number of rps dropped to only 3.5k. The memory usage is only <40MB (which is superb).
How to reproduce it?
go run .
on first terminal/vehicle?id=<actor name>
endpoint. I'm using https://github.com/rakyll/hey so the full command looks like this:hey -c 50 "http://localhost:8080/vehicle?id=<actor name>" -z 30s
. You can get the actor name from the log in the terminalChange the goakt version to 2.4.0 and repeat the same process.
Expected behavior
The performance to remain around the same.
Screenshots
N/A
Library Version:
Additional context
When running with 1 actor, the performance is kinda similar. But when I spawn 10 actors, it started degrading significantly.