alphadose opened this issue 2 years ago
Could you please try the same test as https://github.com/bytedance/gopkg/blob/develop/util/gopool/pool_test.go?
Got these results
2022/07/05 16:20:06.801184 logger.go:190: [Error] GOPOOL: panic in pool: test: test: goroutine 2025 [running]:
runtime/debug.Stack()
/opt/homebrew/Cellar/go/1.18/libexec/src/runtime/debug/stack.go:24 +0x68
github.com/bytedance/gopkg/util/gopool.(*worker).run.func1.1.1()
/Users/alphadose/assignment/go-threadpool-benchmarks/vendor/github.com/bytedance/gopkg/util/gopool/worker.go:64 +0x80
panic({0x100601700, 0x100623510})
/opt/homebrew/Cellar/go/1.18/libexec/src/runtime/panic.go:838 +0x204
command-line-arguments.testPanicFunc()
/Users/alphadose/assignment/go-threadpool-benchmarks/gopool_test.go:27 +0x30
github.com/bytedance/gopkg/util/gopool.(*worker).run.func1.1(0x1400010c2b0?, 0x10060c8a0?)
/Users/alphadose/assignment/go-threadpool-benchmarks/vendor/github.com/bytedance/gopkg/util/gopool/worker.go:69 +0x5c
github.com/bytedance/gopkg/util/gopool.(*worker).run.func1()
/Users/alphadose/assignment/go-threadpool-benchmarks/vendor/github.com/bytedance/gopkg/util/gopool/worker.go:70 +0x1a8
created by github.com/bytedance/gopkg/util/gopool.(*worker).run
/Users/alphadose/assignment/go-threadpool-benchmarks/vendor/github.com/bytedance/gopkg/util/gopool/worker.go:41 +0x64
%!(EXTRA []interface {}=[])
goos: darwin
goarch: arm64
BenchmarkPool-8 367 3841601 ns/op 179468 B/op 10227 allocs/op
BenchmarkGo-8 100 11113182 ns/op 167427 B/op 10016 allocs/op
BenchmarkItogami-8 369 3109478 ns/op 174749 B/op 10073 allocs/op
PASS
ok command-line-arguments 4.771s
https://github.com/alphadose/go-threadpool-benchmarks/blob/master/gopool_test.go
Thanks for your tests! For the first test, I don't think that's the correct way to use goroutine pools; it's better to use native goroutines instead. For the second test, it seems that Itogami outperforms gopool in CPU-bound tasks. I took a quick glance at the code and found that it makes good use of the runtime's internal functions, which is great! I'll investigate whether we can optimize gopool's performance without using runtime internals.
The first test was meant to mimic thousands of HTTP requests, each of which typically lasts a few milliseconds. But yes, for a closer match to a real use case there should be some computation and variance between requests.
> it's better to use the native goroutines instead.
If we remove the limit on itogami then it essentially becomes an infinite goroutine pool, but it still saves a lot of memory (due to recycling goroutines via the stack) compared to using native goroutines. Hence I think it's beneficial to use itogami in this case too, with some modifications.
I don't think it's enough to test a CPU-bound handler only. In most real scenarios, blocking is a much more common operation than pure CPU work, e.g. HTTP handlers and RPC handlers, where there is at least a write syscall.
I tried with a short sleep and got much worse performance from gopool than std:
func testFunc() {
// DoCopyStack(0, 0)
time.Sleep(time.Microsecond * 10)
}
result:
BenchmarkPool-8 1 13349818686 ns/op 727840 B/op 20125 allocs/op
BenchmarkGo-8 82 13735426 ns/op 968846 B/op 20042 allocs/op
This much worse result may be because the default pool size is only 8 in my env (runtime.GOMAXPROCS(0) == 8). I changed the pool size to 1000 or 10000 and gopool runs much better, but still worse than std:
# pool size: 1000
BenchmarkPool-8 10 109975368 ns/op 466008 B/op 12162 allocs/op
BenchmarkGo-8 85 13984710 ns/op 970441 B/op 20082 allocs/op
# pool size: 10000
BenchmarkPool-8 68 17622188 ns/op 1421435 B/op 30038 allocs/op
BenchmarkGo-8 86 14227710 ns/op 971272 B/op 20074 allocs/op
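The setup above can be sketched as a self-contained program. This is a minimal sketch, not gopool's or itogami's actual implementation: the fixed-size pool is approximated with a buffered-channel semaphore, `testFunc` mirrors the 10µs sleep shown earlier, and `benchmarkTimes = 10000` matches the loop count in gopool's pool_test.go.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

const benchmarkTimes = 10000

// testFunc simulates a short blocking call, as in the discussion above.
func testFunc() {
	time.Sleep(10 * time.Microsecond)
}

// runWithPool limits concurrency to poolSize using a channel semaphore,
// a rough stand-in for a fixed-size goroutine pool.
func runWithPool(poolSize int) time.Duration {
	start := time.Now()
	sem := make(chan struct{}, poolSize)
	var wg sync.WaitGroup
	for i := 0; i < benchmarkTimes; i++ {
		wg.Add(1)
		sem <- struct{}{} // block until a slot is free
		go func() {
			defer wg.Done()
			testFunc()
			<-sem // release the slot
		}()
	}
	wg.Wait()
	return time.Since(start)
}

// runWithGo spawns one native goroutine per task (the "std" baseline).
func runWithGo() time.Duration {
	start := time.Now()
	var wg sync.WaitGroup
	for i := 0; i < benchmarkTimes; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			testFunc()
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	fmt.Println("pool(1000):", runWithPool(1000))
	fmt.Println("go:        ", runWithGo())
}
```

With blocking tasks, the unlimited version overlaps all sleeps while the limited one serializes them into waves, which is consistent with the numbers above.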
> it also saves a lot of memory
@alphadose I think the memory cost of each Go operation can be ignored, since each b.N iteration runs it 10000 times, so the per-call memory cost is much smaller: https://github.com/bytedance/gopkg/blob/develop/util/gopool/pool_test.go#L24
But indeed we need to limit the pool size to control the total memory cost of the process.
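The amortization argument can be checked with quick arithmetic. The numbers below are taken from the BenchmarkGo-8 line earlier in the thread (~968846 B/op, with 10000 task submissions per benchmark op):

```go
package main

import "fmt"

func main() {
	// Each benchmark "op" submits 10000 tasks, so the reported B/op
	// must be divided by the submission count to get the amortized
	// memory cost of a single go/Go() call.
	const bytesPerOp = 968846.0 // from the BenchmarkGo-8 line above
	const tasksPerOp = 10000.0  // loop count inside each b.N iteration
	fmt.Printf("~%.0f B per goroutine submission\n", bytesPerOp/tasksPerOp)
	// → ~97 B per goroutine submission
}
```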
I took a quick glance at the code, and found that it makes good use of the runtime's internal funcs, that's great!
@alphadose Cool!
@alphadose I took a look at your test and found it unfair, because itogami's pool size is 5e4 while gopool's is much smaller. At that size, itogami's pool is effectively equivalent to using std goroutines, though your implementation performs a little better. https://github.com/alphadose/go-threadpool-benchmarks/blob/master/gopool_test.go#L89
And the same point applies to gopool's default CPU-bound handler: if I limit the pool size and change to a testFunc with a short sleep, itogami doesn't perform better than std in my env:
# pool size: 1000
BenchmarkPool-8 10 110218422 ns/op 461439 B/op 12138 allocs/op
BenchmarkGo-8 84 13746197 ns/op 969096 B/op 20052 allocs/op
BenchmarkItogami-8 10 104207091 ns/op 200136 B/op 10413 allocs/op
It's only faster than std when PoolSize >= benchmarkTimes or the blocking time is shorter than the cost of the b.N calling loop; otherwise it's worse than std.
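A back-of-the-envelope model makes this claim concrete. With N blocking tasks of duration d and a pool of P workers, the tasks run in roughly ceil(N/P) sequential "waves", while native goroutines overlap all N sleeps. This sketch ignores timer granularity, which in practice inflates each wave well beyond the nominal 10µs:

```go
package main

import "fmt"

func main() {
	const N = 10000 // tasks per benchmark op, as in the thread
	const d = 10e-6 // nominal 10µs sleep per task, in seconds
	for _, P := range []int{8, 1000, 10000} {
		waves := (N + P - 1) / P // ceil(N/P)
		fmt.Printf("pool size %5d: %5d waves, ~%.4fs per op (lower bound)\n",
			P, waves, float64(waves)*d)
	}
}
```

The model reproduces the trend in the measurements: a pool of 8 pays ~1250 waves of blocking, a pool of 10000 pays one, which is why only PoolSize close to the task count keeps up with std.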
> if we remove the limit on itogami then it essentially becomes an infinite goroutine pool but it also saves a lot of memory (due to recycling via the stack) as compared to using native goroutines, hence I think its beneficial to use itogami in this case too with some modifications
It's of no use if we don't limit the pool size, and I think users would rather use std goroutines than another implementation that behaves like std.
@alphadose I hit a SIGSEGV from Itogami when running the benchmark:
goos: linux
goarch: amd64
pkg: github.com/alphadose/go-threadpool-benchmarks
cpu: Intel(R) Core(TM) i7-8700 CPU @ 3.20GHz
BenchmarkGo-8 328 3554240 ns/op 972669 B/op 20124 allocs/op
BenchmarkPool-8 206 6353972 ns/op 590759 B/op 19165 allocs/op
BenchmarkItogami-8 fatal error: unexpected signal during runtime execution
[signal SIGSEGV: segmentation violation code=0x1 addr=0x330 pc=0x4372a7]
......
@lesismal I made itogami as a proof of concept for a better Go scheduler, and since it uses Go runtime internals it is unsafe because of some edge cases here and there. Hence it's also unsafe for real use cases.
If you are interested, here is the corresponding issue in Go core: https://github.com/golang/go/issues/53398. The benchmarking in the above case is based on a no-limit version of itogami, which can be viewed in this branch: https://github.com/alphadose/itogami/tree/golang-scheduler
I took a quick look at 53398. It's a good attempt to use lock-free structures to optimize performance. But as far as I know, lock-free is only suitable for a few scenarios, such as a queue's push and pop operations. It can't be used to optimize a multi-step process that goes beyond what atomic operations can express.
> @lesismal I made itogami as a proof of concept for a better golang scheduler, and as it uses golang runtime internals it is unsafe because of some edge cases here and there. Hence, its also unsafe to use for real use cases.
Since it's unsafe, the benchmark result should not be used to compare against std or other goroutine pools; it doesn't make sense and would be misleading.
About HTTP performance, I agree with you that fasthttp is really fast, but gnet's HTTP benchmark is not a full implementation of the HTTP protocol (details here), which is also misleading. I once opened an issue advising against using that benchmark result to claim gnet's performance, but it wasn't accepted. Still, I would be glad to see the author someday delete the HTTP benchmark result, or fully support the protocol.
I have another lib that handles these TLS/HTTP/WebSocket workloads with goroutine count and memory usage under control: https://github.com/lesismal/nbio. There's also a sub-package that implements goroutine pools with a limited size; it can't perform better than std but can limit the pool size, and its benchmark with a short sleep seems to perform a little better than the results I attached in the previous comments. If you're interested, you're welcome to try it.
@lesismal
> But as I know, lock-free is only suitable for a few scenarios such as a queue's push and pop operations. It can't be used to optimize a process with multiple steps which have beyond the atomic operations' ability.
I went through Go's scheduling code and there is room for improvement by switching to a lock-free stack in some places. I got the idea from a GopherCon talk delivered by a member of the Go core team, tested it out, and got some good results. The major performance improvement from switching to a lock-free stack is that all scheduling operations happen entirely in user space, avoiding lock()/unlock() system-call context switches.
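For readers unfamiliar with the structure being discussed, here is a minimal Treiber (lock-free) stack sketch: push and pop are CAS loops in user space, with no mutex and hence no futex calls on the fast path. This is an illustration of the general technique, not itogami's actual implementation; it requires Go 1.19+ for `atomic.Pointer`, and Go's GC sidesteps the classic ABA problem of Treiber stacks in languages with manual memory reuse.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

type node struct {
	value int
	next  *node
}

// Stack is a lock-free LIFO: concurrent Push/Pop synchronize purely
// through compare-and-swap on the head pointer.
type Stack struct {
	head atomic.Pointer[node]
}

func (s *Stack) Push(v int) {
	n := &node{value: v}
	for {
		old := s.head.Load()
		n.next = old
		if s.head.CompareAndSwap(old, n) {
			return // we installed our node as the new head
		}
		// CAS failed: another goroutine won; retry with the new head.
	}
}

func (s *Stack) Pop() (int, bool) {
	for {
		old := s.head.Load()
		if old == nil {
			return 0, false // stack is empty
		}
		if s.head.CompareAndSwap(old, old.next) {
			return old.value, true
		}
	}
}

func main() {
	var s Stack
	s.Push(1)
	s.Push(2)
	v, ok := s.Pop()
	fmt.Println(v, ok) // 2 true (LIFO order)
}
```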
> Since it's unsafe, the benchmark result should not be used to compare with std other goroutine pools, it doesn't make sense and would make misleading.
Agreed, I will update the readme so that people are aware that it's experimental and unsafe.
Question
GoPool seems to perform poorly in the benchmarks I ran
Reference -> https://github.com/alphadose/itogami
Benchmarking code -> https://github.com/alphadose/go-threadpool-benchmarks