Closed dhui closed 6 years ago
It appears that reflected function calls are slow. See: https://github.com/golang/go/issues/7818
Need to investigate alternatives to reflected function calls.
$ go test -cpuprofile cpu.prof -memprofile mem.prof -bench . -benchmem
$ go tool pprof -top -cum cpu.prof Type: cpu Time: Sep 11, 2018 at 4:18pm (PDT) Duration: 1.32mins, Total samples = 285.11s (361.04%) Showing nodes accounting for 253.95s, 89.07% of 285.11s total Dropped 410 nodes (cum <= 1.43s) flat flat% sum% cum cum% 164.30s 57.63% 57.63% 164.31s 57.63% runtime.usleep 0.51s 0.18% 57.81% 162.27s 56.91% runtime.lock 0.02s 0.007% 57.81% 160.68s 56.36% runtime.osyield 0.10s 0.035% 57.85% 94.14s 33.02% runtime.morestack 1.26s 0.44% 58.29% 94.04s 32.98% runtime.newstack 0.23s 0.081% 58.37% 88.93s 31.19% runtime.copystack 0.10s 0.035% 58.41% 88.50s 31.04% runtime.mcall 0.14s 0.049% 58.45% 77.98s 27.35% runtime.goexit0 0.31s 0.11% 58.56% 55.08s 19.32% runtime.stackfree 0.19s 0.067% 58.63% 54.77s 19.21% runtime.stackcacherelease 0.32s 0.11% 58.74% 47.48s 16.65% reflect.Value.Call 3.27s 1.15% 59.89% 47.02s 16.49% reflect.Value.call 0.63s 0.22% 60.11% 46.06s 16.16% runtime.schedule 0.73s 0.26% 60.37% 42.99s 15.08% github.com/dhui/thevent.(*Event).dispatch.func2 0.59s 0.21% 60.57% 42.64s 14.96% runtime.gfput 0 0% 60.57% 34.59s 12.13% testing.(*B).runN 0 0% 60.57% 34.55s 12.12% testing.(*B).launch 0.28s 0.098% 60.67% 34.35s 12.05% github.com/dhui/thevent.(*Event).dispatch 0.17s 0.06% 60.73% 32.13s 11.27% runtime.stackalloc 0.07s 0.025% 60.76% 31.94s 11.20% runtime.stackcacherefill 4.28s 1.50% 62.26% 29.30s 10.28% runtime.gentraceback 0.13s 0.046% 62.30% 27.47s 9.63% runtime.systemstack 5.71s 2.00% 64.31% 22.91s 8.04% reflect.implements 0.08s 0.028% 64.33% 21.90s 7.68% runtime.chansend1 0.09s 0.032% 64.36% 21.82s 7.65% runtime.chansend 0 0% 64.36% 20.17s 7.07% runtime.mstart 0.01s 0.0035% 64.37% 19.07s 6.69% runtime.newproc.func1 0.29s 0.1% 64.47% 19.06s 6.69% runtime.newproc1 0.04s 0.014% 64.48% 17.94s 6.29% runtime.findrunnable 0.03s 0.011% 64.49% 15.51s 5.44% github.com/dhui/thevent_test.BenchmarkEvents.func1.1 0.01s 0.0035% 64.50% 15.47s 5.43% github.com/dhui/thevent.(*Event).Dispatch 1.59s 0.56% 65.06% 15.47s 5.43% runtime.adjustframe 0.02s 0.007% 65.06% 15.05s 5.28% github.com/dhui/thevent_test.BenchmarkEvents.func1.2 0.02s 0.007% 65.07% 15.02s 5.27% github.com/dhui/thevent.(*Event).DispatchWithResults 7.74s 2.71% 67.78% 14.87s 5.22% runtime.pcvalue 0.38s 0.13% 67.92% 14.77s 5.18% reflect.(*rtype).AssignableTo 0.52s 0.18% 68.10% 14.50s 5.09% reflect.Value.assignTo 0 0% 68.10% 14.24s 4.99% runtime.semawakeup 0.38s 0.13% 68.23% 13.82s 4.85% runtime.unlock 0.63s 0.22% 68.45% 12.50s 4.38% runtime.gfget 2.18s 0.76% 69.22% 11.76s 4.12% runtime.getStackMap 0.01s 0.0035% 69.22% 10.18s 3.57% runtime.gfget.func1 0.05s 0.018% 69.24% 10.05s 3.52% runtime.park_m 9.51s 3.34% 72.58% 9.51s 3.34% runtime.pthread_cond_signal 1.61s 0.56% 73.14% 8.74s 3.07% reflect.(*rtype).typeOff 0.43s 0.15% 73.29% 8.50s 2.98% runtime.pcdatavalue 0.62s 0.22% 73.51% 7.44s 2.61% runtime.funcspdelta 1.48s 0.52% 74.03% 7.13s 2.50% reflect.resolveTypeOff 0 0% 74.03% 7.04s 2.47% runtime.semasleep 1.46s 0.51% 74.54% 6.83s 2.40% runtime.mallocgc 2.53s 0.89% 75.43% 6.58s 2.31% reflect.(*rtype).nameOff 5.44s 1.91% 77.34% 6.11s 2.14% runtime.step 5.78s 2.03% 79.36% 5.78s 2.03% runtime.pthread_cond_wait 0.30s 0.11% 79.47% 5.74s 2.01% runtime.runqput 1.34s 0.47% 79.94% 5.65s 1.98% runtime.(*_type).typeOff 0.10s 0.035% 79.97% 5.32s 1.87% runtime.runqputslow 0 0% 79.97% 5.17s 1.81% runtime.stopm 0 0% 79.97% 4.91s 1.72% runtime.notesleep 4.54s 1.59% 81.57% 4.54s 1.59% runtime.pthread_mutex_lock 3.45s 1.21% 82.78% 4.31s 1.51% runtime.resolveTypeOff 0.12s 0.042% 82.82% 4.08s 1.43% reflect.unsafe_New 1.90s 0.67% 83.48% 4.05s 1.42% reflect.resolveNameOff 0.43s 0.15% 83.63% 3.99s 1.40% reflect.directlyAssignable 0 0% 83.63% 3.94s 1.38% runtime.goschedImpl 0.01s 0.0035% 83.64% 3.58s 1.26% runtime.runqsteal 0 0% 83.64% 3.57s 1.25% runtime.gopreempt_m 0.02s 0.007% 83.64% 3.57s 1.25% runtime.runqgrab 1.53s 0.54% 84.18% 3.53s 1.24% reflect.(*rtype).Name 0.09s 0.032% 84.21% 3.40s 1.19% runtime.newproc 0 0% 84.21% 3.30s 1.16% runtime.gcstopm 0.42s 0.15% 84.36% 3.02s 1.06% reflect.funcLayout 0.28s 0.098% 84.46% 2.75s 0.96% runtime.makeslice 0.47s 0.16% 84.62% 2.58s 0.9% sync.(*Map).Load 0.02s 0.007% 84.63% 2.48s 0.87% github.com/dhui/thevent_test.BenchmarkEvents.func1.3 0 0% 84.63% 2.45s 0.86% github.com/dhui/thevent.(*Event).DispatchAsync 1.89s 0.66% 85.29% 2.22s 0.78% runtime.findfunc 2.15s 0.75% 86.05% 2.15s 0.75% runtime.resolveNameOff 0.08s 0.028% 86.08% 2.12s 0.74% github.com/dhui/thevent_test.BenchmarkEvents.func1.4.1 2.01s 0.7% 86.78% 2.11s 0.74% runtime.adjustpointers 0.06s 0.021% 86.80% 2.04s 0.72% runtime.chanrecv 0 0% 86.80% 2.04s 0.72% runtime.chanrecv2 0 0% 86.80% 2.04s 0.72% runtime.gcBgMarkWorker.func2 0.21s 0.074% 86.88% 2.04s 0.72% runtime.gcDrain 0.94s 0.33% 87.20% 2s 0.7% reflect.(*rtype).String 0.72s 0.25% 87.46% 2s 0.7% runtime.mapaccess2 0 0% 87.46% 1.95s 0.68% runtime.gcBgMarkWorker 1.10s 0.39% 87.84% 1.62s 0.57% runtime.heapBitsSetType 0.38s 0.13% 87.98% 1.57s 0.55% runtime.call64 1.56s 0.55% 88.52% 1.56s 0.55% reflect.name.name (inline) 1.54s 0.54% 89.06% 1.54s 0.54% runtime.memmove 0.01s 0.0035% 89.07% 1.52s 0.53% github.com/dhui/thevent_test.BenchmarkEvents.func1.4 0 0% 89.07% 1.44s 0.51% github.com/dhui/thevent.(*Event).DispatchAsyncWithResults 0.01s 0.0035% 89.07% 1.44s 0.51% runtime.gcAssistAlloc
$ go tool pprof -top -cum mem.prof Type: alloc_space Time: Sep 11, 2018 at 4:19pm (PDT) Showing nodes accounting for 7010.28MB, 99.57% of 7040.50MB total Dropped 29 nodes (cum <= 35.20MB) flat flat% sum% cum cum% 0 0% 0% 6422.26MB 91.22% reflect.Value.Call 6421.25MB 91.20% 91.20% 6422.26MB 91.22% reflect.Value.call 0 0% 91.20% 3901.35MB 55.41% testing.(*B).runN 0 0% 91.20% 3894.65MB 55.32% testing.(*B).launch 392.02MB 5.57% 96.77% 3856.15MB 54.77% github.com/dhui/thevent.(*Event).dispatch 30.50MB 0.43% 97.21% 3096.12MB 43.98% github.com/dhui/thevent.(*Event).dispatch.func2 7.50MB 0.11% 97.31% 1841.07MB 26.15% github.com/dhui/thevent_test.BenchmarkEvents.func1.1 0 0% 97.31% 1833.57MB 26.04% github.com/dhui/thevent.(*Event).Dispatch 8MB 0.11% 97.43% 1738.57MB 24.69% github.com/dhui/thevent_test.BenchmarkEvents.func1.2 0 0% 97.43% 1730.57MB 24.58% github.com/dhui/thevent.(*Event).DispatchWithResults 12.50MB 0.18% 97.60% 198.01MB 2.81% github.com/dhui/thevent_test.BenchmarkEvents.func1.4 0 0% 97.60% 185.51MB 2.63% github.com/dhui/thevent.(*Event).DispatchAsyncWithResults 10.50MB 0.15% 97.75% 117MB 1.66% github.com/dhui/thevent_test.BenchmarkEvents.func1.3 107.50MB 1.53% 99.28% 107.50MB 1.53% reflect.ValueOf (inline) 0 0% 99.28% 106.50MB 1.51% github.com/dhui/thevent.(*Event).DispatchAsync 0 0% 99.28% 36.24MB 0.51% runtime.mstart 20.50MB 0.29% 99.57% 36.24MB 0.51% runtime.systemstack
Whoops, opened issue in wrong repo... The issue actually belongs to https://github.com/dhui/thevent
It appears that reflected function calls are slow. See: https://github.com/golang/go/issues/7818
Need to investigate alternatives to reflected function calls.
Supporting data
Collected profile from benchmarks
CPU Profile
Memory profile