apache / apisix

The Cloud-Native API Gateway
https://apisix.apache.org/blog/
Apache License 2.0
14.5k stars 2.52k forks source link

help request: Can the performance of grpc-transcode be improved? Or analysis? #7671

Closed Bigwen-1 closed 1 year ago

Bigwen-1 commented 2 years ago

Description

hi, We expect the production environment to use grpc-transcode plugin for pressure measurement. Compared with the original GRPC client directly requesting GRPC Server QPS, grpc-transcode decreases by more than 5 times. Can it be improved? Or can I do some link tracing analysis on Apisix grpc-transcode?

Pressure test conclusion:

  1. The QPS of grpc-transcode is around 600, while the QPS of native GRPC Client is more than 3200, which is more than 5 times the difference.
  2. When grpc-transcode is 400 concurrent, the response time P99 is more than 1 second. When native grpc is 500 concurrent, the response time P99 is less than 400ms.

Environment

tzssangglass commented 2 years ago

IMO, grpc-transcode involves full protocol conversion and coding and decoding of the body, which can degrade performance. cc @spacewander @tokers Do you guys have some ideas?

I have a question: using grpc-transcode means the request path is: HTTP(s) -> APISIX -> gRPC server, different from GRPC client directly requesting GRPC Server QPS:

tokers commented 2 years ago

First of all, protocol conversion will take some time which is not avoidable. There is a trade-off.

What we can do is enhance the implementation of gRPC Transcode and make it faster. I don't have any further insights about it, we may inspect it via a flame graph and see if we can optimize the hot code path. Or even, change the way we implement gRPC Transcode (e.g, using FFI).

Bigwen-1 commented 2 years ago

I tried to analyze the transcoding performance with the flame graph, but perf always failed to obtain stack index. Open SVG display: ERROR: No Valid Input provided to flamegraph.pl. Can you give me a hand?

I operate as follows: root@xxali-ecs-xxx-xx-xx-xx:/data/server/FlameGraph/FlameGraph# ls aix-perf.pl files.pl record-test.sh stackcollapse-jstack.pl stackcollapse-vsprof.pl demos flamegraph.pl stackcollapse-aix.pl stackcollapse-ljp.awk stackcollapse-vtune.pl dev jmaps stackcollapse-bpftrace.pl stackcollapse-perf.pl stackcollapse-wcp.pl difffolded.pl perf.data stackcollapse-chrome-tracing.py stackcollapse-perf-sched.awk stackcollapse-xdebug.php docs perf.data.old stackcollapse-elfutils.pl stackcollapse.pl test example-dtrace-stacks.txt pkgsplit-perf.pl stackcollapse-gdb.pl stackcollapse-pmc.pl test.sh example-dtrace.svg pmCount.svg stackcollapse-go.pl stackcollapse-recursive.pl example-perf-stacks.txt.gz range-perf.pl stackcollapse-instruments.pl stackcollapse-sample.awk example-perf.svg README.md stackcollapse-java-exceptions.pl stackcollapse-stap.pl root@xxali-ecs-xxx-xx-xx-xx:/data/server/FlameGraph/FlameGraph# perf record -F 99 -a -g -p 10853 -- sleep 30 Warning: PID/TID switch overriding SYSTEM[ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.012 MB perf.data ] root@xxali-ecs-xxx-xx-xx-xx:/data/server/FlameGraph/FlameGraph# perf script > out.perf root@xxali-ecs-xxx-xx-xx-xx:/data/server/FlameGraph/FlameGraph# ./stackcollapse-perf.pl out.perf > out.folded root@xxali-ecs-xxx-xx-xx-xx:/data/server/FlameGraph/FlameGraph# ./flamegraph.pl out.folded > pmCount.svg ERROR: No stack counts found

tzssangglass commented 2 years ago

Here is the process for generating a flame map in CI: https://github.com/apache/apisix/blob/master/.github/workflows/performance.yml

Here is the corresponding script: https://github.com/apache/apisix/blob/master/ci/performance_test.sh

You can check against the script to see if you've missed some steps.

tzssangglass commented 2 years ago

ERROR: No stack counts found

It's likely that your perf workers are not the ones executing the request, and you can limit the number of workers to 1.

change https://github.com/apache/apisix/blob/cffa4b69f9080707d100fbe856f251054cc8cba4/conf/config-default.yaml#L165 is config.yaml

Bigwen-1 commented 2 years ago

Hi, I'm sorry to bother you again. I'm really sorry that the flame map is still not drawn. https://github.com/apache/apisix/blob/master/.github/workflows/performance.yml appears to be every time a commit a jobs, because do not know how to put the whole yml, So I split each step to execute it.

A total of 4 steps have been implemented, which are

    • name: Install Dependencies run: sudo ./ci/performance_test.sh install_dependencies
    • name: Install wrk2 run: sudo ./ci/performance_test.sh install_wrk2
    • name: Install SystemTap Tools run: sudo ./ci/performance_test.sh install_stap_tools
    • name: Perf Test run: ./ci/performance_test.sh run_performance_test

Except for the third step, the other three provided errors are reported. The log is as follows. The generated flamegraph. SVG displays: ERROR: No Valid input provided to flamegraph.pl.

In addition, I performed GRPC-Transcode concurrent pressure test by myself: /usr/local/stapxx/samples/lj-lua-stacks.sxx --arg time=30 --skip-badvars -x $(pgrep -P $(cat logs/nginx.pid) -n -f worker) > /tmp/tmp.bt Response: /usr/bin/env: 'stap++' : No such file or directory

Here is the log information of four steps:

  1. sudo ./ci/performance_test.sh install_dependencies ... cc -c -I/tmp/tmp.7pDCgwlqlx/openresty-1.19.9.1/build/luajit-root/usr/local/openresty-debug/luajit/include/luajit-2.1 -I/tmp/tmp.7pDCgwlqlx/openresty-1.19.9.1/build/luajit-root/usr/local/openresty-debug/luajit/include/luajit-2.1 -pipe -O -W -Wall -Wpointer-arith -Wno-unused-parameter -g -DNGX_LUA_USE_ASSERT -DNGX_LUA_ABORT_AT_PANIC -O2 -DAPISIX_BASE_VER=0.0.0 -DNDK_SET_VAR -DNDK_UPSTREAM_LIST -DNDK_SET_VAR -DNDK_SET_VAR -DNDK_SET_VAR -DNDK_SET_VAR -I src/core -I src/event -I src/event/modules -I src/os/unix -I ../ngx_devel_kit-0.3.1/objs -I objs/addon/ndk -I ../ngx_lua-0.10.20/src/api -I ../ngx_stream_lua-0.0.10/src/api -I /tmp/tmp.7pDCgwlqlx/openresty-1.19.9.1/../lua-var-nginx-module/src -I objs -I src/http -I src/http/modules -I src/http/v2 -I ../ngx_devel_kit-0.3.1/src -I ../ngx_devel_kit-0.3.1/src -I ../ngx_devel_kit-0.3.1/objs -I objs/addon/ndk -I /tmp/tmp.7pDCgwlqlx/openresty-1.19.9.1/build/luajit-root/usr/local/openresty-debug/luajit/include/luajit-2.1 -I /tmp/tmp.7pDCgwlqlx/openresty-1.19.9.1/../mod_dubbo -I /tmp/tmp.7pDCgwlqlx/openresty-1.19.9.1/../mod_dubbo/utils -I /tmp/tmp.7pDCgwlqlx/openresty-1.19.9.1/../mod_dubbo/hessian2 -I /tmp/tmp.7pDCgwlqlx/openresty-1.19.9.1/../ngx_multi_upstream_module -I /tmp/tmp.7pDCgwlqlx/openresty-1.19.9.1/../apisix-nginx-module/src -I /tmp/tmp.7pDCgwlqlx/openresty-1.19.9.1/../wasm-nginx-module/src -I /tmp/tmp.7pDCgwlqlx/openresty-1.19.9.1/../wasm-nginx-module/wasmtime-c-api/include -I src/stream -I /tmp/tmp.7pDCgwlqlx/openresty-1.19.9.1/build/luajit-root/usr/local/openresty-debug/luajit/include/luajit-2.1 -I /tmp/tmp.7pDCgwlqlx/openresty-1.19.9.1/../ngx_multi_upstream_module -I /tmp/tmp.7pDCgwlqlx/openresty-1.19.9.1/../apisix-nginx-module/src/stream/ \ -o objs/addon/src/ngx_http_lua_ndk.o \ ../ngx_lua-0.10.20/src/ngx_http_lua_ndk.c In file included from ../ngx_lua-0.10.20/src/ngx_http_lua_script.h:11:0, from ../ngx_lua-0.10.20/src/ngx_http_lua_script.c:13: ../ngx_lua-0.10.20/src/ngx_http_lua_common.h:26:30: fatal error: ngx_meta_lua_api.h: No such file or directory In file included from ../ngx_lua-0.10.20/src/ngx_http_lua_log.h:12:0, from ../ngx_lua-0.10.20/src/ngx_http_lua_log.c:14: ../ngx_lua-0.10.20/src/ngx_http_lua_common.h:26:30: fatal error: ngx_meta_lua_api.h: No such file or directory compilation terminated. compilation terminated. objs/Makefile:2390: recipe for target 'objs/addon/src/ngx_http_lua_script.o' failed make[2]: [objs/addon/src/ngx_http_lua_script.o] Error 1 make[2]: Waiting for unfinished jobs.... objs/Makefile:2397: recipe for target 'objs/addon/src/ngx_http_lua_log.o' failed make[2]: [objs/addon/src/ngx_http_lua_log.o] Error 1 In file included from ../ngx_lua-0.10.20/src/ngx_http_lua_subrequest.h:12:0, from ../ngx_lua-0.10.20/src/ngx_http_lua_subrequest.c:14: ../ngx_lua-0.10.20/src/ngx_http_lua_common.h:26:30: fatal error: ngx_meta_lua_api.h: No such file or directory compilation terminated. objs/Makefile:2404: recipe for target 'objs/addon/src/ngx_http_lua_subrequest.o' failed make[2]: [objs/addon/src/ngx_http_lua_subrequest.o] Error 1 In file included from ../ngx_lua-0.10.20/src/ngx_http_lua_ndk.h:11:0, from ../ngx_lua-0.10.20/src/ngx_http_lua_ndk.c:13: ../ngx_lua-0.10.20/src/ngx_http_lua_common.h:26:30: fatal error: ngx_meta_lua_api.h: No such file or directory compilation terminated. objs/Makefile:2411: recipe for target 'objs/addon/src/ngx_http_lua_ndk.o' failed make[2]: [objs/addon/src/ngx_http_lua_ndk.o] Error 1 make[2]: Leaving directory '/tmp/tmp.7pDCgwlqlx/openresty-1.19.9.1/build/nginx-1.19.9' Makefile:10: recipe for target 'build' failed make[1]: [build] Error 2 make[1]: Leaving directory '/tmp/tmp.7pDCgwlqlx/openresty-1.19.9.1/build/nginx-1.19.9' Makefile:4: recipe for target 'all' failed make: *** [all] Error 2

2.sudo ./ci/performance_test.sh install_wrk2 ..... AR libluajit.a ar: u' modifier ignored sinceD' is the default (see `U') CC luajit.o BUILDVM jit/vmdef.lua LINK luajit OK Successfully built LuaJIT make[1]: Leaving directory '/data/server/wrk2/deps/luajit/src' CC src/wrk.c src/wrk.c: In function ‘response_complete’: src/wrk.c:529:16: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 2 has type ‘int64_t {aka long int}’ [-Wformat=] printf(" expected_latency_timing = %lld\n", expected_latency_timing); ^ src/wrk.c:530:16: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 2 has type ‘uint64_t {aka long unsigned int}’ [-Wformat=] printf(" now = %lld\n", now); ^ src/wrk.c:531:16: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 2 has type ‘uint64_t {aka long unsigned int}’ [-Wformat=] printf(" expected_latency_start = %lld\n", expected_latency_start); ^ src/wrk.c:532:16: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 2 has type ‘uint64_t {aka long unsigned int}’ [-Wformat=] printf(" c->thread_start = %lld\n", c->thread_start); ^ src/wrk.c:533:16: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 2 has type ‘uint64_t {aka long unsigned int}’ [-Wformat=] printf(" c->complete = %lld\n", c->complete); ^ src/wrk.c:535:16: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 2 has type ‘uint64_t {aka long unsigned int}’ [-Wformat=] printf(" latest_should_send_time = %lld\n", c->latest_should_send_time); ^ src/wrk.c:536:16: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 2 has type ‘uint64_t {aka long unsigned int}’ [-Wformat=] printf(" latest_expected_start = %lld\n", c->latest_expected_start); ^ src/wrk.c:537:16: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 2 has type ‘uint64_t {aka long unsigned int}’ [-Wformat=] printf(" latest_connect = %lld\n", c->latest_connect); ^ src/wrk.c:538:16: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 2 has type ‘uint64_t {aka long unsigned int}’ [-Wformat=] printf(" latest_write = %lld\n", c->latest_write); ^ src/wrk.c:542:16: warning: format ‘%lld’ expects argument of type ‘long long int’, but argument 2 has type ‘uint64_t {aka long unsigned int}’ [-Wformat=] printf(" next expected_latency_start = %lld\n", expected_latency_start); ^ src/wrk.c: At top level: src/wrk.c:834:13: warning: ‘print_stats_latency’ defined but not used [-Wunused-function] static void print_stats_latency(stats *stats) { ^ CC src/net.c CC src/ssl.c CC src/aprintf.c CC src/stats.c CC src/script.c CC src/units.c CC src/ae.c CC src/zmalloc.c CC src/http_parser.c CC src/tinymt64.c CC src/hdr_histogram.c LUAJIT src/wrk.lua LINK wrk

  1. sudo ./ci/performance_test.sh install_stap_tools success

  2. ./ci/performance_test.sh run_performance_test

/local/openresty-debug/bin:/usr/local/openresty-debug/luajit/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin /usr/local/stapxx/samples/lj-lua-stacks.sxx --arg time=30 --skip-badvars -x 9056 Found exact match for libluajit: /usr/local/openresty/luajit/lib/libluajit-5.1.so.2.1.0 WARNING: cannot find module /usr/local/openresty/nginx/sbin/nginx debuginfo: No DWARF information found [man warning::debuginfo] WARNING: cannot find module /usr/local/openresty/luajit/lib/libluajit-5.1.so.2.1.0 debuginfo: No DWARF information found [man warning::debuginfo] semantic error: type definition 'lua_State' not found in '/usr/local/openresty/luajit/lib/libluajit-5.1.so.2.1.0': operator '@cast' at stapxx-4Sjl3xCx/luajit.stp:174:12 source: return @cast(L, "lua_State", "/usr/local/openresty/luajit/lib/libluajit-5.1.so.2.1.0")->glref->ptr64 ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at stapxx-4Sjl3xCx/nginx.lua.stp:9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unable to find global 'ngx_cycle' in /usr/local/openresty/nginx/sbin/nginx: operator '@var' at :9:43 source: return ngx_cycle_get_module_main_conf(@var("ngx_cycle", "/usr/local/openresty/nginx/sbin/nginx"), ^

semantic error: unresolved type : identifier 'gco' at stapxx-4Sjl3xCx/luajit.stp:461:5 source: gco = @cast(pt, "GCproto", "/usr/local/openresty/luajit/lib/libluajit-5.1.so.2.1.0")->chunkname->gcptr64 ^

Pass 2: analysis failed. [man error::pass2] Number of similar error messages suppressed: 136. Number of similar warning messages suppressed: 13104. Rerun with -v to see them.

tzssangglass commented 2 years ago

It looks like you downloaded the wrong package type.

That script is for the CI environment and may not apply to your server environment.

You can refer to this script for the steps, but you need to ensure each step is successful.

ref: https://anjia0532.github.io/2017/09/12/stap/ https://moonbingbing.gitbooks.io/openresty-best-practices/content/flame_graph/install.html

github-actions[bot] commented 1 year ago

This issue has been marked as stale due to 350 days of inactivity. It will be closed in 2 weeks if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@apisix.apache.org list. Thank you for your contributions.

github-actions[bot] commented 1 year ago

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.