3scale / APIcast

3scale API Gateway
Apache License 2.0
304 stars 169 forks source link

Openresty 1.15.3 profile issues. #1096

Open eloycoto opened 5 years ago

eloycoto commented 5 years ago

The issue is when Openresty 1.15.3 is in use, where lj_trace_exit function is only called on the first 10 request, after that, the function is never called again.

The result is that profile jobs are not working correctly, due to not all the request are using luajit at all:

Output of `stapxx ./samples/ngx-lj-trace-exits.sxx -x $(worker PID)

289 out of 19088 requests used compiled traces generated by LuaJIT.
Distribution of LuaJIT trace exit times per request for 289 sample(s):
(min/avg/max: 1/1/9)
value |-------------------------------------------------- count
    0 |                                                     0
    1 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@           160
    2 |@@@@@@@@@@@@@@@@@@@@@@@                             92
    4 |@@@@@@@@@                                           36
    8 |                                                     1
   16 |                                                     0
   32 |                                                     0

To validate this behaviour:

nginx.conf

worker_processes  1;
daemon off;
pid       /tmp/nginx.pid;
error_log logs/error.log;

events {
    worker_connections 1024;
}

http {
  server {
  listen 8080;
  access_log  off;
    location / {
      content_by_lua_block {
        function main()
          local j = 0
          for i=0,1000 do
            j = i+j
          end
        end
        main()
        ngx.say("Hello world")
      }
    }
  }
}

Run openresty in debug mode:

openresty-debug -c nginx.conf -p $(pwd)

For this test, we can test with stapxx, but a simpler systemtap file can be used:

probe process("/usr/local/openresty-debug/nginx/sbin/nginx").function("ngx_http_free_request")
{
    println("Request finish")
}

probe process("/usr/local/openresty-debug/luajit/lib/libluajit-5.1.so.2.1.0").function("lj_trace_exit") {
    println("Trace exit")
}

And if we run this tap, on the first 10 request, lj_trace_exit probe will work correctly, but after the 10th request there is no longer called, example:

#stap lj.stp
Trace exit
Request finish
Trace exit
Request finish
Trace exit
Request finish
Trace exit
Request finish
Trace exit
Request finish
Trace exit
Request finish
Trace exit
Request finish
Trace exit
Request finish
Trace exit
Request finish
Request finish
Request finish
Request finish
Request finish

Trying to debug why this happens, there are no suspect code in ngx-lua-module, but Openresty bump the version of Luajit, and this commit maybe break something:

https://github.com/openresty/luajit2/commit/864c72e31e9ea9488bd64a441790fee916481da3#diff-d232eb340601b1fd167c69f58d20f8f1

Also, try to get a proper traceback on this function, but for reasons all the systemtap traps only return the hex direction of the Luajit, so I couldn't find any pattern, and Luajit code is quite hard to understand :-(.

Systemtap taps: https://sourceware.org/systemtap/tapsets/

Example to get the caller:

probe process("/usr/local/openresty-debug/luajit/lib/libluajit-5.1.so.2.1.0").function("lj_trace_exit") {
    println("Trace exit")
    println(symfileline(caller_addr()))
}

Also try to get the human-readable code using addr2line but I have no luck.

Moving forward I try to get trace on perf, maybe systemtap issue, but in perf I did not see the lj_trace_exit call at all:

perf record -a  -p 19191 -g

perf output: https://gist.github.com/eloycoto/60567b0f6de63a7e86792ad79331c6d6 perf data perf.zip

Also, tried with a custom perf probe, but no calls where seen in lj_trace_exit:

# perf probe -x /usr/local/openresty-debug/luajit/lib/libluajit-5.1.so.2.1.0 --funcs | grep lj_trace_exit
lj_trace_exit
# perf probe -x /usr/local/openresty-debug/luajit/lib/libluajit-5.1.so.2.1.0 lj_trace_exit
Failed to get call frame on 0xa6376
Added new event:
  probe_libluajit:lj_trace_exit (on lj_trace_exit in /usr/local/openresty-debug/luajit/lib/libluajit-5.1.so.2.1.0)

You can now use it in all perf tools, such as:

        perf record -e probe_libluajit:lj_trace_exit -aR sleep 1

next steps:

Installation

These are the steps that I follow to install all the things commented above:

yum install yum-utils
yum-config-manager --add-repo https://openresty.org/package/centos/openresty.repo
yum install openresty openresty-debug-debuginfo openresty-debug openresty-pcre-debuginfo openresty-pcre-devel openresty-zlib-debuginfo

yum install -y kernel-devel kernel-debuginfo kernel-debuginfo-common
debuginfo-install kernel-3.10.0-957.12.2.el7.x86_64
yum install -y systemtap systemtap-runtime

Useful information:

eloycoto commented 5 years ago

Ok,

I think that I found the problem, I think that it's related to this commit:

https://github.com/openresty/luajit2/commit/0a9ff94c4a1fcec2c310dcb092da694f23186e23

Now If I run this nginx.conf to update some params, things are getting in a good shape:


^C[vagrant@localhost ~]$ cat /vagrant/nginx.conf
worker_processes  1;
daemon off;
pid       /tmp/nginx.pid;
error_log logs/error.log;

events {
    worker_connections 1024;
}

http {
  server {
  listen 8080;
  access_log  off;
    location / {
      content_by_lua_block {
        require "jit.opt".start("hotexit=100")
        function main()
          local j = 0
          for i=0,100 do
            j = i+j
          end
        end
        main()
        ngx.say("Hello world")
      }
    }
  }
}

So, my understanding is that it's a trace issue, not a Luajit compiling issue, maybe the best way is to add to stapxx the following probe:

probe process("/usr/local/openresty-debug/luajit/lib/libluajit-5.1.so.2.1.0").function("lj_vm_exit_interp") {
    println("VM_exit_interp")
}

Also this commit https://github.com/openresty/luajit2/commit/99304a93bb661b3f3afbe4c54d50c705e14c35a3 can be a problem, but in Openresty 1.13 that commit is in there.

Regards