Open eloycoto opened 5 years ago
OK,
I think I found the problem. It seems to be related to this commit:
https://github.com/openresty/luajit2/commit/0a9ff94c4a1fcec2c310dcb092da694f23186e23
Now, if I run this nginx.conf, which updates some JIT parameters, things get into good shape:
[vagrant@localhost ~]$ cat /vagrant/nginx.conf
worker_processes 1;
daemon off;
pid /tmp/nginx.pid;
error_log logs/error.log;
events {
    worker_connections 1024;
}
http {
    server {
        listen 8080;
        access_log off;
        location / {
            content_by_lua_block {
                require "jit.opt".start("hotexit=100")
                function main()
                    local j = 0
                    for i=0,100 do
                        j = i+j
                    end
                end
                main()
                ngx.say("Hello world")
            }
        }
    }
}
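For context, one possible reading of why `hotexit=100` changes the picture: `hotexit` is LuaJIT's "number of taken exits to start a side trace" parameter, and its default is 10, which matches the 10 requests seen before `lj_trace_exit` stops firing. The sketch below is a toy Python model of that counter, not LuaJIT's code. The name `count_exit_handler_calls` is hypothetical, and it assumes that each request takes the same trace exit exactly once and that the side trace compiles on the first attempt.

```python
# Toy model of LuaJIT's hot-exit counting; NOT LuaJIT's real code.
# Assumption: every request leaves the compiled loop through the same exit.


def count_exit_handler_calls(requests: int, hotexit: int = 10) -> int:
    """Return how many requests reach the modelled trace-exit handler."""
    exit_count = 0             # taken-exit counter for this single exit
    side_trace_linked = False  # becomes True once a side trace is attached
    handler_calls = 0

    for _ in range(requests):
        if side_trace_linked:
            # The exit now jumps straight into the side trace,
            # so the exit handler is bypassed for this request.
            continue
        handler_calls += 1     # models one call to lj_trace_exit
        exit_count += 1
        if exit_count >= hotexit:
            # Assumption: the side trace compiles successfully right away.
            side_trace_linked = True

    return handler_calls


if __name__ == "__main__":
    print(count_exit_handler_calls(50))               # 10
    print(count_exit_handler_calls(50, hotexit=100))  # 50
```

Under these assumptions, the default setting shows the handler for only the first 10 requests, while `hotexit=100` keeps it visible for the first 100.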
So my understanding is that this is a trace issue, not a LuaJIT compilation issue. Maybe the best way forward is to add the following probe to stapxx:
probe process("/usr/local/openresty-debug/luajit/lib/libluajit-5.1.so.2.1.0").function("lj_vm_exit_interp") {
    println("VM_exit_interp")
}
This commit https://github.com/openresty/luajit2/commit/99304a93bb661b3f3afbe4c54d50c705e14c35a3 could also be a problem, but it is already present in OpenResty 1.13.
Regards
The issue appears when OpenResty 1.15.3 is in use: the `lj_trace_exit` function is called only on the first 10 requests, and after that it is never called again. As a result, profiling jobs do not work correctly, because not all requests appear to use LuaJIT at all:
Output of `stapxx ./samples/ngx-lj-trace-exits.sxx -x $(worker PID)`:
To validate this behaviour:
nginx.conf
Run openresty in debug mode:
For this test we can use stapxx, but a simpler SystemTap script also works:
If we run this tap, the `lj_trace_exit` probe fires correctly for the first 10 requests, but after the 10th request it is no longer called. Example:
Trying to debug why this happens, I found no suspect code in `ngx-lua-module`, but OpenResty bumped its LuaJIT version, and this commit may have broken something:
https://github.com/openresty/luajit2/commit/864c72e31e9ea9488bd64a441790fee916481da3#diff-d232eb340601b1fd167c69f58d20f8f1
I also tried to get a proper traceback for this function, but for some reason all the SystemTap tapset functions return only hex addresses inside LuaJIT, so I couldn't find any pattern, and the LuaJIT code is quite hard to understand :-(.
SystemTap tapsets: https://sourceware.org/systemtap/tapsets/
Example to get the caller:
I also tried to get human-readable locations using `addr2line`, but had no luck.
Moving forward, I tried to get a trace with perf, in case this was a SystemTap issue, but perf did not show the `lj_trace_exit` call at all:
perf output: https://gist.github.com/eloycoto/60567b0f6de63a7e86792ad79331c6d6
perf data: perf.zip
I also tried a custom perf probe, but no calls to `lj_trace_exit` were seen:
Next steps:
Installation
These are the steps I followed to install everything mentioned above:
Useful information: