Closed: RussellLuo closed this issue 8 years ago.
Hi Russell,
Every monitor has several separate "threads" (it's more complicated than that, but it's a good enough approximation).
One of the threads is in charge of dealing with presenting the website and responding to the API. Another one is in charge of acting as a "proxy" - it's the one which receives requests, puts them through the middleware, calls the API, and returns the transformed responses.
There is a third one which is in charge of running background jobs. Its code is here:
https://github.com/APItools/monitor/blob/master/lua/crontab.lua
As you can see, this background thread does a bunch of things, not only saving the delayed traces.
The background thread is initialized here: https://github.com/APItools/monitor/blob/master/config/global.conf#L8-L12. Nginx should have kept it alive, but it seems that in your particular case it has died.
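For context, the usual OpenResty pattern for such a background thread is a self-rescheduling ngx.timer; a minimal sketch of that pattern (this is an illustration, not the actual contents of config/global.conf):

```nginx
# Sketch only -- the real initialization lives in config/global.conf (L8-L12)
init_worker_by_lua_block {
    local function tick(premature)
        if premature then return end          -- nginx is shutting down
        -- run pending background jobs here (the monitor's crontab module)
        local ok, err = ngx.timer.at(1, tick) -- re-arm the timer
        if not ok then
            ngx.log(ngx.ERR, "failed to re-arm cron timer: ", err)
        end
    end
    ngx.timer.at(0, tick)
}
```

If the re-arm ever fails, or the worker owning the timer dies without being respawned, the jobs silently stop running, which matches the symptom you are seeing.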
This has happened a couple of times before in some of our workers (we have hundreds of them). It's a frustratingly difficult problem to reproduce since it happens so infrequently. Still, we're investigating how to prevent it.
In the meantime, the only way to revive the background jobs is to re-launch nginx. If you installed it on-premise with the deb package, you should be able to do so using supervisor:
$ sudo supervisorctl restart openresty
That should restart the background jobs and get the traces saved automatically. If that doesn't work, you can try restarting supervisor (sudo service supervisor restart). A last thing to try would be rebooting (sudo reboot). Let's hope you don't need to get to that.
Please let me know how it goes.
Regards,
Enrique
Thanks for your detailed reply, kikito!
After changing the log level of nginx to info:
# config/nginx.conf
error_log stderr info;
I found this message in the log file:
2015/04/01 22:10:59 [info] 218#0: [lua] crontab.lua:348: run(): [cron] skipping run of async_traces job async_traces-A486BC47-05D4-40AC-80CA-89471C56609E: slug name not set, context: ngx.timer
which is obviously caused by the failure of this condition:
https://github.com/APItools/monitor/blob/master/lua/crontab.lua#L340-L340
Since the value of crontab.enabled() is determined by three factors:
https://github.com/APItools/monitor/blob/master/lua/crontab.lua#L332-L332
and the value of crontab.forced is determined by the environment variable SLUG_CRON_FORCED:
https://github.com/APItools/monitor/blob/master/lua/crontab.lua#L369-L369
I set SLUG_CRON_FORCED to 1:
# config/env.conf
env SLUG_CRON_FORCED=1;
Now everything works well.
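For what it's worth, my reading of the gating logic is roughly the following (a Python sketch of the condition in lua/crontab.lua; the exact factor handling in the Lua code may differ):

```python
def crontab_enabled(disabled, forced, slug_name):
    """Sketch of the check in lua/crontab.lua: background jobs run only
    when the crontab is not disabled AND it is either forced or has a
    slug name set."""
    return not disabled and (forced or slug_name is not None)

# Without SLUG_CRON_FORCED, a missing slug name skips the jobs:
print(crontab_enabled(disabled=False, forced=False, slug_name=None))  # False
# Setting SLUG_CRON_FORCED=1 bypasses the slug-name requirement:
print(crontab_enabled(disabled=False, forced=True, slug_name=None))   # True
```

That would explain both the "slug name not set" log line and why forcing the cron fixed it.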
But I still have some questions:

1. What is the relationship among crontab.disabled, crontab.forced and slug_name?
2. Is it appropriate to set SLUG_CRON_FORCED to 1 to make crontab.initialize() work?
3. Why is crontab.initialize() executed two times, first triggered in "config/nginx.conf" and then again near the end of "lua/crontab.lua"?

This should be fixed by #53. If it is not, please reopen the issue. Cron now runs all the time, without checking any environment variables.
I've just installed the APItools monitor on my server according to On-Premise APItools Monitor. After adding a service, I can't see any traces on the "Traces" tab when I access my local APIs.
After diving into the source code of monitor, I found how traces are saved: Trace:async_save() enqueues the trace object, which is actually saved later by Model:consume(). Model:consume(), in turn, is called by system.cron_trigger() or system.cron_flush(). Finally, I found the route mapping here:
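The delayed-save pipeline described above can be sketched like this (a Python analogy, not the monitor's actual Lua code; the names mirror Trace:async_save() and Model:consume()):

```python
import queue

class Model:
    """Toy analogue of the monitor's delayed-save pattern: async_save()
    only enqueues; consume() (driven by the background cron thread, or a
    manual flush) does the real save."""
    def __init__(self):
        self._pending = queue.Queue()
        self.saved = []

    def async_save(self, trace):
        self._pending.put(trace)  # cheap: just enqueue the trace

    def consume(self):
        # called periodically by the cron trigger (or a manual flush)
        while not self._pending.empty():
            self.saved.append(self._pending.get())  # the actual "save"

traces = Model()
traces.async_save({"path": "/api/foo"})
# nothing is saved until the background job (or a manual POST) runs consume():
traces.consume()
print(traces.saved)  # [{'path': '/api/foo'}]
```

This is why, with the background thread dead, traces pile up unsaved until something triggers the consume step.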
I said god bless me while trying to POST to "/api/system/cron" manually, and it did work! When I access my APIs, traces are generated and shown in the list. Thank goodness!
But my questions here are: