abrt / retrace-server

Application for remote coredump analysis
GNU General Public License v2.0

Spurious numbers in metrics #430

Open mgrabovsky opened 3 years ago

mgrabovsky commented 3 years ago

In the 48 hours following the deployment of the Prometheus metrics endpoint, at least two bugs have been made apparent thanks to the Grafana dashboard:

  1. Failed tasks often (but not always) seem to be counted twice in retrace_tasks_finished{result="fail"}.
  2. The number of running tasks (retrace_tasks_running) sporadically jumps up to wild numbers, such as 70, 18 or 39, for a few minutes at a time. The maximum allowed number of running tasks (MaxParallelTasks) is 12 on retrace.fp.org, so these numbers make no sense.

mgrabovsky commented 3 years ago

The code relevant to 2. (running tasks) is in retrace.py. It parses the output of ps, so I can imagine some odd interaction with threading or with how processes are listed.

Edit: ~I'm wondering if we may be witnessing some race conditions here since multiple workers may be writing to the SQLite database at the same time. Though I hope SQLite should be able to handle that.~

Edit 2: OK, it wasn't a database bug. Here's a fragment of the ps output from one of the moments when an unusually high number of running tasks was detected:

    PID    PPID ELAPSED CMD
1578079       1    1727 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
[...]
1589317 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589318 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589319 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589320 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589321 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589322 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589323 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589324 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589325 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589326 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589327 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589328 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589329 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589330 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589331 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589332 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589333 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589334 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589335 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589336 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589337 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589338 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589339 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589340 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589341 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589342 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589343 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589344 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589345 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589346 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589347 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589348 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589349 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589350 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589351 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589352 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589353 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589354 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589355 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589356 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589357 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589358 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589359 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589360 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589362 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589363 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589365 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589366 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589367 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589368 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
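
In the listing above, every extra row has PPID 1578079 and the same task ID (762329147) on its command line, so these look like short-lived children of a single worker rather than separate tasks. A minimal sketch of one way to avoid the overcount, assuming the metric is derived from rows shaped like these: count distinct task IDs instead of matching rows. The names `WORKER_RE`, `count_worker_rows` and `count_running_tasks` are hypothetical and are not taken from retrace.py.

```python
import re

# Hypothetical pattern: a worker command line ending in a numeric task ID.
WORKER_RE = re.compile(r"retrace-server-worker\s+(\d+)\s*$")


def _worker_task_ids(ps_output):
    """Yield the task ID of every ps row that looks like a worker process."""
    for line in ps_output.splitlines():
        fields = line.split(None, 3)  # PID, PPID, ELAPSED, CMD
        if len(fields) < 4 or not fields[0].isdigit():
            continue  # header, "[...]" or otherwise unparseable line
        match = WORKER_RE.search(fields[3])
        if match:
            yield match.group(1)


def count_worker_rows(ps_output):
    """Naive count: one per matching row, so forked children inflate it."""
    return sum(1 for _ in _worker_task_ids(ps_output))


def count_running_tasks(ps_output):
    """Count distinct task IDs, so children of one worker count once."""
    return len(set(_worker_task_ids(ps_output)))


SAMPLE = """\
    PID    PPID ELAPSED CMD
1578079       1    1727 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589317 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
1589318 1578079       0 /usr/bin/python3.6 /usr/bin/retrace-server-worker 762329147
"""

print(count_worker_rows(SAMPLE))    # 3
print(count_running_tasks(SAMPLE))  # 1
```

An alternative under the same assumption would be to skip any row whose PPID is itself the PID of a worker row. Deduplicating on the task ID is simpler and does not depend on the order of the rows.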