ProgrammersOfVilnius / pov-check-health

Debian package that runs basic system health monitoring checks hourly from cron
https://launchpad.net/~pov/+archive/ppa

Wishlist: checkthreads #8

Closed: mgedmin closed this issue 5 years ago

mgedmin commented 9 years ago

I once had a bug where HTTP worker threads would die without zope.app.server noticing, and the whole app ground to a halt, so I used the function below to check that at least N threads were running in an app.

I also had a different bug where zdaemon's transcript thread would die when the disk was full and the manager process would then deadlock days later, so I used the same thread-count check to catch that.

checkthreads() {
    # usage: checkthreads N pgrep-args
    # check that a process found by pgrep has at least N threads
    shouldbe=$1
    shift
    pid=$(pgrep "$@")
    test $(echo "$pid"|wc -l) -gt 1 && {
        # there's a race condition: when zope forks a child to run aspell, the child also appears to be zope for a brief moment
        # hopefully 100ms is enough for it to do the exec, and hopefully we won't encounter a second race so quickly
        sleep 0.1
        pid=$(pgrep "$@")
    }
    test -z "$pid" && {
        warn "no process found by pgrep $*"
        return 1
    }
    test $(echo "$pid"|wc -l) -gt 1 && {
        warn "more than one process found by pgrep $*:" $pid
        ps $pid
        return 1
    }
    threads=$(ls "/proc/$pid/task"|wc -l)
    test $threads -lt $shouldbe && {
        warn "$* ($pid) has only $threads threads instead of $shouldbe"
        return 1
    }
    # XXX why no warning if $threads -gt $shouldbe?  because temporary extra
    # threads aren't harmful?  do we ever have those in zopeland?
    return 0
}
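
The core of the check is that, on Linux, /proc/PID/task holds one entry per thread. Here is a minimal sketch of just that part, separated from the pgrep handling so it can be tried on any PID. The names count_threads and has_min_threads are hypothetical helpers made up for illustration; they are not part of pov-check-health, and the sketch assumes a Linux /proc filesystem.

```shell
# Hypothetical helpers for illustration; not part of pov-check-health.
# Linux-only: relies on /proc/PID/task containing one entry per thread.

count_threads() {
    # usage: count_threads PID
    # prints the number of threads in the process, or 0 if it has gone away
    if [ -d "/proc/$1/task" ]; then
        ls "/proc/$1/task" | wc -l
    else
        echo 0
    fi
}

has_min_threads() {
    # usage: has_min_threads PID N
    # succeeds if the process has at least N threads
    [ "$(count_threads "$1")" -ge "$2" ]
}

# Example: the current shell always has at least one thread
if has_min_threads "$$" 1; then
    echo "shell $$ has $(count_threads "$$") thread(s)"
fi
```

Treating a vanished process as having 0 threads means a process that exits mid-check fails the minimum test instead of producing an ls error, which seems like the safer default for a monitoring check.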