OpenTSDB / tcollector

Data collection framework for OpenTSDB
http://opentsdb.net
GNU Lesser General Public License v3.0

fix proc status #425

Closed: tongtie closed this 4 years ago

tongtie commented 5 years ago

I got these errors:

2019-08-13 14:33:47,828 tcollector[1896951] [line:1345] WARNING: Terminating collector hbase_master.py after 615 seconds of inactivity
2019-08-13 14:33:47,829 tcollector[1896951] [line:210] INFO: Waiting 5s for PID 2527527 (hbase_master.py) to exit...
2019-08-13 14:33:48,831 tcollector[1896951] [line:75] ERROR: hbase_master.py still has a process (pid=2527527) and is being reset, terminating

The log says the process still exists, but it has actually already exited. To verify this, I added the following check:

def register_collector(collector):
        ...
        if col.proc is not None:
            # Debug check: signal 0 delivers nothing, it only tests whether the PID exists.
            try:
                os.kill(col.proc.pid, 0)
                LOG.info('pid=%d is running', col.proc.pid)
            except OSError as e:
                LOG.error('pid=%d not running. %s', col.proc.pid, e)
            # Original message: logged whenever col.proc is set, even if the process is gone.
            LOG.error('%s still has a process (pid=%d) and is being reset,'
                      ' terminating', col.name, col.proc.pid)
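The check above relies on os.kill(pid, 0), which sends no signal and only reports whether the PID can be signalled. A minimal standalone sketch of that probe on a POSIX system follows; pid_is_running is a hypothetical helper name and is not part of tcollector. It treats ESRCH as "gone" and EPERM as "exists but owned by another user", a distinction the debug code above does not make.

```python
import errno
import os


def pid_is_running(pid):
    """Return True if a process with this PID exists (hypothetical helper, POSIX only)."""
    try:
        # Signal 0 performs the existence and permission checks but delivers nothing.
        os.kill(pid, 0)
    except OSError as e:
        if e.errno == errno.ESRCH:
            # No such process: it has exited and been reaped.
            return False
        if e.errno == errno.EPERM:
            # The process exists, but we are not allowed to signal it.
            return True
        raise
    return True
```

For example, pid_is_running(os.getpid()) is true for the calling process, while the PID of a child that has already been waited on should normally report false.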

Output:

2019-08-13 16:30:26,347 tcollector[2575745] [line:1136] INFO: Heartbeat (13 collectors running)
2019-08-13 16:30:26,350 tcollector[2575745] [line:1350] WARNING: Terminating collector hbase_master.py after 601 seconds of inactivity
2019-08-13 16:30:26,351 tcollector[2575745] [line:215] INFO: Waiting 5s for PID 2575753 (hbase_master.py) to exit...
2019-08-13 16:30:27,351 tcollector[2575745] [line:78] ERROR: pid=2575753 not running. [Errno 3] No such process
2019-08-13 16:30:27,352 tcollector[2575745] [line:80] ERROR: hbase_master.py still has a process (pid=2575753) and is being reset, terminating

So I added self.proc = None to col.shutdown() to solve this problem.
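The idea behind the fix can be sketched as follows. This is a simplified stand-in, not tcollector's actual Collector class: the start method, the use of terminate(), and the blocking wait() are assumptions made for illustration. The only point carried over from this PR is that shutdown() clears self.proc once the child is gone, so a later "col.proc is not None" check no longer sees a stale handle.

```python
import subprocess
import sys


class Collector(object):
    """Simplified stand-in for a collector wrapper (assumption, not tcollector's class)."""

    def __init__(self, name):
        self.name = name
        self.proc = None  # subprocess.Popen handle while the collector is running

    def start(self, argv):
        """Spawn the collector process (hypothetical helper for this sketch)."""
        self.proc = subprocess.Popen(argv)

    def shutdown(self):
        """Stop the child if needed, reap it, and forget the handle."""
        if self.proc is None:
            return
        if self.proc.poll() is None:
            # Still alive: ask it to exit.
            self.proc.terminate()
        self.proc.wait()  # reap the child so its PID is released
        # The fix: drop the stale handle so callers stop reporting a live process.
        self.proc = None


if __name__ == "__main__":
    col = Collector("demo")
    col.start([sys.executable, "-c", "import time; time.sleep(60)"])
    col.shutdown()
    print(col.proc)
```

Calling shutdown() a second time is a no-op, because the handle has already been cleared.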