freeipa / freeipa-healthcheck

Check the health of a freeIPA installation
GNU General Public License v3.0
50 stars 28 forks source link

Improved logging when plugins are cancelled for taking too long #323

Open yrro opened 8 months ago

yrro commented 8 months ago

With these changes, it's a lot more obvious when reported errors are caused by a health check timeout.

Starting Execute IPA Healthcheck for Prometheus...
Internal server error 'Link'
--- Health check pki.server.healthcheck.clones.connectivity_and_data:ClonesConnectivyAndDataCheck cancelled after after 7 sec ---
  File "/usr/bin/ipa-healthcheck", line 11, in <module>
    load_entry_point('ipahealthcheck==0.12', 'console_scripts', 'ipa-healthcheck')()
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/main.py", line 52, in main
    sys.exit(ipachecks.run_healthcheck())
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 494, in run_healthcheck
    config.timeout))
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 238, in run_plugins
    for result in run_plugin(plugin, available, timeout):
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 74, in run_plugin
    for result in plugin.check():
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/plugin.py", line 18, in wrapper
    for result in f(*args, **kwds):
  File "/usr/lib/python3.6/site-packages/pki/server/healthcheck/clones/connectivity_and_data.py", line 161, in check
    self.check_kra_clones()
  File "/usr/lib/python3.6/site-packages/pki/server/healthcheck/clones/connectivity_and_data.py", line 56, in check_kra_clones
    '/kra/admin/kra/getStatus')
  File "/usr/lib/python3.6/site-packages/pki/server/healthcheck/clones/plugin.py", line 52, in get_status
    self.instance.export_ca_cert()
  File "/usr/lib/python3.6/site-packages/pki/server/__init__.py", line 277, in export_ca_cert
    nssdb.extract_ca_cert(self.ca_cert, nickname)
  File "/usr/lib/python3.6/site-packages/pki/nssdb.py", line 1865, in extract_ca_cert
    output_format="PEM")
  File "/usr/lib/python3.6/site-packages/pki/nssdb.py", line 1449, in export_cert_from_db
    subprocess.check_call(cmd)
  File "/usr/lib64/python3.6/subprocess.py", line 306, in check_call
    retcode = call(*popenargs, **kwargs)
  File "/usr/lib64/python3.6/subprocess.py", line 289, in call
    return p.wait(timeout=timeout)
  File "/usr/lib64/python3.6/subprocess.py", line 1477, in wait
    (pid, sts) = self._try_wait(0)
  File "/usr/lib64/python3.6/subprocess.py", line 1424, in _try_wait
    (pid, sts) = os.waitpid(self.pid, wait_flags)
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 61, in signal_handler
    traceback.print_stack()
--- The following messages were logged by the plugin after it was cancelled. They may not indicate the reason why the plugin took too long to run. ---
Unable to reach KRA at https://ipa6.example.com:443: Health check pki.server.healthcheck.clones.connectivity_and_data:ClonesConnectivyAndDataCheck cancelled after after 7 sec
--- Increasing the timeout in /etc/ipahealthcheck/ipahealthcheck.conf may allow this healthcheck to complete before the deadline. ---
FileNotFoundException: /tmp/tmpk1c1jgm2/password.txt (No such file or directory)
--- Health check ipahealthcheck.ipa.certs:IPACertRevocation cancelled after after 7 sec ---
  File "/usr/bin/ipa-healthcheck", line 11, in <module>
    load_entry_point('ipahealthcheck==0.12', 'console_scripts', 'ipa-healthcheck')()
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/main.py", line 52, in main
    sys.exit(ipachecks.run_healthcheck())
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 494, in run_healthcheck
    config.timeout))
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 238, in run_plugins
    for result in run_plugin(plugin, available, timeout):
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 74, in run_plugin
    for result in plugin.check():
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/plugin.py", line 18, in wrapper
    for result in f(*args, **kwds):
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/ipa/certs.py", line 1303, in check
    all=True)
  File "/usr/lib/python3.6/site-packages/ipalib/frontend.py", line 471, in __call__
    return self.__do_call(*args, **options)
  File "/usr/lib/python3.6/site-packages/ipalib/frontend.py", line 499, in __do_call
    ret = self.run(*args, **options)
  File "/usr/lib/python3.6/site-packages/ipalib/frontend.py", line 816, in run
    return self.execute(*args, **options)
  File "/usr/lib/python3.6/site-packages/ipaserver/plugins/cert.py", line 1388, in execute
    chain=chain,
  File "/usr/lib/python3.6/site-packages/ipalib/frontend.py", line 471, in __call__
    return self.__do_call(*args, **options)
  File "/usr/lib/python3.6/site-packages/ipalib/frontend.py", line 499, in __do_call
    ret = self.run(*args, **options)
  File "/usr/lib/python3.6/site-packages/ipalib/frontend.py", line 816, in run
    return self.execute(*args, **options)
  File "/usr/lib/python3.6/site-packages/ipaserver/plugins/ca.py", line 252, in execute
    msg = set_certificate_attrs(result['result'], options)
  File "/usr/lib/python3.6/site-packages/ipaserver/plugins/ca.py", line 192, in set_certificate_attrs
    der = ca_api.read_ca_cert(ca_id)
  File "/usr/lib/python3.6/site-packages/ipaserver/plugins/dogtag.py", line 2113, in read_ca_cert
    headers={'Accept': 'application/pkix-cert'})
  File "/usr/lib/python3.6/site-packages/ipaserver/plugins/dogtag.py", line 1267, in _ssldo
    method=method, headers=headers, body=body
  File "/usr/lib/python3.6/site-packages/ipapython/dogtag.py", line 218, in https_request
    method=method, headers=headers)
  File "/usr/lib/python3.6/site-packages/ipapython/dogtag.py", line 272, in _httplib_request
    res = conn.getresponse()
  File "/usr/lib64/python3.6/http/client.py", line 1365, in getresponse
    response.begin()
  File "/usr/lib64/python3.6/http/client.py", line 320, in begin
    version, status, reason = self._read_status()
  File "/usr/lib64/python3.6/http/client.py", line 281, in _read_status
    line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
  File "/usr/lib64/python3.6/socket.py", line 586, in readinto
    return self._sock.recv_into(b)
  File "/usr/lib64/python3.6/ssl.py", line 1005, in recv_into
    return self.read(nbytes, buffer)
  File "/usr/lib64/python3.6/ssl.py", line 867, in read
    return self._sslobj.read(len, buffer)
  File "/usr/lib64/python3.6/ssl.py", line 590, in read
    v = self._sslobj.read(len, buffer)
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 61, in signal_handler
    traceback.print_stack()
--- The following messages were logged by the plugin after it was cancelled. They may not indicate the reason why the plugin took too long to run. ---
--- Increasing the timeout in /etc/ipahealthcheck/ipahealthcheck.conf may allow this healthcheck to complete before the deadline. ---
GSSAPI client step 1
GSSAPI client step 1
GSSAPI client step 1
GSSAPI client step 2
--- Health check ipahealthcheck.meta.core:MetaCheck cancelled after after 7 sec ---
  File "/usr/bin/ipa-healthcheck", line 11, in <module>
    load_entry_point('ipahealthcheck==0.12', 'console_scripts', 'ipa-healthcheck')()
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/main.py", line 52, in main
    sys.exit(ipachecks.run_healthcheck())
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 494, in run_healthcheck
    config.timeout))
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 238, in run_plugins
    for result in run_plugin(plugin, available, timeout):
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 74, in run_plugin
    for result in plugin.check():
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/plugin.py", line 18, in wrapper
    for result in f(*args, **kwds):
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/meta/core.py", line 65, in check
    raiseonerr=False,)
  File "/usr/lib/python3.6/site-packages/ipapython/ipautil.py", line 538, in run
    stdout, stderr = p.communicate(stdin)
  File "/usr/lib64/python3.6/subprocess.py", line 863, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "/usr/lib64/python3.6/subprocess.py", line 1534, in _communicate
    ready = selector.select(timeout)
  File "/usr/lib64/python3.6/selectors.py", line 376, in select
    fd_event_list = self._poll.poll(timeout)
  File "/usr/lib/python3.6/site-packages/ipahealthcheck/core/core.py", line 61, in signal_handler
    traceback.print_stack()
--- The following messages were logged by the plugin after it was cancelled. They may not indicate the reason why the plugin took too long to run. ---
--- Increasing the timeout in /etc/ipahealthcheck/ipahealthcheck.conf may allow this healthcheck to complete before the deadline. ---

One other significant change: when a plugin is interrupted by a timeout, a CRITICAL result is returned with the key healthcheck_timeout.

Fixes: https://github.com/freeipa/freeipa-healthcheck/issues/322

rcritten commented 8 months ago

Thanks for the patch. I'll try to review it soon.

rcritten commented 8 months ago

I don't think including a traceback is necessary when a timeout occurs. I think this would lead to confusion in users thinking that this is a code issue and not merely a timeout. If it is persistent then users can debug logging to identify where it is timing out.

I set timeout=1 and ran --source ipahealthcheck.ipa.certs and the tracebacks don't really tell what is going on other than reads are failing (which isn't the root cause of the issue).