cemc-oper / nwpc-hpc-exporter

A Prometheus exporter used at NWPC to monitor HPC status.
GNU General Public License v3.0

Exporter exits when the SSH connection fails. #1

Open perillaroc opened 6 years ago

perillaroc commented 6 years ago

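The exception paramiko.ssh_exception.NoValidConnectionsError should be caught. In the log below, the exporter tries to reconnect after "SSH session not active", the reconnect itself fails, and the uncaught exception terminates the process.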

2018-02-04 05:07:59.919251 reconnect ssh
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/nwpc_hpc_exporter-0.1.0-py3.6.egg/nwpc_hpc_exporter/disk_usage/exporter.py", line 35, in process_request
  File "/usr/local/lib/python3.6/site-packages/nwpc_hpc_exporter-0.1.0-py3.6.egg/nwpc_hpc_exporter/disk_usage/collector.py", line 25, in get_disk_usage
  File "/usr/local/lib/python3.6/site-packages/nwpc_hpc_exporter-0.1.0-py3.6.egg/nwpc_hpc_exporter/disk_usage/collector.py", line 17, in run_cmquota_command
  File "/usr/local/lib/python3.6/site-packages/paramiko-2.4.0-py3.6.egg/paramiko/client.py", line 480, in exec_command
    chan = self._transport.open_session(timeout=timeout)
  File "/usr/local/lib/python3.6/site-packages/paramiko-2.4.0-py3.6.egg/paramiko/transport.py", line 767, in open_session
    timeout=timeout)
  File "/usr/local/lib/python3.6/site-packages/paramiko-2.4.0-py3.6.egg/paramiko/transport.py", line 854, in open_channel
    raise SSHException('SSH session not active')
paramiko.ssh_exception.SSHException: SSH session not active

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/disk_usage_exporter", line 11, in <module>
    load_entry_point('nwpc-hpc-exporter==0.1.0', 'console_scripts', 'disk_usage_exporter')()
  File "/usr/local/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.6/site-packages/click-6.7-py3.6.egg/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/nwpc_hpc_exporter-0.1.0-py3.6.egg/nwpc_hpc_exporter/disk_usage/exporter.py", line 81, in main
  File "/usr/local/lib/python3.6/site-packages/nwpc_hpc_exporter-0.1.0-py3.6.egg/nwpc_hpc_exporter/disk_usage/exporter.py", line 44, in process_request
  File "/usr/local/lib/python3.6/site-packages/nwpc_hpc_exporter-0.1.0-py3.6.egg/nwpc_hpc_exporter/disk_usage/collector.py", line 10, in get_ssh_client
  File "/usr/local/lib/python3.6/site-packages/paramiko-2.4.0-py3.6.egg/paramiko/client.py", line 357, in connect
    raise NoValidConnectionsError(errors)
paramiko.ssh_exception.NoValidConnectionsError: [Errno None] Unable to connect to port 22 on 10.20.49.131
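
A minimal sketch of one way to keep the exporter alive when the reconnect fails. It is not the project's actual fix. The helper name `connect_with_retry` and its parameters are hypothetical. The helper takes the exception types as an argument so that it stays independent of paramiko. To my knowledge `NoValidConnectionsError` derives from `socket.error` (an alias of `OSError`) rather than from `SSHException`, so a caller would pass something like `(paramiko.ssh_exception.SSHException, OSError)` as `retry_on`.

```python
import time


def connect_with_retry(connect, retry_on=(OSError,), attempts=3, delay=1.0,
                       sleep=time.sleep):
    """Call connect() until it succeeds or the attempts run out.

    Hypothetical helper: `connect` is any zero-argument callable that
    returns a connected client (for example a wrapper around the
    get_ssh_client function seen in the traceback).
    Returns the client, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except retry_on as error:
            # Log and keep going instead of letting the exception
            # propagate up and terminate the exporter process.
            print(f"reconnect ssh failed (attempt {attempt}/{attempts}): {error!r}")
            if attempt < attempts:
                sleep(delay)
    return None
```

With this shape, the scrape loop can treat a `None` result as "skip this collection cycle" and try again on the next one, rather than exiting.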

perillaroc commented 5 years ago

Traceback (most recent call last):
  File "/usr/local/bin/workload_exporter", line 11, in <module>
    load_entry_point('nwpc-hpc-exporter', 'console_scripts', 'workload_exporter')()
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "/nwpc-hpc-exporter/nwpc_hpc_exporter/workload/exporter.py", line 59, in main
    collector.process_request(tasks)
  File "/nwpc-hpc-exporter/nwpc_hpc_exporter/workload/collector/__init__.py", line 39, in process_request
    self.process_single_task(a_task)
  File "/nwpc-hpc-exporter/nwpc_hpc_exporter/workload/collector/__init__.py", line 59, in process_single_task
    model = self.request(category_list, client)
  File "/nwpc-hpc-exporter/nwpc_hpc_exporter/workload/collector/slurm_partition/__init__.py", line 12, in request
    return get_result(category_list, client)
  File "/nwpc-hpc-exporter/nwpc_hpc_exporter/workload/collector/slurm_partition/request.py", line 32, in get_result
    std_out_string, std_error_out_string = run_sinfo_command(client)
  File "/nwpc-hpc-exporter/nwpc_hpc_exporter/workload/collector/slurm_partition/request.py", line 28, in run_sinfo_command
    return run_command(client, command)
  File "/nwpc-hpc-exporter/nwpc_hpc_exporter/base/run.py", line 6, in run_command
    stdin, stdout, stderr = client.exec_command(command)
  File "/usr/local/lib/python3.7/site-packages/paramiko/client.py", line 480, in exec_command
    chan = self._transport.open_session(timeout=timeout)
  File "/usr/local/lib/python3.7/site-packages/paramiko/transport.py", line 767, in open_session
    timeout=timeout)
  File "/usr/local/lib/python3.7/site-packages/paramiko/transport.py", line 891, in open_channel
    raise e
  File "/usr/local/lib/python3.7/site-packages/paramiko/transport.py", line 1909, in run
    ptype, m = self.packetizer.read_message()
  File "/usr/local/lib/python3.7/site-packages/paramiko/packet.py", line 426, in read_message
    header = self.read_all(self.__block_size_in, check_rekey=True)
  File "/usr/local/lib/python3.7/site-packages/paramiko/packet.py", line 274, in read_all
    x = self.__socket.recv(n)
ConnectionResetError: [Errno 104] Connection reset by peer
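
This second traceback fails at a different point: the session is reset while a command is being executed, and the error is a plain `ConnectionResetError` (a subclass of `OSError`), not a paramiko exception. A hedged sketch of guarding the command call itself follows. The wrapper name `run_command_safely` is hypothetical. `run` stands for a function with the same shape as the `run_command(client, command)` seen in the traceback.

```python
def run_command_safely(run, client, command, session_errors=(OSError,)):
    """Run a remote command without letting a dead session kill the caller.

    Hypothetical wrapper: returns (True, result) on success and
    (False, error) when the session failed mid-command, so the caller
    can reconnect and retry on the next collection cycle.
    """
    try:
        return True, run(client, command)
    except session_errors as error:
        # ConnectionResetError is an OSError subclass, so the default
        # tuple already covers "Connection reset by peer".
        return False, error
```

When used with paramiko, `session_errors` would presumably also include `paramiko.ssh_exception.SSHException`, which covers the "SSH session not active" case from the first traceback.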