VOLTTRON / volttron

VOLTTRON Distributed Control System Platform
https://volttron.readthedocs.io/

[develop] Potential issue with Health subsystem (due to using Identity in alerts) #2009

Closed tnesztler closed 5 years ago

tnesztler commented 5 years ago

Description of Issue

In our custom Historian agent, there seems to be a bug in the Health subsystem of the BaseHistorianAgent. When the agent's status goes from GOOD to BAD, it tries to send a message to the ALERTS topic. This no longer works because of the switch from Agent UUID to Identity (https://github.com/VOLTTRON/volttron/blob/develop/volttron/platform/vip/agent/subsystems/health.py#L91). Judging from the traceback we got when the agent couldn't push data, I believe the latter cannot be serialized. As a result, the agent crashes altogether instead of skipping this round of publishing.
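For illustration, the "skip this round instead of crashing" behavior described above could look roughly like the sketch below. It is not VOLTTRON code: `send_alert_safely` is a hypothetical helper, and the `send_alert` argument stands in for something like `self.vip.health.send_alert`. It only guards the alert call itself; other failures in the processing loop would need their own handling.

```python
import logging

_log = logging.getLogger(__name__)


def send_alert_safely(send_alert, key, status):
    """Call send_alert(key, status); log and swallow any failure.

    Hypothetical helper: `send_alert` is any callable taking (key, status).
    Returns True if the alert was sent, False if it failed.
    """
    try:
        send_alert(key, status)
    except Exception:  # deliberately broad: alerting must not kill the caller
        _log.exception("Could not send alert %r; skipping this round", key)
        return False
    return True


def broken_send_alert(key, status):
    # Simulates the failure seen in the traceback below.
    raise KeyError("agent_uuid")


sent_ok = send_alert_safely(lambda key, status: None, "historian_status", "BAD")
sent_broken = send_alert_safely(broken_send_alert, "historian_status", "BAD")
```

With a wrapper like this, a failed alert is logged and the caller decides what to do with the `False` return value.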

NOTE: We are not running any Platform agent, only BACnetProxy, MasterDriverAgent, and a custom Historian. This could explain the error we see in the first part of the traceback.
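The first error in the traceback is a `NoSectionError` for a missing `volttron` section in the platform config. As a rough sketch of how a lookup could tolerate that, assuming Python 3's `configparser` (the traceback is from Python 2.7's `ConfigParser`) and assuming the option is called `instance-name`; `get_instance_name` is a hypothetical helper, not the function in `utils.py`:

```python
import configparser


def get_instance_name(config_text, default=None):
    """Return the assumed `instance-name` option, or `default` if absent."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Calling parser.options("volttron") here would raise NoSectionError
    # when the section is missing, so check for it first.
    if not parser.has_section("volttron"):
        return default
    return parser.get("volttron", "instance-name", fallback=default)


empty_result = get_instance_name("")
named_result = get_instance_name("[volttron]\ninstance-name = demo\n")
```

Whether a silent default is the right behavior is a design decision; failing with a clearer message may be preferable.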

Affected Version

Develop

Screenshots

Expected

Alerts are sent to the appropriate topic which can be picked up by another agent later on.

Actual

The BaseHistorian agent is unable to send the alert and crashes.

Steps to Reproduce

Use an agent based on BaseHistorianAgent (or probably BaseHistorianAgent itself) and make it run into a publishing fault.

Additional Details

2019-05-14 13:55:27,307 (facts_serviceagent-1.3.2 16988) <stderr> ERROR: Traceback (most recent call last):
2019-05-14 13:55:27,308 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:   File "/home/ecorithm/volttron/volttron/platform/async.py", line 190, in _run_call
2019-05-14 13:55:27,310 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:   File "/home/ecorithm/volttron/volttron/platform/agent/base_historian.py", line 960, in _send_alert_callback
2019-05-14 13:55:27,310 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:     self.vip.health.send_alert(key, alert_status)
2019-05-14 13:55:27,311 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:   File "/home/ecorithm/volttron/volttron/platform/vip/agent/subsystems/health.py", line 90, in send_alert
2019-05-14 13:55:27,312 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:   File "/home/ecorithm/volttron/volttron/platform/agent/utils.py", line 219, in get_fq_identity
2019-05-14 13:55:27,313 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:   File "/home/ecorithm/volttron/volttron/platform/agent/utils.py", line 181, in get_platform_instance_name
2019-05-14 13:55:27,314 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:   File "/home/ecorithm/volttron/volttron/platform/agent/utils.py", line 174, in load_platform_config
2019-05-14 13:55:27,315 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:   File "/usr/lib/python2.7/ConfigParser.py", line 279, in options
2019-05-14 13:55:27,316 (facts_serviceagent-1.3.2 16988) <stderr> ERROR: NoSectionError: No section: 'volttron'
2019-05-14 13:55:27,317 (facts_serviceagent-1.3.2 16988) <stderr> ERROR: <bound method FactsService._send_alert_callback of <facts_service.agent.FactsService object at 0x7fed641cbf90>> failed with NoSectionError
2019-05-14 13:55:27,317 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:
2019-05-14 13:55:27,323 (facts_serviceagent-1.3.2 16988) <stderr> ERROR: Exception in thread Thread-2:
2019-05-14 13:55:27,324 (facts_serviceagent-1.3.2 16988) <stderr> ERROR: Traceback (most recent call last):
2019-05-14 13:55:27,325 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:   File "/usr/lib/python2.7/threading.py", line 801, in __bootstrap_inner
2019-05-14 13:55:27,325 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:   File "/usr/lib/python2.7/threading.py", line 754, in run
2019-05-14 13:55:27,327 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:   File "/home/ecorithm/volttron/volttron/platform/agent/base_historian.py", line 978, in _process_loop
2019-05-14 13:55:27,327 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:     self._do_process_loop()
2019-05-14 13:55:27,328 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:   File "/home/ecorithm/volttron/volttron/platform/agent/base_historian.py", line 1037, in _do_process_loop
2019-05-14 13:55:27,329 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:     cache_full = backupdb.backup_new_data((x for x in new_to_publish if x is not None))
2019-05-14 13:55:27,331 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:   File "/home/ecorithm/volttron/volttron/platform/agent/base_historian.py", line 1386, in backup_new_data
2019-05-14 13:55:27,332 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:     (timestamp, source, topic_id, dumps(value), dumps(headers)))
2019-05-14 13:55:27,332 (facts_serviceagent-1.3.2 16988) <stderr> ERROR: OperationalError: unable to open database file
2019-05-14 13:55:27,332 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:
2019-05-14 13:55:27,347 (facts_serviceagent-1.3.2 16988) <stderr> ERROR: Traceback (most recent call last):
2019-05-14 13:55:27,348 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:   File "/home/ecorithm/volttron/volttron/platform/async.py", line 190, in _run_call
2019-05-14 13:55:27,350 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:     exc_info, result = None, func(*args, **kwargs)   # pylint: disable=star-args
2019-05-14 13:55:27,351 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:   File "/home/ecorithm/volttron/volttron/platform/agent/base_historian.py", line 960, in _send_alert_callback
2019-05-14 13:55:27,352 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:     self.vip.health.send_alert(key, alert_status)
2019-05-14 13:55:27,354 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:   File "/home/ecorithm/volttron/volttron/platform/vip/agent/subsystems/health.py", line 91, in send_alert
2019-05-14 13:55:27,355 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:     topic = topics.ALERTS(agent_class=agent_class, identity=fq_identity)
2019-05-14 13:55:27,356 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:   File "/home/ecorithm/volttron/volttron/platform/messaging/utils.py", line 158, in __call__
2019-05-14 13:55:27,357 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:     return self.__class__(normtopic(self.vformat(kwargs)))
2019-05-14 13:55:27,358 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:   File "/home/ecorithm/volttron/volttron/platform/messaging/utils.py", line 168, in vformat
2019-05-14 13:55:27,359 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:     return formatter.vformat(self, (), kwargs)
2019-05-14 13:55:27,360 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:   File "/usr/lib/python2.7/string.py", line 563, in vformat
2019-05-14 13:55:27,361 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:     result = self._vformat(format_string, args, kwargs, used_args, 2)
2019-05-14 13:55:27,362 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:   File "/home/ecorithm/volttron/volttron/platform/messaging/utils.py", line 123, in _vformat
2019-05-14 13:55:27,363 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:     raise e
2019-05-14 13:55:27,363 (facts_serviceagent-1.3.2 16988) <stderr> ERROR: KeyError: u'agent_uuid'
2019-05-14 13:55:27,364 (facts_serviceagent-1.3.2 16988) <stderr> ERROR: <bound method FactsService._send_alert_callback of <facts_service.agent.FactsService object at 0x7fed641cbf90>> failed with KeyError
2019-05-14 13:55:27,364 (facts_serviceagent-1.3.2 16988) <stderr> ERROR:
craig8 commented 5 years ago

@tnesztler this issue came about when I changed from using the agent_uuid to the identity, which is more appropriate in the alerting system. I will submit a pull request for this issue shortly.
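The `KeyError: u'agent_uuid'` in the second traceback is consistent with a topic template that still expects an `agent_uuid` field while the caller now passes `identity`. A minimal sketch of that mismatch using only the standard library; the template strings and the `facts_service` identity are assumptions for illustration, not the actual VOLTTRON definitions:

```python
from string import Formatter

# Assumed template shape; the real ALERTS topic definition may differ.
OLD_TEMPLATE = "alerts/{agent_class}/{agent_uuid}"
NEW_TEMPLATE = "alerts/{agent_class}/{identity}"


def build_topic(template, **fields):
    """Fill a topic template from keyword arguments."""
    return Formatter().vformat(template, (), fields)


# The caller passes `identity`, but the old template wants `agent_uuid`.
try:
    build_topic(OLD_TEMPLATE, agent_class="FactsService", identity="facts_service")
except KeyError as exc:
    missing_field = exc.args[0]
else:
    missing_field = None

# Once the template and the caller agree on the field name, formatting works.
fixed_topic = build_topic(
    NEW_TEMPLATE, agent_class="FactsService", identity="facts_service"
)
```

Either side can be changed to make them agree; the sketch only shows why a half-finished rename fails at format time.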

craig8 commented 5 years ago

@tnesztler please see pull request #2010

craig8 commented 5 years ago

@tnesztler reopen if you see this issue again.