Closed tnesztler closed 5 years ago
@tnesztler to "prevent from publishing data" you put it into cache-only mode or how are you preventing it from publishing?
Looking into the logs more closely, the Base Historian Agent failed due a gevent
issue which seems to be the root problem.
2019-05-23 10:31:07,766 (facts_serviceagent-1.3.2 21494) <stderr> ERROR: Traceback (most recent call last):
2019-05-23 10:31:07,769 (facts_serviceagent-1.3.2 21494) <stderr> ERROR: File "/home/ecorithm/volttron/env/local/lib/python2.7/site-packages/gevent/threadpool.py", line 207, in _worker
2019-05-23 10:31:07,770 (facts_serviceagent-1.3.2 21494) <stderr> ERROR: error: [Errno 16] Device or resource busy
2019-05-23 10:31:07,771 (facts_serviceagent-1.3.2 21494) <stderr> ERROR: (<ThreadPool at 0x7f60943fdc90 0/1/10>, <built-in function getaddrinfo>) failed with error
2019-05-23 10:31:07,771 (facts_serviceagent-1.3.2 21494) <stderr> ERROR:
2019-05-23 10:31:07,764 (facts_serviceagent-1.3.2 21494) facts_service.agent ERROR: Error when attempting to publish to target: ConnectionError(MaxRetryError("HTTPSConnectionPool(host='ZZZZZZZZZZZZZ', port=443): Max retries exceeded with url: XXXXXXXXX (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f60943fd050>: Failed to establish a new connection: [Errno 16] Device or resource busy',))",),)
I have tried to manually make the agent faulty w/o it resulting in the same error (it handles publishing errors w/o issues).
@tnesztler the factservice is making a request using https. One of the things that we have done with adding https to volttron is to set a REQUEST_CA_BUNDLE environmental variable for connecting to the local https machine. Could this possibly be an issue for an external connection?
In the latest updates that you've pushed to develop
, it seems to have been fixed, intentionally or not since even if it is crashing on something else, the agent restarts itself and no loss of data is happening.
I'll look at the REQUESTS_CA_BUNDLE
env var if another issue like that arise. So far it pushes data to our API w/o issues.
That's the current error I have related to gevent (happening every day but not at the same time!) but I think we can close this specific issue for the time being.
2019-06-04 04:45:05,977 (facts_serviceagent-1.3.2 8353) <stderr> ERROR: SystemError: (libev) error creating signal/async pipe: Too many open files
2019-06-04 04:45:05,978 (facts_serviceagent-1.3.2 8353) <stderr> ERROR: Traceback (most recent call last):
2019-06-04 04:45:05,979 (facts_serviceagent-1.3.2 8353) <stderr> ERROR: File "gevent/corecext.pyx", line 2067, in gevent.corecext._syserr_cb (gevent/gevent.corecext.c:35964)
2019-06-04 04:45:05,980 (facts_serviceagent-1.3.2 8353) <stderr> ERROR: File "gevent/corecext.pyx", line 352, in gevent.corecext.loop._handle_syserr (gevent/gevent.corecext.c:6221)
2019-06-04 04:45:05,982 (facts_serviceagent-1.3.2 8353) <stderr> ERROR: File "gevent/corecext.pyx", line 360, in gevent.corecext.loop.handle_error (gevent/gevent.corecext.c:6397)
2019-06-04 04:45:05,983 (facts_serviceagent-1.3.2 8353) <stderr> ERROR: File "/home/ecorithm/volttron/env/local/lib/python2.7/site-packages/gevent/hub.py", line 565, in handle_error
2019-06-04 04:45:05,984 (facts_serviceagent-1.3.2 8353) <stderr> ERROR: self.handle_system_error(type, value)
2019-06-04 04:45:05,985 (facts_serviceagent-1.3.2 8353) <stderr> ERROR: File "/home/ecorithm/volttron/env/local/lib/python2.7/site-packages/gevent/hub.py", line 580, in handle_system_error
2019-06-04 04:45:05,986 (facts_serviceagent-1.3.2 8353) <stderr> ERROR: self.parent.throw(type, value)
2019-06-04 04:45:05,987 (facts_serviceagent-1.3.2 8353) <stderr> ERROR: error: cannot switch to a different thread
2019-06-04 04:45:05,987 (facts_serviceagent-1.3.2 8353) <stderr> ERROR: (libev) error creating signal/async pipe: Too many open files
2019-06-04 04:51:16,878 () volttron.platform.aip INFO: starting agent /home/ecorithm/.volttron/agents/ab5d9dba-43e8-43b1-b98f-9341e1f8e443/facts_serviceagent-1.3.2
2019-06-04 04:51:16,900 () volttron.platform.aip INFO: agent /home/ecorithm/.volttron/agents/ab5d9dba-43e8-43b1-b98f-9341e1f8e443/facts_serviceagent-1.3.2 has PID 2462
2019-06-04 04:51:16,901 () volttron.platform.control INFO: Successfully restarted agent facts_serviceagent-1.3.2
2019-06-04 04:51:18,006 (facts_serviceagent-1.3.2 2462) <stderr> ERROR: Generating LALR tables
Description of Issue
When HistorianAgent cannot publish data, it tries to send an alert to the health subsystem. This doesn't work due to the BaseHistorianAgent being unable to read the config file. Moreover, it tries to save data to the backup.sqlite database but it can't open it.
Affected Version
develop
Expected
The historian should publish an alert and save data in backup.sqlite database while publishing capability is being resolved.
Actual
The agent crashes and no data is being saved. No notification of it failing is being sent either.
Steps to Reproduce
Prevent the agent from publishing data. It should crash.
Additional Details
NOTE: Running Volttron as a
systemctl
service (the same way as other instances onmaster
branch without issues)volttron.service
Stack trace of the crash after it cannot send data.
Config File