Closed timothykang closed 9 years ago
Upon further investigation, it looks like this issue was introduced by ffde099. Reverting it fixes this issue, though I don't know if it introduces others. Another effect I've noticed is agent logs no longer being printed.
CCing @hashstat @craig8
@timothykang: Have you cleaned up your repository since pulling in that change? There were a couple commits before ffde099 that could cause problems. In particular, bff0b28 split the vip module into multiple modules inside a package. Try removing all the .pyc files within the volttron/project directories and ensure that the vipagent directory no longer exists. From within the project root, do find volttron/platform -name '*.pyc' -delete
and rm -rf volttron/platform/vipagent
.
Please see if the above solution solves the issue. If not, I will look into it further. Thanks.
I performed the cleanup you described and still encountered the issue, and then tried a fresh clone with the same result.
@hashstat let me look into this a little. I will give a report at the standup today.
Sounds like an issue where a non-gevent-aware blocking operation is being performed in gevent context.
@timothykang please verify that this is fixed in master branch.
It seems that this is somehow tied to a mac...further research ongoing.
I somehow discovered that commenting out all of the _log.debug
calls "fixed" the issue, and so believe the issue to be tied to logging. I was able to create the following agent that causes the platform to stop responding after 140-ish calls of doLog
:
import logging
import sys
import os
from volttron.platform.vip.agent import Agent, Core
from volttron.platform.agent import utils
utils.setup_logging()
_log = logging.getLogger(__name__)
def trouble_agent(config_path):
class TroubleAgent(Agent):
counter = 0
@Core.periodic(0.1)
def log(self):
# ping to show activity, since part of the
# issue is log entries not being displayed
self.vip.ping('', str(self.counter))
_log.debug(str(self.counter))
self.counter += 1
identity = os.environ.get('AGENT_UUID')
return TroubleAgent(identity)
def main(argv=sys.argv):
try:
utils.default_main(trouble_agent, description='',
no_pub_sub_socket=True, argv=argv)
except Exception:
_log.exception('unhandled exception')
I haven't found another OS X machine to try it on, so I'm unable to say if this issue is specific to my machine or with OS X in general.
@timothykang and I sat down together to explore this issue. We discovered that it only affects agents enabled to autostart. I was able to reproduce it on both OS-X and Linux. Luckily we found a fix which is to move autostart into the services thread. I expect that there are also some reads/writes from/to file objects which could potentially cause problems lurking in the logging. I will explore that some more.
Steps to reproduce:
At the same time the historian stops logging data, it also stops responding to queries.
Adjusting the value passed to
@Core.periodic
inplatformagent/agent.py
has no effect on the number of rows inserted. Tried intervals of 1, 3, 5, and 15 and the number of rows inserted is always 36.Alternatively, if I disable publishing by the PlatformAgent altogether (i.e. comment out
@Core.periodic
), making 10 to 20 or so queries to the historian will also cause it to stop responding.