VOLTTRON / volttron

VOLTTRON Distributed Control System Platform
https://volttron.readthedocs.io/

Installing agent issue #746

Closed: craig8 closed this 7 years ago

craig8 commented 7 years ago

This has been reproduced a couple of times. Steps are as follows:

  1. Allow volttron to run for an indeterminate period (I haven't seen it happen in under a day).
  2. Attempt to install an agent on the platform.
2016-09-13 11:46:24,150 () volttron.platform.web DEBUG: registered route t is: callable
2016-09-13 11:46:24,407 () volttron.platform.control DEBUG: Received 3176390 bytes of data
2016-09-13 11:46:24,407 () volttron.platform.control DEBUG: done receiving data
2016-09-13 11:46:24,407 () volttron.platform.control DEBUG: Sending done back!
2016-09-13 11:46:24,407 () volttron.platform.control DEBUG: Closing channel on server
2016-09-13 11:46:24,409 () volttron.platform.vip.agent.subsystems.rpc ERROR: unhandled exception in JSON-RPC method 'install_agent': 
Traceback (most recent call last):
  File "/home/volttron/volttron/volttron/platform/vip/agent/subsystems/rpc.py", line 168, in method
    return method(*args, **kwargs)
  File "/home/volttron/volttron/volttron/platform/control.py", line 287, in install_agent
    agent_uuid = self._aip.install_agent(path, vip_identity=vip_identity)
  File "/home/volttron/volttron/volttron/platform/aip.py", line 296, in install_agent
    unpack(agent_wheel, dest=agent_path)
  File "/home/volttron/volttron/env/local/lib/python2.7/site-packages/wheel/tool/__init__.py", line 135, in unpack
    sys.stderr.write("Unpacking to: %s\n" % (destination))
IOError: [Errno 5] Input/output error
craig8 commented 7 years ago

This seems to happen sooner when more agents are installed. Also, restarting volttron tends to clear up whatever is happening, to some degree.

Possibly related to #83

craig8 commented 7 years ago

This issue is probably exacerbated by running on boards that may not have fast enough memory or processing power. There may be a race condition between the final write of the file and the start of the installation process.

rlutes commented 7 years ago

Possibly, but the problem also occurs on a NUC with higher-end hardware (i7 processor and 8 GB of memory).

craig8 commented 7 years ago

Most likely #756 will fix this for local instances.

craig8 commented 7 years ago

Other work in merge #759 has helped mitigate the local install issues.

craig8 commented 7 years ago

This instance has only a listener running, with volttron in the background and the master driver installed and ready to go, but nothing else.

The volttron-ctl status output is:

  AGENT                  IDENTITY            TAG      STATUS
b listeneragent-3.2      listeneragent-3.2_1 listener running [14194]
7 master_driveragent-0.2 platform.driver     master 

It seems this just won't die. This was written to volttron.log:

2016-10-19 20:13:16,226 () volttron.platform.vip.agent.subsystems.rpc ERROR: unhandled exception in JSON-RPC method 'install_agent_local': 
Traceback (most recent call last):
  File "/home/volttron/volttron/volttron/platform/vip/agent/subsystems/rpc.py", line 168, in method
    return method(*args, **kwargs)
  File "/home/volttron/volttron/volttron/platform/control.py", line 256, in install_agent_local
    add_auth=add_auth)
  File "/home/volttron/volttron/volttron/platform/aip.py", line 305, in install_agent
    unpack(agent_wheel, dest=agent_path)
  File "/home/volttron/volttron/env/local/lib/python2.7/site-packages/wheel/tool/__init__.py", line 134, in unpack
    sys.stderr.write("Unpacking to: %s\n" % (destination))
IOError: [Errno 5] Input/output error

The verbose logging output from volttron-ctl is as follows:

2016-10-19 20:18:17,368 () volttron.platform.vip.agent.core DEBUG: address: ipc://@/home/volttron/.volttron/run/vip.socket
2016-10-19 20:18:17,368 () volttron.platform.vip.agent.core DEBUG: identity: control.connection
2016-10-19 20:18:17,369 () volttron.platform.vip.agent.core DEBUG: agent_uuid: None
2016-10-19 20:18:17,369 () volttron.platform.vip.agent.core DEBUG: severkey: HRsHXFJZOKECVEhBCIMnBLRb8AmkipxTqTReC_5U4zs
2016-10-19 20:18:17,371 () volttron.platform.control INFO: Installing wheel locally without channel subsystem
2016-10-19 20:18:17,519 () volttron.platform.vip.agent.core INFO: Connected to platform: router: a23f408c-c6bc-429b-8835-5cfbd81568fb version: 1.0 identity: control.connection
2016-10-19 20:18:17,520 () volttron.platform.vip.agent.core DEBUG: Running onstart methods.
install: error: [Errno 5] Input/output error

The output from df was:

Filesystem     1K-blocks    Used Available Use% Mounted on
udev              761152       0    761152   0% /dev
tmpfs             203872   22040    181832  11% /run
/dev/mmcblk0p2  29937084 2664800  26001688  10% /
tmpfs            1019340       0   1019340   0% /dev/shm
tmpfs               5120       4      5116   1% /run/lock
tmpfs            1019340       0   1019340   0% /sys/fs/cgroup
/dev/mmcblk0p1    130798    7208    123590   6% /media/boot
cgmfs                100       0       100   0% /run/cgmanager/fs
tmpfs             203872       0    203872   0% /run/user/1001
craig8 commented 7 years ago

Output from top (note that volttron wasn't anywhere in the top 5 processes):

top - 20:23:43 up 19 days, 11:08,  4 users,  load average: 0.00, 0.01, 0.05
Tasks: 135 total,   1 running, 134 sleeping,   0 stopped,   0 zombie
%Cpu(s):  0.1 us,  0.1 sy,  0.0 ni, 99.8 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem:   2038684 total,   886244 used,  1152440 free,   102544 buffers
KiB Swap:        0 total,        0 used,        0 free.   500660 cached Mem

The main volttron process is using only 44 file descriptors:

ls -l /proc/14105/fd |wc -l

The listener process is using only 23 file descriptors:

ls -l /proc/14194/fd | wc -l
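The same inspection can be scripted, and for this failure it is arguably more telling to see where descriptors 0, 1 and 2 point than how many descriptors are open: if stderr still refers to the pseudo-terminal of an SSH session that has since ended, writes to it fail. A minimal sketch, assuming Linux with a `/proc` filesystem; `open_fds` is a hypothetical helper, not part of VOLTTRON. Note also that `ls -l` prints a leading `total` line, so piping it into `wc -l` over-counts the descriptors by one.

```python
import os


def open_fds(pid):
    """Map each open file descriptor of `pid` to the target it refers to.

    Linux-only sketch: reads the same symlinks under /proc/<pid>/fd that
    `ls -l /proc/<pid>/fd` lists.
    """
    fd_dir = "/proc/%d/fd" % pid
    targets = {}
    for name in os.listdir(fd_dir):
        try:
            targets[int(name)] = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            # The descriptor was closed between listdir() and readlink(),
            # e.g. the directory handle listdir() itself used.
            continue
    return targets


if __name__ == "__main__":
    fds = open_fds(os.getpid())
    print("open descriptors:", len(fds))
    for fd in (0, 1, 2):
        # For a process started from an SSH login this is usually /dev/pts/N.
        print(fd, "->", fds.get(fd, "<closed>"))
```

Passing the platform's PID (14105 above) instead of `os.getpid()` shows what the running platform's stderr is attached to; reading another process's `/proc/<pid>/fd` requires running as the same user or as root.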
craig8 commented 7 years ago

I suppose the good news is that restarting the platform does fix this, and agents that are already installed aren't affected!

jhaack commented 7 years ago

The workaround is to redirect output to /dev/null. The issue is related to ssh'ing into a box to start VOLTTRON, logging off, then logging back in and trying to install.
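In shell terms the workaround is the familiar `command > file 2>&1 &` pattern. A minimal sketch of the same idea from Python, assuming the platform is started by a wrapper script: the child gets its own session (no controlling terminal), stdin from `/dev/null`, and stdout/stderr appended to a log file, so nothing it writes later goes to the SSH session's terminal. `launch_detached` is a hypothetical helper name, not a VOLTTRON API; pass `os.devnull` as the log path to discard the output entirely, as suggested above.

```python
import subprocess


def launch_detached(cmd, log_path):
    """Start `cmd` detached from the current terminal.

    stdin reads from /dev/null, stdout and stderr are appended to
    `log_path`, and start_new_session=True runs setsid() in the child so
    it has no controlling terminal to lose when the SSH session ends.
    """
    with open(log_path, "ab") as log:
        # Popen duplicates the log descriptor into the child, so closing
        # our copy when the `with` block exits does not affect the child.
        return subprocess.Popen(
            cmd,
            stdin=subprocess.DEVNULL,
            stdout=log,
            stderr=subprocess.STDOUT,
            start_new_session=True,
        )
```

For example, `launch_detached(["volttron"], "volttron.log")` would start the platform with its stderr going to `volttron.log` rather than to the terminal, so logging out of SSH afterwards leaves nothing attached to the closed pseudo-terminal.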

kmonson commented 7 years ago

To reproduce: over SSH, run volttron in the background (with &) and detach it, then close your SSH session, log back in, and try to install an agent.

The wheel library writes directly to stderr. If stderr has not been redirected somewhere that can still accept the write, installing an agent fails with an IOError.
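`[Errno 5] Input/output error` is what Linux returns for a write to a terminal that has been hung up, which is the state a backgrounded process's stderr is left in once the SSH session owning that pseudo-terminal closes. The sketch below reproduces this without SSH: it opens a pseudo-terminal pair, closes the master side (standing in for sshd tearing down the session), then writes to the slave side the way `sys.stderr.write` in wheel's `unpack` would. It is Linux-specific and illustrative only; `write_after_hangup` is a hypothetical name.

```python
import errno
import os


def write_after_hangup():
    """Return the errno raised by writing to a pty whose master was closed.

    Returns None if the write unexpectedly succeeds.
    """
    master, slave = os.openpty()
    # Closing the master hangs up the slave side, much as sshd does to the
    # session's terminal when the user logs out.
    os.close(master)
    try:
        os.write(slave, b"Unpacking to: /tmp/agent\n")
        return None
    except OSError as exc:
        return exc.errno
    finally:
        os.close(slave)


if __name__ == "__main__":
    code = write_after_hangup()
    print(code, errno.errorcode.get(code))
```

On Linux the write fails with `EIO` (errno 5), the same error as in the `install_agent` tracebacks; the platform process itself keeps running because nothing forces it to touch stderr until wheel does.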

In practice this is very difficult, if not impossible, to fix, because we can't definitively tell when the user does not want to see output on stderr. We also can't redirect stderr and stdout to a logger without first determining whether the logger itself sends its output to stdout. If we didn't allow "anything you want" style logging configuration it could be done, since we would have more control and would know what was going where.
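The concern is a loop: if stderr were redirected into a logger whose handler writes back to stderr, every message would feed itself. A simplified sketch of the check described above, using only the standard `logging` module: it walks a logger's handlers along the same propagation path logging uses and reports whether any `StreamHandler` targets stdout or stderr. This is not VOLTTRON code, `logs_to_std_streams` is a hypothetical name, and it cannot see custom handlers that reach the terminal by other means, which is exactly why arbitrary logging configuration makes the question hard to answer definitively.

```python
import logging
import sys


def logs_to_std_streams(logger):
    """Return True if a handler reachable from `logger` writes to stdout/stderr.

    Checks the logger's own handlers, then its ancestors' handlers, stopping
    at the first logger whose `propagate` attribute is False.
    """
    std_streams = (sys.stdout, sys.stderr, sys.__stdout__, sys.__stderr__)
    current = logger
    while current is not None:
        for handler in current.handlers:
            # FileHandler is a StreamHandler subclass, but its stream is a
            # regular file, so the identity check below excludes it.
            if isinstance(handler, logging.StreamHandler) and any(
                handler.stream is s for s in std_streams
            ):
                return True
        if not current.propagate:
            break
        current = current.parent
    return False
```

A platform could only safely capture stderr into logging when this returns False for every logger the capture would feed, and with user-supplied handler classes even that is not a guarantee.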

kmonson commented 7 years ago

We'll improve the docs.

jhaack commented 7 years ago

This is now mentioned in multiple places in the documentation.