OpenAgricultureFoundation / openag_brain

ROS package for controlling an OpenAg food computer
GNU General Public License v3.0

Respawn usb_cam_node after crash caused by USB port reconnection; improve get_topic_data message and error handling #339

Open goruck opened 7 years ago

goruck commented 7 years ago

The usb_cam_node driver crashes frequently because the USB port loses its connection to the camera, and the driver tries to read from the device before the Raspberry Pi's OS has automatically reconnected the port.

You'll see an error like the following:

TRACE> arduino_handler serial read: >0,93.80,26.80,452,30.50,0,0,3.88,0.08<
[swscaler @ 0x11f3860] No accelerated colorspace conversion found from yuv422p to rgb24.
[ INFO] [1503837690.772792494]: Saved image /var/www/html/img.jpg
[ERROR] [1503837690.774624353]: Videre INI format can only save calibrations using the plumb bob distortion model. Use the YAML format instead.
    distortion_model = '', expected 'plumb_bob'
    D.size() = 0, expected 5
[ERROR] [1503837691.449486717]: VIDIOC_DQBUF error 19, No such device
[DEBUG] [WallTime: 1503837691.491640] 
TRACE> arduino_handler serial write 89 bytes: >0,0.0,0.0,False,False,False,False,False,False,0.5,True,True,True,0.0,0.0,0.0,False,False<
[WARN] [WallTime: 1503837691.493645] (5, 'Input/output error')
[WARN] [WallTime: 1503837691.503485] No serial device found on system in /dev/serial/by-id
[WARN] [WallTime: 1503837691.707030] No serial device found on system in /dev/serial/by-id
[WARN] [WallTime: 1503837691.908940] No serial device found on system in /dev/serial/by-id
[environments/environment_1/aerial_image-18] process has died [pid 8221, exit code 1, cmd /home/pi/catkin_ws/devel/lib/usb_cam/usb_cam_node __name:=aerial_image __log:=/home/pi/.ros/log/cb12cffa-8b1f-11e7-92a4-b827eb6991c6/environments-environment_1-aerial_image-18.log].
log file: /home/pi/.ros/log/cb12cffa-8b1f-11e7-92a4-b827eb6991c6/environments-environment_1-aerial_image-18*.log
[WARN] [WallTime: 1503837692.110899] No serial device found on system in /dev/serial/by-id
[WARN] [WallTime: 1503837692.312769] No serial device found on system in /dev/serial/by-id
[WARN] [WallTime: 1503837692.514701] No serial device found on system in /dev/serial/by-id

Note that the USB port is reconnected automatically, but not before the driver attempts a read, which causes it to crash.

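The VIDIOC_DQBUF error above (errno 19, "No such device") is generated in the usb_cam ROS driver here, where any errno other than EAGAIN ends up in the default case: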

case IO_METHOD_MMAP:
      CLEAR(buf);

      buf.type = V4L2_BUF_TYPE_VIDEO_CAPTURE;
      buf.memory = V4L2_MEMORY_MMAP;

      if (-1 == xioctl(fd_, VIDIOC_DQBUF, &buf))
      {
        switch (errno)
        {
          case EAGAIN:
            return 0;

          case EIO:
            /* Could ignore EIO, see spec. */

            /* fall through */

          default:
            errno_exit("VIDIOC_DQBUF");
        }
      }

errno_exit then logs the error and terminates the process:

static void errno_exit(const char * s)
{
  ROS_ERROR("%s error %d, %s", s, errno, strerror(errno));
  exit(EXIT_FAILURE);
}

The simple solution is to set the respawn="true" attribute on the usb_cam_node entry in the launch file, so that ROS automatically restarts the node if it crashes.
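
As a rough sketch, the launch file entry might look like the following. The node name aerial_image comes from the log above; the video_device value is an assumption for illustration and may differ from the actual launch file.

```xml
<!-- Sketch only: the video_device value is an assumed example, not taken from the repo. -->
<node pkg="usb_cam" type="usb_cam_node" name="aerial_image" respawn="true">
  <param name="video_device" value="/dev/video0"/>
</node>
```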

This seems to work well as evidenced by the roslaunch log file:

[roslaunch][ERROR] 2017-09-02 12:55:32,524: [environments/environment_1/aerial_image-18] process has died [pid 20145, exit code 1, cmd /home/pi/catkin_ws/devel/lib/usb_cam/usb_cam_node __name:=aerial_image __log:=/home/pi/.ros/log/5b2f4e38-9011-11e7-b929-b827eb6991c6/environments-environment_1-aerial_image-18.log].
log file: /home/pi/.ros/log/5b2f4e38-9011-11e7-b929-b827eb6991c6/environments-environment_1-aerial_image-18*.log
[roslaunch][INFO] 2017-09-02 12:55:32,526: [environments/environment_1/aerial_image-18] restarting process
[roslaunch][INFO] 2017-09-02 12:55:32,527: process[environments/environment_1/aerial_image-18]: restarting os process
[roslaunch][INFO] 2017-09-02 12:55:32,528: process[environments/environment_1/aerial_image-18]: start w/ args [[u'/home/pi/catkin_ws/devel/lib/usb_cam/usb_cam_node', u'__name:=aerial_image', u'__log:=/home/pi/.ros/log/5b2f4e38-9011-11e7-b929-b827eb6991c6/environments-environment_1-aerial_image-18.log']]
[roslaunch][INFO] 2017-09-02 12:55:32,529: process[environments/environment_1/aerial_image-18]: cwd will be [/home/pi/.ros]
[roslaunch][INFO] 2017-09-02 12:55:32,547: process[environments/environment_1/aerial_image-18]: started with pid [20212]

It's not clear why the USB ports disconnect in the first place, but since they are reconnected automatically, respawning the node mitigates the crash of the USB camera driver. We should still look for the root cause of the USB disconnects on the Raspberry Pi.

goruck commented 7 years ago

I needed to add the respawn_delay="30" attribute to the usb_cam_node entry in the launch file, since the reconnection may fail if it is attempted too soon after a crash. With this the camera always reconnects for me, and as an added bonus it stays on the same port, so the top camera remains mapped to /dev/video0, assuming it's physically plugged into the right USB port on the Raspberry Pi.
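
For illustration, the node entry with the delay added could be written roughly as below. roslaunch interprets respawn_delay in seconds; the video_device parameter is shown only as an assumed example matching the /dev/video0 mapping mentioned above.

```xml
<!-- Sketch only: wait 30 seconds after a crash before roslaunch restarts the node. -->
<node pkg="usb_cam" type="usb_cam_node" name="aerial_image"
      respawn="true" respawn_delay="30">
  <param name="video_device" value="/dev/video0"/>
</node>
```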

goruck commented 6 years ago

The get_topic_data function in nodes/api.py wasn't really useful beyond a few specific message types. I've rewritten it to be more general and more robust, and I've added these changes to this PR. Note that get_topic_data now depends on rospy_message_converter, so I added that to scripts/generate_rosinstall. I'll also open an issue and link it to this PR in case someone tries to use it and runs into problems.