WPIRoboticsProjects / GRIP

Program for rapidly developing computer vision applications
http://wpiroboticsprojects.github.io/GRIP

GRIP crashes on RoboRio a few minutes after launch #515

Closed t14916 closed 8 years ago

t14916 commented 8 years ago

As the title says, GRIP is crashing for some reason a few minutes after launch. We suspect some sort of memory leak, but haven't been able to make any further progress. GRIP usually stops updating and stops sending an image to the SmartDashboard plugin. This is what shows up in Riolog before the crash:

#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0xaf336398, pid=1604, tid=2945889376
#
# JRE version: Java(TM) SE Embedded Runtime Environment (8.0_06-b23) (build 1.8.0_06-b23)
# Java VM: Java HotSpot(TM) Embedded Client VM (25.6-b23 mixed mode linux-arm )
# Problematic frame:
# C  [libopencv_imgcodecs.so.3.0+0x5f398]  jpeg_fill_bit_buffer+0x70
#
# Core dump written. Default location: //core or core.1604 (max size 2048 kB). To ensure a full core dump, try "ulimit -c unlimited" before starting Java again
#
# An error report file with more information is saved as:
 # /tmp/hs_err_pid1604.log
#
# If you would like to submit a bug report, please visit: 
#
 /home/lvuser/grip: line 4:  1604 Aborted                 (core dumped) /usr/local/frc/JRE//bin/java -Xmx50m - XX:-OmitStackTraceInFastThrow -XX:+HeapDumpOnOutOfMemoryError -jar '/home/lvuser/grip.jar' '/home/lvuser/project.grip'

I had to remove some of the #'s, which were causing some of the lines to be bolded. I don't know if that makes any difference, but all the info is here.

JLLeitschuh commented 8 years ago

Can you upload/copy the contents of /tmp/hs_err_pid1604.log here?

rlee287 commented 8 years ago

This happened to me as well, sporadically. The roboRIO has a limited amount of processing power. If the GRIP program tries to use too many resources, it will crash as described above. I managed to make it go away by reducing the computational power that our project.grip requires.

Also, if you are building from master, you can use NetworkTables to enable GRIP only when necessary (see #501 and #505).

t14916 commented 8 years ago

We switched roboRIOs and it's fine now. I'm not sure what was happening.

rlee287 commented 8 years ago

This problem is very finicky and hard to reproduce. When I was troubleshooting this at first, I traced the problem down to a Java import in our robot code (!). This was even more startling because the relevant import pointed to a completely unrelated Java enum. After editing the project.grip file as described above, all problems vanished, and the import no longer caused problems.

Therefore, I think this is a bug either in the way the JNI is called in GRIP and WPILib or in the JVM itself.

To save resources, people building from master can look at #501 and #505 to turn GRIP on only when needed.

t14916 commented 8 years ago

Yeah, the problem just started again with the new roboRIO; not sure what caused it. What edits did you make to the GRIP pipeline to reduce computational power? Right now all our GRIP pipeline does is apply an RGB threshold, find contours, filter contours, and publish the contour report. We've tried resizing the image, but it doesn't fix the problem.

rlee287 commented 8 years ago

Could you upload your project.grip as well as a sample image I can use to test with? When I have a chance I will look at it.

EDIT: How are you starting GRIP in your robot code?

t14916 commented 8 years ago

We are starting GRIP with:

    void VisionEngine::StartGRIP()
    {
        /* Run GRIP in a new process */
        if (pid == -1)
        {
            pid = fork();

            if (pid == 0)
            {
                std::cout << "GRIP has begun operation" << std::endl;
                system("/home/lvuser/grip &");
            }
        }
    }

It's in RobotInit, so this starts GRIP as soon as the roboRIO is turned on. Also, here is the .grip file we use: https://www.dropbox.com/s/e05kx7e2roqlyxf/NewGripVersion.grip?dl=0

I don't have a sample image available right now, but I can get one tomorrow. Thanks for your help.
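One thing worth noting about the snippet above: after system() returns, the forked child is not told to exit, so it carries on as a second copy of the calling program. Whether that has anything to do with the crash is not established in this thread. A minimal sketch of a fork/exec launch in which the child never returns into the caller's code is below. The helper names LaunchProcess and WaitForExit are made up for illustration, and the sketch assumes the target (for example /home/lvuser/grip) is executable and, if it is a script, starts with a shebang line.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical helper (not from the thread): start `path` in a child
// process. Returns the child's pid in the parent, or -1 if fork() fails.
pid_t LaunchProcess(const char* path) {
    pid_t pid = fork();
    if (pid == 0) {
        // Child: replace this process image with the target program.
        execl(path, path, static_cast<char*>(nullptr));
        // Only reached if execl() failed. Exit immediately so the child
        // never continues running the parent's code.
        _exit(127);
    }
    return pid;
}

// Hypothetical helper: block until the child terminates. Returns its exit
// code, or -1 if waiting failed or the child did not exit normally.
int WaitForExit(pid_t pid) {
    int status = 0;
    if (waitpid(pid, &status, 0) != pid) {
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In robot code you would presumably call LaunchProcess("/home/lvuser/grip") once and keep the returned pid, rather than waiting on it, since GRIP is meant to keep running.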

t14916 commented 8 years ago
[Image: screen shot 2016-02-29 at 5 44 14 pm]

That's the sample image of our target setup

https://www.dropbox.com/s/5pdrot64z8600wi/hs_err_pid1565.log?dl=0 This is the Dropbox link to the error log that we get from this error.

rlee287 commented 8 years ago

How big is the picture? Is it 1:1 in your desktop screenshot? Anyway, I will edit this post once I look at your project. EDIT: See below post.

t14916 commented 8 years ago

Yeah, it's 1:1. We don't resize the image; it didn't help with the memory when we tried it.

rlee287 commented 8 years ago

Took a look; the problem is in the GRIP project. After troubleshooting this exact same problem, I discovered that CPU usage, rather than memory usage, is the issue. After cropping the screenshot to show only the camera feed, I got a picture size of 1281x960 (which I am rounding off to 1280x960):

[Image: camera feed]

This is larger than the roboRIO's CPU can handle. The setup tool (mentioned in the "Axis Camera Setup Tool" section in Configuring an Axis Camera) sets up the video feed to use 320x240 at 15 fps. Even at this setting, CPU usage is at 100%, and a 1280x960 feed would require 16 times the computational power.
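The factor of 16 is just the ratio of pixel counts, assuming the pipeline's cost grows roughly linearly with the number of pixels per frame (an approximation, not something measured here). A small sketch of the arithmetic, with hypothetical helper names:

```cpp
#include <cstdint>

// Hypothetical helper: number of pixels in one frame.
constexpr std::int64_t PixelCount(int width, int height) {
    return static_cast<std::int64_t>(width) * height;
}

// Hypothetical helper: pixels per second the pipeline has to process
// at the given frame rate.
constexpr std::int64_t PixelsPerSecond(int width, int height, int fps) {
    return PixelCount(width, height) * fps;
}

// 1280x960 has 16 times as many pixels per frame as 320x240.
static_assert(PixelCount(1280, 960) == 16 * PixelCount(320, 240),
              "1280x960 should be 16x the pixels of 320x240");
```

At the setup tool's 320x240 and 15 fps, that works out to 1,152,000 pixels per second; a 1280x960 feed at the same frame rate would be 16 times that.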

320x240 is big enough for FRC use. Even with these settings, very complex GRIP pipelines will still segfault, but carefully choosing only the necessary operations should reduce the CPU load enough to prevent problems.

If you really want to use a large 1280x960 feed, you will need to do offboard processing.

t14916 commented 8 years ago

Yeah, we got an offboard processor to run it now. We may also lower the resolution. Thanks for the help.

ryannazaretian commented 8 years ago

I'm having the same issue on the Raspberry Pi. Tried with both GRIP 1.2 and 1.3. (SIGSEGV with jpeg_fill_bit_buffer in libopencv_imgcodecs.so.3.0)

hs_err_pid958.log.txt

project.grip.txt

Here's a link to the setup script I used to set up the Raspberry Pi environment: https://github.com/GarnetSquardon4901/rpi-vision-processing