clubcapra / takin

:goat: Capra-Takin is a ROS-based solution for managing and operating Club Capra's rescue robot. :sheep:
GNU General Public License v3.0

Write a fail-safe implementation #72

Open lvanasse opened 5 years ago

lvanasse commented 5 years ago

Set the e-stop pin value to 0 if the node is killed or dies. Currently, the pin stays at 1 even when the node is killed or dies.

It might be worth having a look at this package: http://wiki.ros.org/bond_core
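As a rough illustration of the behaviour requested here, the sketch below registers exit and signal handlers that write the safe value to the pin. It assumes a Python node; `write_pin` and `install_fail_safe` are hypothetical names, not part of the takin code base, and the actual GPIO write is left to the caller. Note that SIGKILL and a hard crash of the interpreter cannot be caught this way, so this only covers clean shutdowns and catchable signals.

```python
import atexit
import signal
import sys

# Safe value taken from the issue text: the pin must read 0 when the node is gone.
ESTOP_SAFE_VALUE = 0


def install_fail_safe(write_pin, signals=(signal.SIGINT, signal.SIGTERM)):
    """Drive the e-stop pin to its safe value when the process exits.

    write_pin is a hypothetical callable that takes the value to write
    (for example a thin wrapper around whatever GPIO library the node uses).
    Returns the cleanup function so it can also be called explicitly.
    """

    def make_safe():
        # Writing the safe value twice is harmless, so no guard is needed.
        write_pin(ESTOP_SAFE_VALUE)

    def on_signal(signum, frame):
        make_safe()
        # Exit with the conventional 128 + signal number status.
        sys.exit(128 + signum)

    # Runs on normal interpreter exit and on sys.exit().
    atexit.register(make_safe)
    # Runs on catchable termination signals (not SIGKILL).
    for sig in signals:
        signal.signal(sig, on_signal)
    return make_safe
```

In a rospy node, the same cleanup callback could presumably also be registered with `rospy.on_shutdown`. Because an uncatchable kill still leaves the pin untouched, a handler like this is only one part of a fail-safe design.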

lvanasse commented 5 years ago

@LazyEngineerToBe, correct me if I am wrong, but on the software side, if the takin_estop node is not alive, the takin_motors node won't work? This is not to dismiss the need for a fail-safe implementation; I just want to verify my understanding.

Edit:

I also want to know what we should do in the event of a power outage on the robot. I don't know whether the GPIO pin would reset its value, which would set the estop to false and prevent the motors from running.

lit-af commented 5 years ago

Right now, there's no software heartbeat between takin_estop and takin_motors. Therefore, if the estop node fails, the motors are still enabled.

In the case of a power outage at the Jetson, on reboot, the pin will be set back to its default value.

lvanasse commented 5 years ago

Ok, I'll add a check for that. But can you think of a scenario where only the takin_estop node would be killed and not the others? I just want to understand your concern here.

Also, if I understand correctly, we could bond takin_estop and takin_motors together, so that if takin_estop fails, the motors would know and refuse to work. Am I getting that right?

lit-af commented 5 years ago

> Ok, I'll add a check for that. But can you think of a scenario where only the takin_estop node would be killed and not the others? I just want to understand your concern here.

The two things I can think of are the following:

  1. Execution error.
  2. Node execution incomplete when the node is killed.

> Also, if I understand correctly, we could bond takin_estop and takin_motors together, so that if takin_estop fails, the motors would know and refuse to work. Am I getting that right?

This could work, although a standalone solution would be best. Mostly because takin_motors will eventually be replaced by ros_control.

lvanasse commented 5 years ago

> The two things I can think of are the following:
>
> 1. Execution error.

When you say execution error, do you mean something like missing permissions on the pins, so that we cannot write to them? Or something like memory corruption inside the code that would result in the node dying? (If it is the latter, we should consider using a memory-safe language. I've been reading a lot about Rust these days and would like a small project to get my feet wet with it.)

> 2. Node execution incomplete when the node is killed.

Correct me if I am wrong, but do you mean that if the node is killed while we want to press the estop, it might not set the pin value? But if the node is killed, wouldn't that be because of a power outage? I just want to make this as clear as possible so we have something secure for the estop.

> Also, if I understand correctly, we could bond takin_estop and takin_motors together, so that if takin_estop fails, the motors would know and refuse to work. Am I getting that right?
>
> This could work, although a standalone solution would be best. Mostly because takin_motors will eventually be replaced by ros_control.

Okay, so something like a watchdog that communicates between the two. Would bond_core do that? Is that what you mean? Also, a standalone solution has trade-offs too, so why do you think it would be best?
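To make the watchdog idea concrete, here is a minimal sketch of a heartbeat check, written without any ROS dependency. The class name `HeartbeatWatchdog`, its methods, and the timeout value are assumptions for illustration only. In practice `beat()` would be called whenever a message from takin_estop arrives, and the motor side would poll `motors_allowed()` before sending commands.

```python
import time


class HeartbeatWatchdog:
    """Tracks whether a heartbeat was received within the last `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        # The clock is injectable so the logic can be tested without sleeping.
        self.clock = clock
        # No heartbeat seen yet: start in the safe (motors disabled) state.
        self.last_beat = None

    def beat(self):
        """Record that a heartbeat from the e-stop side was just received."""
        self.last_beat = self.clock()

    def motors_allowed(self):
        """Return True only if a recent enough heartbeat has been seen."""
        if self.last_beat is None:
            return False
        return (self.clock() - self.last_beat) <= self.timeout
```

The key design choice is that the default state is "not allowed": if the e-stop side never starts, or stops sending heartbeats for any reason, the motors stay disabled without anyone having to detect the failure explicitly.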

lit-af commented 5 years ago

> Correct me if I am wrong, but do you mean that if the node is killed while we want to press the estop, it might not set the pin value? But if the node is killed, wouldn't that be because of a power outage? I just want to make this as clear as possible so we have something secure for the estop.

Not sure what you're talking about. We should probably discuss this in person.

lit-af commented 5 years ago

Found a pretty good explanation of what a fail-safe system should be: http://www.codingthearchitecture.com/2010/03/23/fail_safe.html

Might help you figure out the idea behind this issue I opened.