Handle a halted node differently than a preempted node

krixkrix commented 5 years ago

I am trying to see if I can migrate my ROS SMACH based sequencing engine to BehaviorTree.CPP. However I don't find any concept of preemption here.

In SMACH I am using a lot of concurrent actions where an action A runs in parallel with action B and C. But A is the "master action" and B and C should maximum run as long as A. So when A succeeds, I will preempt B and C, and they will stop their task with a preempted status. But the complete parallel task [A,B,C] will return succeeded. If any of A,B or C fails, the parallel node should fail as always.

Can BehaviourTree.CPP support something like that? How would I know if a preempt request was executed on a subtree or it just failed?

facontidavide commented 5 years ago

Indeed, the logic is that you may halt a running Async node, but not preempt it.

"Preempting" is particularly hard for AsyncActionNode, but quite easy for CoroActionNode; unfortunately, I am pretty sure that no one use the latter, because people are lazy.

I need to understand more about your use case.

What is in your opinion a good definition or "preempting". Can you give me a practical example?
So you want A to me mutually exclusive with B and C, is that right? Can't you use a ReacvtiveFallback or something similar?

krixkrix commented 5 years ago

SMACH action states will in general report an outcome either SUCCEEDED, FAILED or PREEMPTED when the state is exited. And preempted will mean that the action was explicitly cancelled, not necessarily because of a failure. I am using these outcomes heavily in my current code.

Use case: An action A controls a movement from position X1 to X2. An action B controls a light. I want the light to be turned on as long as the movement is active.

The requirements for this scenario are:

If the movement A succeeds, the light action B is cancelled (preempted), and the behavior subtree as a whole succeeds.
If A fails, B is cancelled (preempted), and the behavior subtree as a whole fails.
If B fails, A is cancelled (preempted), and the behavior subtree as a whole fails.

I even want to handle the tricky concurrency cases, where A succeeds, B is requested to cancel, but B reports a failure before it is cancelled, so the behavior subtree as a whole should fail.

In general the action B may be more complicated than turning on a light. But the point is that it is a task that takes some time and may fail.

I am trying to understand how/if a behavior tree can be used for the above. It seems like "Halt" is equivalent to a "preempt" (or "cancel") request, except the behaviour will return "failed" regardless of it has been "halted" or it failed while executing its task.

ramilmsh commented 5 years ago

I realize I am late to the party, but I'd like to propose that this behavior may be modeled with a reactive sequence. Where your "main" action is the last one and everything else is an "async condition". The idea is you have "fast" async nodes ( fast, but not instantaneous, {execution time} >> 1/tick_rate, like turn on lights, store data, check stuff... ) that return running when initiated and success, as soon as an initial check succeeded, but continue executing checks in the background (then, should it fail completely - setStatus( FAILURE ), or if it has been temporarily impeded - setStatus( RUNNING ); both will preempt the main action, but Failure will also fail the whole sequence) and "slow" action in the end, which will be running until it fails, succeeds or is halted by one of the conditions.

@facontidavide, I would like your opinion on this, as this feels like a bit of a crutch.

NOTE: you should probably use CoroActionNode, instead of plain AsyncActionNode, though.

SteveMacenski commented 4 years ago

@facontidavide this is actually something we need in the navigation2 stack as well. The primary use-case is when a navigation command is preempted, we want the controller action node to continue to execute so there's a smooth transition between goals. But the recovery behaviors (e.g. spin, backup, whatever) should be immediately stopped on preemption to make way for the new goal. The planner should also be halted and restarted with the new goal if its still running (functionally this may not be important since planning is typically fast, but for completion's sake).

Then the question comes up with custom control flow / decorator nodes how to treat them in the preemption case. Having some BT node virtual method like halt for preemption would have value so we can implement preemption state updates for all the node types.

For practical considerations, if we have some method we can call, in ROS2 we're running actions so we have a client and a goal so we can async_cancel_goal() the recoveries, just pass for the controller, and probably mess with the internal state of control flow nodes. The coro action nodes help to kill the coroutine, but if we have a handle to the server, we can do that through ROS2 interfaces for the remote server. The control flow nodes could have a flag set and next time its ticked we return some value set by it and bypass the control flow logic from preemption. Or if its possible in the lower level of the tree to set a status and just reset the control flow node to some state post-preemption without having to wait for another tick.

facontidavide commented 4 years ago

Hi @SteveMacenski,

I was waiting for your message with a sense of dread and imminent doom, since I saw the discussion on the Github of Navigation2...

So, let's see if I understand this right or not:

when a navigation command is preempted, we want the controller action node to continue to execute so there's a smooth transition between goals. But the recovery behaviors (e.g. spin, backup, whatever) should be immediately stopped on preemption to make way for the new goal.

At the level of the entire tree, where is the preemption signal inserted? The root?
Do you want this preemption signal to be propagated to the entire tree?
How is decided if a particular Action halt() itself when a preemption signal is received? Can we create a DecoratorNode that converts a preemption signal into a halting signal?

Then the question comes up with custom control flow / decorator nodes how to treat them in the preemption case ... so we can implement preemption state updates for all the node types.

This is indeed what I am terribly worried about, I think you underestimate the combinatory explosion of potential bugs and corner cases (but maybe I am wrong).

The control flow nodes could have a flag set and next time its ticked we return some value set by it and bypass the control flow logic from preemption.

This is what I would call "your implementation details".

Or if its possible in the lower level of the tree to set a status and just reset the control flow node to some state post-preemption without having to wait for another tick.

As long you don't ask me to add NodeStatus::PREEMPTED or similar... that can be done.

The only way to have a sane discussion about this is to implement some (dummy code / unit test) without any ROS/ROS2 to validate the concept, API, usability and potential corner cases.

SteveMacenski commented 4 years ago

I was waiting for your message with a sense of dread and imminent doom, since I saw the discussion on the Github of Navigation2...

Oops. I never mean to be the harbinger of doom (or do I....).

Before going into the bullets, lets just talk thematically. I hate getting bogged in technical details before getting clear idea of what's happening. Also the answers to these questions are fluid since we're looking for a certain result but I believe there are many ways of accomplishing it.

We have a robot going around a (I don't know, what are you working on these days...) parking lot. While searching for a spot to park, a new one opens up right in front of you and you would like to preempt the navigation request to the other, less optimal parking spot 2 rows over you were going to.

When I preempt, just like in ROS1 navigation, I really don't want to disrupt the flow of the robot (e.g. stop the robot dead in its tracks to then replan and go to the goal). So, I believe all I want to keep moving is the controller, at least for some short duration. The planner, if planning, and recoveries, if recovering, I want to stop like its a cancel command to call anew.

Now, because we're not using a state machine but the BT, there's the added complexity of the control flow nodes (and custom ones) about how to manage the state of these things if we're not fully resetting the tree (or are we?) but the controller should still spin on its last command until a new one comes in.

Another open question is what to do if there are multiple controllers in the tree. One's still spinning but if the preemption comes in and the conditions are right, the other controller could be called and now you have 2 controllers fighting to send cmd_vels. Lets ignore that one for right now (unless you have a solution). My solution is "don't do that" or if you do that, make them both in the same controller_server and I have logic in there to make sure only one runs at a time (old one is cancelled). Either way, not your or BT.CPP's problem.

So onto the details.

At the level of the entire tree, where is the preemption signal inserted? The root?

Sure, lets say so. Right now, the action server that exposes the BT has a preemption handler that will set a new goal for the planner to use and that's it. The planner spins at 1hz so the next time around it'll use the new goal and make a path, then the controller will take the new path on the blackboard and follow it.

https://github.com/ros-planning/navigation2/blob/master/nav2_bt_navigator/src/bt_navigator.cpp#L220

I think what makes most sense is for the action server exposing the BT that gets the new navigation request should have some preempt analog to halt and methods in the implementations of nodes handle it in some way across the tree (e.g. planner triggers immediate replan, controller does nothing, recoveries immediately exit, control flow and decorator nodes do.... something?).

Do you want this preemption signal to be propagated to the entire tree?

I believe so, so that when preemption for navigation task happens, we can tell the planner to replan and the recoveries to stop. What triggered this work is that when a new navigation goal comes in from preemption, the recoveries (which can be long running like spinning or waiting) don't end blocking the new request until complete. This is not great. I assume the way we need to do this is propogating through the tree. Open for alternatives though.

How is decided if a particular Action halt() itself when a preemption signal is received? Can we create a DecoratorNode that converts a preemption signal into a halting signal?

Not sure I understand this point. Are you asking how a halt() could know if it was called due to a preemption? I'm not sure it could. I think there would need to be another method other than halt(). My guess is for many nodes (and by default) the preemption method would just halt. With the exception of the controller. For non-Action nodes, I think there would need to be some state updated in this preemption method. For the goal of the preempted recovery X returning FAILURE that another control flow recovery Y isn't then triggered and returns up the tree. Unless we can just set the state to idle so when this exits it won't tick another child. I'm not sure under the hood of BT.CPP how that is handled.

This is indeed what I am terribly worried about, I think you underestimate the combinatory explosion of potential bugs and corner cases (but maybe I am wrong).

Oh, no. I agree with you. This sounds like a total complete pain. I think the only way this doesn't become a total pain is if in the preemption we can set the state to IDLE or something and if the logic in all the control flow nodes recognizes that if its IDLE and returned from a preemption child it doesn't keep on going and just returns some status. Or optionally, the recursive visitor (by the way, great idea, lambdas make the idea of visitors in C++ way more bearable, its used alot in open_karto for finding neighbors and its way less clear there the design) has an option to apply from leaf -> root rather than root -> leaf such that the leaf action nodes have their preempt() called, then their control flow nodes have their preempt() called (which could set a flag like IDLE) and so forth. This way there's no opportunity for it to be reticked by the next one because its been handled in order. The open question for that is: what should be the state of the control flow node of the still-running processes after the preemption? E.g. the controller BT node, should its control flow node know anything about its current state? Or should it just be IDLE and the tree is none the wiser that there's still nodes' servers still processing. Alternatively as the on_preempt() of still running children, we keep the state the same?

I know that was a mouthful.

I don't necessarily think that a new status type is required. Honestly, that one didnt even cross my mind but I think that could help with some of the flag setting on the blackboard ;-). I would be interested in your thoughts if this is even the right way to think about it. This seems like an important use-case. While BT.CPP isn't ROS related, scrolling through the stars and forks, there's an awful lot of known ROS users...

Huh, that gets me thinking, maybe some clever use of the blackboard could help... it would definitely be hacky...

PS thanks for https://github.com/PointCloudLibrary/pcl/pull/3853/files. I would really love to do more work in PCL, but its super unclear to me who the players are. The state of the website and forums makes me think that PCL isn't being as used as much anymore outside of ROS-land (what are they using instead? Spinning their own?) and I was never sure how to "jump in" to that community. There's also a huge number of outstanding tickets and PRs that dont seem to be moving. I would really love if an organization took this over and brought the project back to life. Its a pretty critical project in my mind.

Also generally thanks for making this project too. Website, tutorials, clean code, I can't imagine convincing someone to give me the time to make something like this mostly by myself. I can usually swing it when its a higher-level application thing like navigation. Thanks a ton!

ruffsl commented 4 years ago

I would really love to do more work in PCL, but its super unclear to me who the players are.

Might want to reach out to the organizers behind the GSoC for PCL to lean about the maintainers:

The chat link points to the PCL Discord channel where I guess the community is congregates.

facontidavide commented 4 years ago

Wow, this will take a while to answer. Spoiler alert, we need dummy code to play around with (non ROS, just a printf in the callbacks).

Rather than answering all your questions at once, we need to approach this step by step!

Reading the description of your use case (and not thinking about navigation2 at all), I would do something like this

As you may see, at each tick (no problem running at 100 Hz) we are checking if there is a new goal.

Let's see al the cases:

First command: planning is executed and path follower (controller) started
Following loops: we check if there is a new path, since there is not, we jump to follower.
if a new goal is sent, there is a replanning, but follower is never stopped
note: if plan_trajectory fails, the entire branch fails

Isn't it what you had in mind ? As you can see, no need to introduce preemption, the grammar of reactive nodes cover this use case I dare to say "elegantly" :stuck_out_tongue:

SteveMacenski commented 4 years ago

This wouldn't allow for replanning of the same goal over time (e.g. every 1 s replan, or on a controller failure replan, or I suppose every N meters replan, though that's a new one). though I agree with your philosophy of "more BT primitive nodes, less blackboard background logic".

facontidavide commented 4 years ago

Ok, I missed the 1 Hz thing here. Easy-peasy:

Note that all the subtree with the green nodes on the left can be just a single node. Fine-grained Conditions/Node are good for reusability, but there maybe you don't care about that.

As you can see, there is always a way to express the logic.

I probably do something like this:

The node isReplanNeeded can check whatever you want, new goal, time elapsed, etc...

if_first_than_second can be implemented in 1 hour.

As you can see, Reactive Nodes is what you should consider for... well, reactive behaviours.

SteveMacenski commented 4 years ago

Sure, but I think we're missing the point a little, the planner isn't really a huge concern to me (though this does help, @crdelsey how do those trees look to you for a replacement?). The bigger issue is preempting running nodes. This is a good example of how to call a node. I need to be able to preempt that PlanTrajectory node if it takes, say, 10 seconds to run and a new request came in to preempt it 2 seconds into planning. Also we have recoveries for the planner too "contextual recoveries", but I think that's just implementing a control flow that keeps marching until success (so isReplanningNeeded would be inverted).

What actually triggered this work was recoveries, which are long running, and preemption. If my robot is waiting 5 seconds or spinning 360 or whatever and a preemption comes in to go to another position, I need that long running node stopped and returned to start planning / navigating / going through the tree to the new goal. However, I need the controller to continue doing what its doing so there's a smooth transition between task A and task B.

I essentially want to halt(), except some nodes, or set the state of running things. That's why we proposed an analog to halt() for preempt() so that each node can handle what it should do on a preemption request (planner, recoveries halt, controller continues, control flow... reset? exit? not sure there).

Can that be modeled through the BT?

facontidavide commented 4 years ago

So, supposing you want premption callback, no matter what :stuck_out_tongue: ...

My suggestion then is to add the preempt() callback to your nodes (virtual method) and use the "applyRecursiveVisitor" to propagate the preemption "signal".

Then it is your responsibility to decide if this preempt() callback also triggers a state change (SUCCESS, FAILURE, or simply resend the request with action client) in the NEXT tick().

You have to keep in mind two things:

how the preempt affect a particular node (this can happen synchronously when preempt signal is propagated)
how the preempt affect the execution of the entire tree (this will be asynchronous, because NodeStatus change will be visible the next time tick() is called).

Since there is no way I can standardize this for you, because no use case is the same, you need to address this yourself.

I hope this helps...

SteveMacenski commented 4 years ago

Having a preempt method in the tree node itself in BT.CPP would significantly simplify the implementation. I think from the original ticket and our issues, this is going to be a common theme. Would it not be possible to have a preempt() added to the tree node class like halt()? Dynamic casting around pointers is harsh and when applying the visitor we don't necessarily know what the type of the node is. Further, if we create our own base classes to work with, it immediately invalidates the our behavior tree library as being generic and allowing for developers to bring their own BT nodes to the party without having to derive from our base classes to work.

Is that right @gimait?

I don't like the idea of having these weird state overrides, but I'm not sure what else to do to stop a running process in the behavior tree without killing the entire behavior tree. That seems to be a genuine issue.

The next best thing I can guess is to have some preempted blackboard variable that if set to true, all running nodes stop next time they're ticked. Similar implementation challenges.

facontidavide commented 4 years ago

Let's make it clear: it is not like a I don't understand the benefits of the preemption concept.

I genuinely don't know how to semantically describe this.

Would it not be possible to have a preempt() added to the tree node class like halt()?

Adding a virtual preempt() function to the TreeNode is easy.

halt() has a very clear semantic right now: for instance, a ControlNode must halt all its running children when halted.

What is really, really difficult (and no one really is thinking this through) is:

Where does the preemption signal come from?
How is it propagated through ControlNode and DecoratorNodes? Are they preemptable themselves?
Once a node is preempted what should its parent (ControlNode or DecoratorNode) do?

There are a lot of questions like these that need to be answered. I am willing to, but I am honestly unable to answer them right now.

Any concrete implementation proposal (and I mean a PR with unit tests included) that does not break the existing semantic, but only extends it, is welcome.

SteveMacenski commented 4 years ago

Let's make it clear: it is not like a I don't understand the benefits of the preemption concept.

Yes, sorry, didn't mean to imply otherwise.

The preemption is a special case for the controller in nav. I'm wondering if the "right-ish" answer is to have a preempt method for each BT node that defaults to halt. And in the controller BT node, we override the halt with just nothing. An open question I need to think about is when applying the visitor, will the control flow's halt tell the children to halt, I don't want halt called recursively, I want preempt called recursively that will be halt for most cases. Ex. ControlFlowA has preempt called, which is set to call halt. I want ControlFlowA to halt, but I don't want it to tell the children anything, I want to recursive visitor to do that.

The remote controller server would just continue running because it doesn't know better. When the parent control flow node gets to this secretly still running node, I think* it should be fine because when the control flow node looks at the state of the child node, it will already be running. It would put the control flow node above the controller in a wacky state since it doesn't know this is still running, but from the "clean start" perspective of a preemption, that seems like desired behavior potentially. There's a box of worms to be opened if the controller fails during that narrow window and exits with failure, I'm sure.

SteveMacenski commented 4 years ago

Please read through this comment for cross referencing: https://github.com/ros-planning/navigation2/issues/1622#issuecomment-624965596

I think this could in concept solve some of our problems. I think it puts the burden of preemptable processes on the BT designer rather than in the nodes themselves. It will do the job (I think?) but makes the behavior trees extremely sensitive to modify.

Thanks for the node drawings you provided. I don't think I would have gotten to that conclusion without them. That's probably the way to go (use the NewGoal BT as both a trigger to start planning and oppositely as a trigger to kill long running actions)

gimait commented 4 years ago

Is that right @gimait?

Yes, that's right :)

After exploring the different options that we have using the current bt architecture, I think what would make most sense for us would be to have a callable method in the nodes that require specific actions when a global navigation goal is updated (which for the bt would mean just a variable in the blackboard).

I agree with you, @facontidavide, that it would be possible to use the current implementation to obtain the complex behaviour we are looking to have here. However, as you have shown in your examples, even for the easy implementation of stopping one subtree only requires including extra nodes that in my opinion would make harder a clear design of the bt.

We are mainly looking for a clear and clean way of notifying the whole bt that a new global goal is sent in an asynchronous way, and after some discussions, we ended up here looking for some help. Since the only way of traversing the bt is using visitors, the best solution I could think that doesn't require changes in this library is to subclass all nodes that need a specific behaviour and use dynamic casting in the visitor function to check if a node is an implementation of our nodes. By doing this we can write the extra calls we need for navigation, however, if there was another available method on the base class that can be called without casting, this would be much clearer. That is why we are asking about having a 'preemption' method on the node class.

Another thing that we were missing when we started looking at this is that is not just the action nodes that could need a specific behaviour. To make this work, we would also need (at least) control nodes to be able to react to the preemption. That way, we can create control nodes that perform a specific action (halt a child or a bunch of them, restart a child, etc).

I think what I have been trying to do is to add some extra complexity to the tree nodes to be able to generate more complex behaviours, while keeping the behaviour tree cleaner, but maybe it would be better to go with the current node implementation, and create a bit more complex tree.

What do you think @facontidavide? Is it best if we just leave everything in the design of the bt itself?

SteveMacenski commented 4 years ago

I think the 2 ways through this are:

what I stated above with adding new BT nodes for the goal (which I think is the way to go; less code, less changes for mistakes; though less general and will require some explanation)
adding a preempt() to the base node and then in Nav2 we figure things out under the hood (control flow state nightmare; hides from user; hopefully general for all long running processes in exchange for complex control flow node behavior).

@gimait did you read through that new comment I posted? Do you still think the best route is the preempt() methods? I think we can get everything we want from a couple of new BT nodes based on Davide's suggestions above. It trades generality for flexibility and simplicity. I think we could struggle through trying to make preempt() methods for all the Nav2 BT nodes, but I think this is quicker and probably more clean to maintain and more in the spirit of behavior trees.

gimait commented 4 years ago

Yes maybe it's best to play more with the behaviour tree structure to get the behaviours we want. Let's try that and if there is issues with it down the road we know about some other options =)

SteveMacenski commented 4 years ago

Awesome, this tool can help too to sanity check trees as changes are made https://github.com/ros-planning/navigation2/blob/master/tools/bt2img.py

BehaviorTree / BehaviorTree.CPP

Handle a halted node differently than a preempted node #106