Open samouwow opened 4 months ago
Thank you for the awesome description. I will take a look later at the weekend.
Thanks for the quick merging of that first PR.
Main still isn't building on my machine, is it building for you?
It is building on my machines (I checked on Linux, Mac and Windows). The problems were on the Windows machine but now it works. Can you describe a problem?
I've opened an issue with the cargo output of the failing build to avoid polluting this issue.
Please see #68
Kudos on the work you've done here, this library has heaps of really useful features.
I've run into a pretty major roadblock that I've traced back to how the library handles RNodeState::Running during the ticking of both decorator and flow node types, and ultimately leaf nodes too.
To the best of my investigation abilities, the root causes seem to boil down to:
The below sections outline some of the specific issues these present
I'm imaging this issue could be used to track a number of PRs, which I'll pre-emptively summarise here:
67
69
Not yet address by PR
Decorators
Timeout
Timeout recalculates the
timeout
system clock stamp every timeprepare()
is called, which means unless the timeout is 0 or 1, so that it fails immediately (i.e.curr - start >= timeout
is true), it will never fail.This can be seen in the below example tree.
success()
should be called twice, but instead is called 5 times, as timeout never expires.I'd propose timeout considered its return value for the previous tick, and if that was
RUNNING
it didn't recalculate the timeout system clock stamp. The exception to this would also be to reset the timestamp if it was interrupted, see the section on reactive flow below.On an unrelated note, the documentation for timeout says it uses milliseconds, but it actually uses seconds. My personal vote would be to update it to use milliseconds, since seconds are a little coarse.
Delay
Delay works during the
prepare()
function as well, but doesn't consider its previous value, so it runs every time it's visited.This can be seen in the below example, where the tree should take about 1 second to complete, but instead takes 5.
I'd propose that delay doesn't do anything in
prepare()
if it previously returned RUNNING. It should also reset if halted, see below.Additionally, delay is implemented with a thread sleep, which blocks reactive sequences and fallbacks from interrupting it. I don't have an example for this, but it should be pretty straight forwards to recreate.
I'd propose delay recorded a timestamp like timeout and didn't pass on to its child until that timeout was reached. It could return running in the meantime.
Reset and Retry
Reset and retry both have a similar issue, wherein they don't explicitly set their attempt counts to 0. (They also seem to use
len
as their attempt counter for 1, which seems to accidentally work since len is always 1). This means that a retry will only work as expected the first time. If you retry-a-retry, every cycle after the first will only execute one loop.For example, the below loop should call fail_empty() 9 times, but actually only does it 5 times.
Behind the scenes, this is because you're seeing:
I'd propose retry and reset explicitly set their counters to 0 during
prepare()
, but only if they didn't return RUNNING last tick. This should also be reset if they're halted.Flow
Sequences and Fallbacks
Sequences and fallbacks never reset their P_CURSOR, so a repeat of a sequence that had children that returned running will forever restart from that running child.
The below should call success 8 times, or 6 times accounting for the reset-repeat bug mentioned above. However, it only calls success 5 times, since the sequence restarts on the last child, as P_CURSOR is never reset. Note the
repeat(2) success()
is used to generate aRUNNING
status to trigger sequence to set its P_CURSOR.I'd propose the P_CURSOR gets reset if sequence or fallback return a non-running result. That way you'd enter them "fresh" if you try to repeat or retry them. They should also reset their P_CURSOR if halted, see below.
M_Sequence and Parallel
I haven't investigated these, but since I can't find anywhere that explicitly sets P_CURSOR to 0 I assume these guys have similar problems.
Halting running nodes on an interrupt
As touched on above, all of these nodes will behave unexpectedly if they're interrupted by a reactive sequence or fallback node.
This is hard to demo with the above bugs present, but consider the below tree:
By my reasoning, success() should be called three times, but would only be called 2. This is because repeat was never notified that it was interrupted by the reactive sequence, so on the second retry it picks up from where it left off, rather than restarting its repeat count from 2.
The solution to this seems pretty involved. All nodes need the concept of being halted, so that if they're currently
RUNNING
they tidy themselves up neatly and start properly next time. Behavior Tree CPP has this concept, so hopefully it could be used for inspiration on how to implement this.Leaf / user nodes
This concept of halting should be extended to user implemented nodes, so that their logic can clean themselves up if they too are interrupted.
I haven't thought too much about how to implement this, potentially via a new
halt
function next totick
in theimpl
trait. This could have a default implementation so that existing code / synchronous nodes didn't need to worry about it.