Open tiko5000 opened 1 year ago
I just pushed a major update to flexbe ros2-devel before I saw this. I'll try to verify this test tomorrow, but in the meantime, I'd love for you to give the new version a try. Check the change logs, as there are significant changes. The new version seems more stable, maintains sync better, and uses less CPU.
Thanks for the notice. I gave it a try but still encounter some unexpected behavior. Here are my test cases:
Log:
[7:09:33 AM] Onboard engine just started.
[7:09:43 AM] --> Mirror - received updated structure
[7:09:43 AM] --> Preparing new behavior...
[7:09:43 AM] Received a new mirror structure for checksum 1383782741
[7:09:43 AM] BE Starting [Concurrency_Test : 1383782741]
[7:09:43 AM] A
[7:09:43 AM] B
[7:09:43 AM] ConcurrencyContainer Container returning outcome finished (request inner sync)
[7:09:43 AM] Behavior execution for Concurrency_Test: 1383782741 failed! [-]
exceptions must derive from BaseException
[7:09:43 AM] No behavior active.
[7:09:43 AM] Onboard engine just started.
[7:09:43 AM] Traceback (most recent call last): [+]
[7:09:43 AM] --- Behavior Engine finished - ready for more! ---
[7:09:43 AM] Mirror built for checksum 1383782741.
[7:09:43 AM] Executing mirror...
[7:09:45 AM] Onboard engine just started.
[7:09:45 AM] Onboard engine just started, stopping currently running mirror.
[7:09:45 AM] Mirror finished with result preempted
[7:09:45 AM] --- Behavior Mirror ready! ---
[7:09:56 AM] Onboard engine just started.
Behavior:
OCS is possibly out of sync
each secondConcurrencyContainer Container returning outcome finished
Behavior execution for Concurrency_Test: 1708774436 failed! [-] exceptions must derive from BaseException
also experienced but cannot reproduce reliably:
Log:
[7:06:52 AM] --- Behavior Mirror ready! ---
[7:06:52 AM] Onboard engine just started.
[7:07:03 AM] Onboard engine just started.
[7:07:14 AM] Onboard engine just started.
[7:07:25 AM] Onboard engine just started.
[7:07:35 AM] --> Preparing new behavior...
[7:07:35 AM] --> Mirror - received updated structure
[7:07:35 AM] Received a new mirror structure for checksum 1708774436
[7:07:35 AM] BE Starting [Concurrency_Test : 1708774436]
[7:07:35 AM] A
[7:07:35 AM] B
[7:07:35 AM] Mirror built for checksum 1708774436.
[7:07:35 AM] Executing mirror...
[7:07:36 AM] OCS is possibly out of sync - onboard state is /Container/B [-]
Check UI and consider manual re-sync!
(mismatch may be temporarily understandable for rapidly changing outcomes) 1
[...]
[7:07:43 AM] OCS is possibly out of sync - onboard state is /Container/B [-]
Check UI and consider manual re-sync!
(mismatch may be temporarily understandable for rapidly changing outcomes) 1
[7:07:43 AM] ConcurrencyContainer Container returning outcome finished (request inner sync)
[7:07:43 AM] Behavior execution for Concurrency_Test: 1708774436 failed! [-]
exceptions must derive from BaseException
[7:07:43 AM] No behavior active.
[7:07:43 AM] Onboard engine just started.
[7:07:43 AM] Traceback (most recent call last): [+]
[7:07:43 AM] --- Behavior Engine finished - ready for more! ---
[7:07:43 AM] Onboard behavior failed!
[7:07:43 AM] Mirror finished with result preempted
[7:07:43 AM] --- Behavior Mirror ready! ---
[7:07:43 AM] No onboard behavior is active.
A couple of notes, then I'll put together an example for a tutorial. I think this is normal and expected behavior.
You have "autonomy low", which is typical of log states. That means they finish and move to next state. Both A & B finish after one execution and return, which causes the concurrency to return immediately. Because you have the output of concurrency tied to statemachine finished, the behavior is done. Because both are log states, they both return after one execution call, so it doesn't matter if || or &&.
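The point about || versus && can be illustrated with a small standalone sketch. It does not use the FlexBE API: `evaluate_concurrency` and the dictionaries below are hypothetical names that only mimic how an "all" (&&) or "any" (||) outcome condition would be checked against the outcomes returned so far.

```python
def evaluate_concurrency(finished_outcomes, required, mode):
    """Return True if the container's outcome condition is met.

    finished_outcomes: dict of state name -> outcome returned so far
                       (states that are still running are absent).
    required:          dict of state name -> outcome the condition needs.
    mode:              'all' for an && condition, 'any' for an || condition.
    """
    matches = [finished_outcomes.get(name) == outcome
               for name, outcome in required.items()]
    return all(matches) if mode == 'all' else any(matches)


required = {'A': 'done', 'B': 'done'}

# Two log states both return 'done' on their first execution call.
after_first_call = {'A': 'done', 'B': 'done'}
both_and = evaluate_concurrency(after_first_call, required, 'all')
both_or = evaluate_concurrency(after_first_call, required, 'any')

# If only B had returned, the two conditions would differ.
only_b = {'B': 'done'}
partial_and = evaluate_concurrency(only_b, required, 'all')
partial_or = evaluate_concurrency(only_b, required, 'any')
```

With two states that both return on the first call, the "all" and "any" conditions are satisfied at the same moment, which is why the container finishes immediately in either configuration.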
A limitation of the current version of FlexBE UI is that it only shows one of the active states in concurrency.
Try changing the required autonomy level of output so that it will pause.
The failure and the "BaseException" error are unexpected, and I'll be looking into that today.
There seems to be an issue where exiting the concurrency container and then immediately exiting the behavior causes the exception.
If I add a log state after the concurrency container, it no longer raises the exception.
There also seems to be an issue with blocking transitions inside the concurrency container, so I'll need to look into that more.
Thanks for reporting.
There is also the known issue of FlexBE UI only showing one state inside the concurrency container.
Ok great, thanks for the fast handling of the issue.
I already set the required autonomy level of the states inside the concurrency container to high and started the behavior with "Block transitions which require at least 'High' autonomy". But the outcome of A can be forced in the Runtime Control only if the outcome of the concurrency container is A(done) && B(done). If the outcome of the concurrency container is A(done) || B(done), the concurrency container still finishes immediately, even if there is a state added after the concurrency container.
I have spent a bit of time looking at the internals of how flexbe handles concurrency containers, and at the sync issues I saw on ROS 2.
I have tested a significant modification to FlexBE and posted it as ros2-pre-release branches on both flexbe_app and flexbe_behavior_engine.
These two must be used together, as they require an API change.
See the relevant change logs.
I have also developed and introduced a new https://github.com/FlexBE/flexbe_turtlesim_demo release with several detailed examples related to concurrency containers. Specifically, see Examples 3 and 4.
A brief discussion of the changes follows. I would appreciate any testing and feedback on these changes. There is still some cleanup to do on them, but barring objections I plan to introduce these changes into an Iron release this fall.
The old approach only set the "current state" to the initial state in a concurrency container. This state would still show as active even if it had finished and another state was active.
The new approach introduces a "state id" hash code for every state using a masked 23-bit hash code. This hash code is known to both the onboard and mirror sides. The lower 8 bits are set to the outcome (this allows 255 outcomes on a state, which is likely far more than anyone needs, but until we clearly need more than 23 bits to encode the state id I chose to use 8 bits for outcome mapping).
Instead of reporting only the outcome changes, the new system reports an array of "current active states" for sync, and each outcome encodes both the outcome and state id using a 32-bit value.
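The encoding described above can be sketched as follows. This is only an illustration of the bit layout (a 23-bit masked state id above an 8-bit outcome field, packed into one 32-bit value). The use of `zlib.adler32` as the hash, the constant names, and the `encode`/`decode` helpers are assumptions made for this sketch, not the actual flexbe_behavior_engine implementation.

```python
import zlib

OUTCOME_BITS = 8
OUTCOME_MASK = (1 << OUTCOME_BITS) - 1   # 0xFF, the lower 8 bits
STATE_ID_MASK = 0x7FFFFF00               # 23 bits above the outcome field


def state_id(path):
    """Hash a state path into a masked 23-bit id (lower 8 bits cleared).

    adler32 is an assumed stand-in; any hash that both the onboard and
    mirror sides compute identically would serve for this sketch.
    """
    return zlib.adler32(path.encode('utf-8')) & STATE_ID_MASK


def encode(path, outcome_index):
    """Pack a state id and an outcome index into one 32-bit value."""
    if not 0 <= outcome_index <= OUTCOME_MASK:
        raise ValueError('outcome index does not fit in 8 bits')
    return state_id(path) | outcome_index


def decode(value):
    """Split a packed value back into (state id, outcome index)."""
    return value & STATE_ID_MASK, value & OUTCOME_MASK


packed = encode('/Container/B', 1)
decoded_id, decoded_outcome = decode(packed)
```

Because the mask leaves the top bit clear, the packed value stays non-negative even when stored in a signed 32-bit field. Distinct paths can still collide in 23 bits, so a real implementation would need to detect that when the state ids are built.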
This requires a slight increase in bandwidth, but I judge the reliability increases worthwhile.
The new approach reports returns from individual states and containers to help keep the mirror consistent and identify sync issues and recovery.
If an internal state returns, but another remains active the FlexBE UI will change. It currently shows the "deepest" active state. Currently only that state can be preempted, but with new changes we expect to support operator preemption at any level. As part of these changes, the OperatableStateMachine is now a pseudo manually transitionable state. This is a temporary hack during development. Long term, we will introduce a new ManuallyTransitionableStateMachine to mimic the state hierarchy.
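As a rough illustration of what "deepest" means here, the sketch below picks the active state whose path has the most levels. The function name and the assumption that active states are reported as slash-separated paths (like /Container/B in the logs above) are mine; how the UI breaks ties between equally deep states is not stated in this thread, so the sketch simply keeps the first one.

```python
def deepest_active_state(active_paths):
    """Return the active state path with the most hierarchy levels.

    active_paths: list of slash-separated state paths, e.g. '/Container/B'.
    Returns None when nothing is active. Ties keep the first entry.
    """
    if not active_paths:
        return None
    return max(active_paths, key=lambda path: path.strip('/').count('/'))


shown = deepest_active_state(
    ['/Container', '/Container/A', '/Container/Inner/C'])
```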
Please test the new branches with your system and the Turtlesim tutorials mentioned above, and give me any feedback on the performance.
@pschillinger
Thanks for the pointer @dcconner! First, as a disclaimer, I'm not yet familiar with every single technical detail of the ros2-devel and ros2-pre-release branches, so I might need to revise or refine what I say now over the next few days.
What I can say regarding the way transitions worked in concurrency containers so far is that the transition behavior as described above is indeed as expected, even though admittedly not most intuitive. I would mainly attribute this to an initial design limitation on my side, or in other words, FlexBE did not include concurrency initially and the concurrent execution of states was added on top under the constraint (mainly dictated by the API between the engine and the GUI) that there is always a single active state to be operated.
What this means is briefly summarized in the tutorial on Parallel State Execution:
Nevertheless, there is always one main state in a concurrency container, indicated by the same notation as the initial state of a state machine, which works as described in the next section. In general, any of the states can be set to the main state. [...] During execution, the main state of a concurrency container is monitored in the GUI as known from state machines. If this state is a state machine itself, outcomes of inner states can be forced or blocked by the selected autonomy level as usual. All other states not being the main state are running in the background. Their state of execution is not monitored, even if they are state machines. Consequently, they have no knowledge about the autonomy level and cannot be controlled manually. This might change in the future, but for now, this is how it works.
What this implies in consequence is, as observed in the initial example, that the outcomes of background states are not blocked by the autonomy level and might return immediately if the respective state dictates so. At least this is the expected part. Where it gets messier now is that, due to the fact that background states are not aware of the GUI, background states won't send transition notifications to the GUI, thus the GUI has to assume it might have gotten out of sync whenever a concurrency container returns an outcome (i.e., this happens when the CC outcome was triggered by a background state). This is also related to the observed warning of being potentially out of sync, a monitoring done by the behavior mirror to be precise but resulting from this fact.
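The difference between the main state and background states can be summarized in a short sketch. It is a simplified model of the behavior described above, not FlexBE code: the numeric ordering of the autonomy levels and the `transition_blocked` helper are assumptions for illustration.

```python
# Assumed ordering of autonomy levels, lowest to highest.
AUTONOMY = {'off': 0, 'low': 1, 'high': 2, 'full': 3}


def transition_blocked(required, block_threshold, is_main_state):
    """Decide whether an outcome waits for the operator.

    required:        autonomy level the outcome requires.
    block_threshold: level selected in "Block transitions which require
                     at least ... autonomy".
    is_main_state:   True for the monitored main state of the container.
    """
    if not is_main_state:
        # Background states have no knowledge of the autonomy level,
        # so their outcomes are never blocked.
        return False
    return AUTONOMY[required] >= AUTONOMY[block_threshold]


main_blocked = transition_blocked('low', 'low', is_main_state=True)
background_blocked = transition_blocked('low', 'low', is_main_state=False)
```

In the initial example, this is why one state's outcome waits for the operator while the other returns immediately, and with an || condition that single background outcome is enough to end the container.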
Long story short, an improved mechanism to handle outcomes as proposed by @dcconner sounds promising to me and might be designed with a more intuitive handling of shared autonomy in the context of concurrency. I will need to do some testing myself to support with more details, though. I hope I can allocate some time for this next weekend.
There is now a rolling-pre-release branch for flexbe_behavior_engine that has been rebased on the latest humble and rolling releases and adds some additional features. I'm going to leave ros2-pre-release as is for now, but the rolling-pre-release branch is the preferred branch for testing. Still use ros2-pre-release for the flexbe_app for now.
The iron, rolling, and ros2-devel branches have the concurrency container and state id changes. Please use those branches. For consistency you need version 4.0+ of the UI and 3.0+ of the flexbe_behavior_engine.
I am trying to implement a simple state machine with a single concurrency container, but it fails to execute:
This is the state machine implementation:
concurrency_test_sm.py:
When executed with "Block transitions which require at least 'Low' autonomy", the console output is:
A and B are printed, which is fine. I would expect to see A and B in the "Behavior Execution" view in the "Runtime Control". There I would expect to be able to select the "done" outcome from either A or B. But the behavior finished by itself, without waiting for operator input.
Am I missing something?