Overall commander architecture is complicated

jpreiss commented 3 years ago

Many parts of the Crazyflie firmware can affect how the setpoint_t struct is filled in each stabilizer loop, and/or how radio commands are interpreted. I am wondering if it would be worth the effort to refactor.

I am aware of:

The supervisor (formerly sitAw) framework for detecting idle/flying/tumbled state.
Low-level setpoints preempting the high-level commander...
... but reverting to "stabilize mode" and then either high-level or "declaring an emergency" if a setpoint is not recieived for too long.
The priority system for low-level setpoints.
- Allows traditional R/C remotes to override CRTP commands. Not sure if used elsewhere.
The notifySetpointsStop meta-command to override/modify mechanism 2.
The high-level commander's internal notion of idle, flying, landing, and emergency stop (via wrapping planner.c).
The high-level commander's groupMask for broadcasted high-level commands.
The parameter system param commander.enHighLevel.
Possible interaction between the app layer and the radio link.
- I do not know anything about the app layer, but from crtp_commander_high_level.c it appears that the app layer interacts with the rest of the firmware by "sending" CRTP packets.
The collision avoidance system.

I can see the appeal of implementing the supervisor outside the commander and only using sensor data and motor state. It is harder to introduce bugs. It reminds me of a subsumption architecture.

Collision avoidance is also a "setpoint modifier" instead of a "setpoint creator", so it seems reasonable to keep it separate.

For everything else, I do not personally see much benefit to implementing them as independent subsystems. I understand that we have reached this design by many decentralized developers adding features over time. However, I feel that it is now hard to build a mental model of the firmware. For example, the prospect of making notifySetpointsStop a 100% reliable feature in all states (see bug report in Crazyswarm) feels more difficult than it should.

My intuition would be to unify many of these factors into a state machine. Maintaining radio protocol compatibility (#849) might make this harder, but I think it is still possible. It would also be a good first step towards making the whole commander framework compilable and bindable for x86 (#602).

I am curious - how others feel about this?

whoenig commented 3 years ago

I agree that this is a problem and needs a redesign. One big pain point is that it is really hard (or even impossible?) to switch high-level -> low-level safely.

jpreiss commented 3 years ago

I am working on a draft of a state machine implementation to help with discussion. I have some questions...

Regarding the setpoint priority system, in https://github.com/bitcraze/crazyflie-firmware/issues/113#issuecomment-275405803 @ataffanel said:

The setpoint with the highest priority wins. If the setpoint with high priority is not set for 500ms a setpoint with lower priority will be allowed. This allows to fly with a PPM RC transmitter while being connected to the client (PPM has higher priority). This is also thought to help implementing more autonomous flight in the future (like reading a sequence of setpoint from an SD-Card).

However, when we created the high-level commander based on the crazyswarm firmware for, we chose not to implement the low/high level switching using the priority system: in https://github.com/bitcraze/crazyflie-firmware/issues/293#issuecomment-366556447, @whoenig said:

I want to avoid that the high-level commander generates low-level setpoints at a fixed rate: this will be less accurate in terms of latency and will make it much more complicated to run the control loop at a higher or lower frequency. Instead of letting the high-level commander actively push data, I want the stabilizer to query the setpoint whenever needed.

So there are currently only three priorities: DISABLE, CRTP, and EXTRX (external receiver). If we go with a state machine, DISABLE would likely be replaced by generic idle and error-handling mechanisms. But would we also want to let the state machine handle EXTRX overriding CRTP, replacing the priority system entirely? Or do we want to maintain the possibility of adding more priorities in the future?

jpreiss commented 3 years ago

...and regarding the param commander.enHighLevel: do we need it at all?

Currently if enHighLevel is set to zero, sending high-level commands on the radio will have no effect. But low-level commands already preempt high-level commands. Is there a scenario where we want to block high-level commands, but we are not also sending low-level commands?

Otherwise, it is used when low-level setpoints time out, to choose between the null setpoint and the high-level setpoint. This happens in two main cases:

The low-level commands were a planned flight, and we now want to switch back to high-level for some goTos, land, etc.
The low-level commands were an unplanned manual override, because something bad was happening in high-level mode.

But no matter what, in the current code we already stop the high-level commander the moment we receive a low-level setpoint:

https://github.com/bitcraze/crazyflie-firmware/blob/a547cb1dc272cdfaa4f965a3ec9b86d0dec42c32/src/modules/src/commander.c#L82-L90

So there is never any "memory" of the original high-level command. After it has been stopped, it does not make sense to query the high-level commander for a setpoint. So the current code always expect to get a fresh high-level command when low-level commands end.

Therefore, it seems that any switch into high-level mode, either from idle or from low-level mode, will always require sending a new high-level command. This already signals the user's intent to use high-level mode. So do we also need a parameter?

whoenig commented 3 years ago

Historically, enHighLevel was just there to simplify the review process when we went upstream (disabled by default - so no big harm done). I agree that it is not needed anymore.

jpreiss commented 3 years ago

I am getting close to a draft PR that unifies the high/low-level commanders into a state machine. This will be bind-able so we can start writing unit tests for high/low switching, etc.

Can anyone from Bitcraze reply to my comments on the priority system above? How do we feel about having exactly two priorities (CRTP and EXTRX) with the switching logic in the state machine vs. maintaining the priority system as a separate thing?

whoenig commented 3 years ago

I just had an informal meeting with Bitcraze and we discovered some more issues with the current design. We think it would be good to discuss this in person. Specifically, we wanted to discuss if it might be better to switch to a push-based design, where the high-level commander has to push setpoints into the priority queue, like CRTP and ExtRX. From my understanding, your state-machine would still try to switch between push (CRTP, ExtRX) and pull (HighLevel commander)?

jpreiss commented 3 years ago

I can discuss over Zoom. My state machine still uses a pull-based design for high-level mode, but I think it would also be compatible with a fully push-based design.

The state machine code is feature-complete, but it has a bug: when I test on real hardware it triggers the watchdog timer, but flies fine in a debug build with the watchdog disabled. I am guessing I made some mistake in using the mutex. Hopefully it's minor. If desired, I can go ahead and share the watchdog-triggering code in a draft PR anyway.

whoenig commented 3 years ago

A PR or link to the branch would be great! Then everybody can take a look before we meet (or discuss here). I don't think the concrete bug-free implementation is that important at this stage.

jpreiss commented 3 years ago

The code is on https://github.com/jpreiss/crazyflie-firmware/tree/libcmdmerge.

In particular: https://github.com/jpreiss/crazyflie-firmware/blob/libcmdmerge/src/modules/interface/libcommander.h https://github.com/jpreiss/crazyflie-firmware/blob/libcmdmerge/src/modules/src/libcommander.c

edit: there are some TODOs in the switch statements for the state machine. These are (state, input) combinations for which it was not obvious (to me, based on current firmware) what should happen. They may require discussion.

knmcguire commented 3 years ago

@jonasdn do you remember what the outcome was of the triage meeting on this issue?

jonasdn commented 2 years ago

We welcome contributions in this area! We would like to see the high-level commander to be more like other commanders, see the summary of our discussions by @whoenig above. We removed the Triage-tag since it seemed like this issue was moving forward on its own.

We welcome PR from you @jpreiss and please let us know if you want help, or if your time is to limited to work on this. There are dragons in the high-level commander. Which we noticed while doing our demos. Making sure it does not do special things in regards to watch-dogs and the way it inserts setpoints would be nice.

Making it easier to reason about the commander is the overall goal.

jpreiss commented 2 years ago

Was there a final decision on the push vs. pull architecture?

whoenig commented 2 years ago

My understanding was that we should make the architecture as consistent as possible. In this case, this likely means changing it to a push-based architecture, so that the human-pilot override is more naturally coming out of the priority-based architecture.

jpreiss commented 2 years ago

How will we handle the programmatic switching from low-level to high-level modes? This would need to override the normal behavior of the priority mechanism. I guess in the C code we can add a generic mechanism to modify the priority behavior.

whoenig commented 2 years ago

The priority is 1. ext RC, 2. low-level, 3. high-level (with 1 being highest). So if there are no extRC and low-level commands, the commander would automatically accept the high-level commands. I think in all cases we have to assume some sort of minimum rate, to accept a continuous stream from one source. In which case do you think we need to modify the priority behavior?

jpreiss commented 2 years ago

Because the priority mechanism will not enqueue a lower-priority setpoint when a higher-priority one is in the queue:

https://github.com/bitcraze/crazyflie-firmware/blob/7a0ed44bb28440770702d95cd2ac4862e1ef5071/src/modules/src/commander.c#L82-L90

and a stale high-priority setpoint in the queue will not be replaced by a null-priority setpoint until after the timeout period has elapsed:

https://github.com/bitcraze/crazyflie-firmware/blob/7a0ed44bb28440770702d95cd2ac4862e1ef5071/src/modules/src/commander.c#L112-L118

To handle the common sequence of (takeoff, low-level, land) we need some way to switch from low-level to high-level without waiting for the timeout period.

whoenig commented 2 years ago

I was envisioning that the timeout would simply be much shorter (e.g., 100ms). In other words, it is expected that a continuous stream of setpoints is sent at least at 10Hz. The priority check would be reset after 100ms, so that high-level commands could take effect. A faster mode-switch could be achieved by explicitly clearing the setpointQueue by using a new CRTP command.

jpreiss commented 2 years ago

I feel we should make <100ms mode switching possible for use cases that require "perfection" such as drone shows. 100ms of drifting is still noticeable in some situations, e.g. velocity setpoints.

The problem with clearing the setpoint queue is that we introduce a time period in between the clearing and the receipt of the new lower-priority setpoint, where there is no "last good setpoint" to track, so the controller doesn't know what to do.

IMO, it will be easier to understand the code if we implement high-to-low-priority switching as a first-class operation. This was part of the motivation for the state machine design. I think we can move some ideas from the state machine design to the priority switching mechanism. I can work on a draft.

jpreiss commented 2 years ago

I think we can support instantaneous high-to-low priority switching by adding a command to replace the currently enqueued setpoint's priority with the lowest, i.e. the priority of the high-level commander. Then the sequence of radio commands

low-level setpoint
modify priority command
high-level command

should behave as desired.

But then if the low-level setpoints were preceded by a high-level command, we need to make sure that the setpoints from the previous high-level command are not used, and we wait for a "fresh" high-level command.

One option would be to reset the high-level commander whenever a streaming setpoint arrives. Or, we could do it when the modify priority command arrives. But this feels like a complex special case, when the goal is to make everything uniform.

Maybe we can use some logic based on timestamps to keep it general?

whoenig commented 2 years ago

I like the priority idea.

But then if the low-level setpoints were preceded by a high-level command, we need to make sure that the setpoints from the previous high-level command are not used, and we wait for a "fresh" high-level command.

Can this ever happen? The "queue" is always size 1, see https://github.com/bitcraze/crazyflie-firmware/blob/7a0ed44bb28440770702d95cd2ac4862e1ef5071/src/modules/src/commander.c#L52-L55

jpreiss commented 2 years ago

I'm imagining in our new push-based architecture, the high level commander will have a task with a loop like:

while trajectory is not done:
    push a setpoint
    sleep for rate

so consider the following:

goTo with duration 10
sleep 1 second
stream low-level setpoints for 5 seconds
modify priority command
fresh high-level command

The high-level commander will not stop pushing setpoints until 9 seconds after step 2. Those setpoints are preempted by the streaming setpoints during step 3, but suddenly after step 4 they will be accepted again. Therefore, in between steps 4 and 5, the commander will be using those setpoints.

Thinking about it more, the option

reset the high-level commander whenever a streaming setpoint arrives

seems most reasonable to me. I can't think of a practical use case where we would want to preserve the old high-level plan after it has been preempted.

jpreiss commented 2 years ago

@whoenig @jonasdn please take a look at the draft PR - I can't add you as reviewers.

knmcguire commented 2 years ago

So the PR is merged now (jeej!), but before we close this, do you guys see the need of adding more info here? https://www.bitcraze.io/documentation/repository/crazyflie-firmware/master/functional-areas/sensor-to-control/commanders_setpoints/#high-level-commander

jpreiss commented 2 years ago

I don't think we need to add more info.

Re. organizing current info, I feel that distinguishing between "planner" and "high-level commander" might not be necessary. They are separate files, but this is just to keep the trajectory calculations isolated and bind-able. From the CRTP and app layer perspective, there is just the high-level commander.

knmcguire commented 2 years ago

Ah oke good point. This figure is from an lecture I did before where I distinguished where the planning existed in different scenarios, but perhaps it complicates things?

Anyway for now as we are too close to the release I won't change this anymore but let's keep it in mind for future iterations.

knmcguire commented 2 years ago

btw, is this issue now considered closed now the new commander architecture is merged in the main branch?

jpreiss commented 2 years ago

I think it's much improved. We no longer need the param enHighLevel and the mechanism of low-to-high switching is easier to understand.

As discussed in bitcraze/crazyflie-firmware#903, removing crtpCommanderHighLevelStop would help simplify further. That is the only thing I can think of. Not sure if that should be moved into its own issue or kept here.

knmcguire commented 2 years ago

I've made an separate issue on the last little thing that we need to remove for fully finishing simplifying the commander architecture, so that we can close this discussion. It is issue bitcraze/crazyflie-firmware#980

bitcraze / crazyflie-firmware

Overall commander architecture is complicated #861