Kalman filter supervisor floods downlink with messages

ntamas commented 4 years ago

Today we somehow got the Kalman filter into a state where the Kalman supervisor constantly tried to reset the filter, saturating the downlink with debug messages in the process. The flood was bad enough that we could not connect to the Crazyflie properly: there was no downstream bandwidth left for the CF to respond to the initial requests when establishing the connection. I have no idea what caused this. The CF happened to be outside the convex hull of our UWB anchors, but this has not been a problem before, and moving the CF inside the convex hull did not solve the problem either.

As a workaround, I have added rate-limiting to the resetting of the Kalman filter, but I was wondering whether this workaround is actually masking the symptoms of a deeper issue. Anyhow, here's a patch that adds the rate limiting:

--- a/src/modules/src/estimator_kalman.c
+++ b/src/modules/src/estimator_kalman.c
@@ -173,6 +173,8 @@ static SemaphoreHandle_t runTaskSemaphore;
 // functions called by the stabilizer loop
 static SemaphoreHandle_t dataMutex;

+// Timestamp of last Kalman filter reset request from the supervisor
+static uint32_t lastResetRequest;

 /**
  * Constants used in the estimator
@@ -297,6 +299,8 @@ void estimatorKalmanTaskInit() {

   xTaskCreate(kalmanTask, KALMAN_TASK_NAME, 3 * configMINIMAL_STACK_SIZE, NULL, KALMAN_TASK_PRI, NULL);

+  lastResetRequest = xTaskGetTickCount();
+
   isInit = true;
 }

@@ -397,8 +401,11 @@ static void kalmanTask(void* parameters) {
       kalmanCoreFinalize(&coreData, osTick);
       statsCntInc(&statsFinalize);
       if (! kalmanSupervisorIsStateWithinBounds(&coreData)) {
-        coreData.resetEstimation = true;
-        DEBUG_PRINT("State out of bounds, resetting\n");
+        if (osTick > lastResetRequest + M2T(1000)) {
+          lastResetRequest = osTick;
+          coreData.resetEstimation = true;
+          DEBUG_PRINT("State out of bounds, resetting\n");
+        }
       }
     }
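A side note on the comparison in the patch: `lastResetRequest + M2T(1000)` can wrap around when the 32-bit tick counter is close to its maximum, in which case the rate limit is bypassed for roughly the last second before the counter itself wraps. This needs a very long uptime, so it is a minor point. A minimal sketch of a wraparound-tolerant check, written as standalone C rather than firmware code; `ticksElapsedMoreThan` and `wraparoundDemo` are hypothetical names, and plain `uint32_t` stands in for the tick type, matching the `uint32_t lastResetRequest` in the patch:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

// Hypothetical helper (not part of crazyflie-firmware): returns true when
// more than `interval` ticks have passed since `since`. Unsigned subtraction
// is modulo 2^32, so the result stays correct across tick counter wraparound
// as long as the real elapsed time is shorter than 2^32 ticks.
static bool ticksElapsedMoreThan(uint32_t now, uint32_t since, uint32_t interval) {
  return (uint32_t)(now - since) > interval;
}

// Small self-check for the wraparound case.
static void wraparoundDemo(void) {
  uint32_t last = UINT32_MAX - 10u;            // reset requested shortly before the wrap
  uint32_t later = (uint32_t)(last + 1500u);   // 1500 ticks later, counter has wrapped

  // Only 5 ticks have passed: a check of the form `now > last + 1000`
  // would report true here because `last + 1000` wraps to a small value.
  assert(!ticksElapsedMoreThan(UINT32_MAX - 5u, last, 1000u));

  // 1500 ticks have passed even though `later` is numerically smaller than `last`.
  assert(ticksElapsedMoreThan(later, last, 1000u));
}
```

With such a helper, the condition in the patch would read `ticksElapsedMoreThan(osTick, lastResetRequest, M2T(1000))`.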

krichardsson commented 4 years ago

Interesting! I'm not sure about limiting the actual reset, but limiting the logging sounds good to me. My thinking is that if we hit one of the limits, the state in the filter is messed up anyway and must be reset to be useful. Excessive logging, on the other hand, should be avoided. I was actually a bit hesitant to add the logging in the first place, but thought it would be useful for understanding what is going on. Maybe we should limit the logging to at most once every 5 seconds (for instance), and add a counter so that we log something like "State out of bounds, reset 3 times last 5 seconds\n"?
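A minimal sketch of that idea, as standalone C rather than a firmware patch. The `ResetLogThrottle` type and both functions are hypothetical names, not existing crazyflie-firmware code, and the message wording differs slightly from the suggestion above because the counter covers everything since the previous report, which may be longer than one interval:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

// Hypothetical bookkeeping for throttled reset logging; none of these names
// exist in crazyflie-firmware.
typedef struct {
  uint32_t lastLogTick;    // tick at which the last summary was produced
  uint32_t resetCount;     // resets counted since that summary
  uint32_t intervalTicks;  // minimum number of ticks between summaries
} ResetLogThrottle;

void resetLogThrottleInit(ResetLogThrottle* t, uint32_t now, uint32_t intervalTicks) {
  assert(t != NULL);
  t->lastLogTick = now;
  t->resetCount = 0;
  t->intervalTicks = intervalTicks;
}

// Records one reset. Returns true and fills `msg` when a summary is due,
// which happens at most once per interval; otherwise the reset is only counted.
bool resetLogThrottleRecord(ResetLogThrottle* t, uint32_t now, char* msg, size_t msgLen) {
  assert(t != NULL && msg != NULL);
  t->resetCount++;

  // Wraparound-tolerant elapsed time check on unsigned ticks.
  if ((uint32_t)(now - t->lastLogTick) < t->intervalTicks) {
    return false;
  }

  snprintf(msg, msgLen, "State out of bounds, reset %lu times since last report\n",
           (unsigned long)t->resetCount);
  t->lastLogTick = now;
  t->resetCount = 0;
  return true;
}
```

In the Kalman task, the reset itself would stay unconditional and only the `DEBUG_PRINT` would be guarded by the return value. One limitation of this sketch: a summary is only produced when a reset happens, so resets counted after the last report stay unreported if the resets stop.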

ntamas commented 4 years ago

> My thinking is that if we hit one of the limits, the state in the filter is messed up anyway and must be reset to be useful.

That's true, but I was wondering whether it takes several iterations for the Kalman filter to "settle down" and converge after a reset (especially if multiple sensor measurements are fused and they arrive at different frequencies). Having a messed-up state in the filter for a few iterations is not a problem if the drone is not flying (and if it was flying when the state got messed up, then it's probably too late anyway). I agree with you that blocking further resets for one second (as in my workaround) is probably overkill, but maybe there's a more sensible middle ground.

krichardsson commented 4 years ago

Good point

bitcraze / crazyflie-firmware

Kalman filter supervisor floods downlink with messages #506