Mechanical-Advantage / AdvantageKit

Monorepo for robot framework and tools
GNU General Public License v3.0
151 stars 40 forks source link

Mock time is used on the real robot #33

Closed PeterJohnson closed 10 months ago

PeterJohnson commented 1 year ago

These lines (specifically enableMockTime()) should never be run on a real robot. What this does is replace the C++-level wpi::Now() function with a function that returns the setMockTime() value. Everything at the C++ level, including things like ntcore, will use that exact time until setMockTime() is called again. This is not at all what the rest of the library expects and will result in very broken behavior. We will probably disable this functionality in wpiutil on athena in our next release to avoid this.

https://github.com/Mechanical-Advantage/AdvantageKit/blob/main/junction/core/src/org/littletonrobotics/junction/Logger.java#L144-L145

camearle20 commented 1 year ago

Jonah can expand more on this but IIRC this is necessary since there is no (easy) way to override the timestamp source for native parts of the library except doing this. We need to be able to do this in order to lock the software's understanding of what the current time is at the start of each loop cycle so that replay is completely deterministic. If this is removed from WPILib and another option to do this is not added in its place, we will be forced to move our shimming to a lower level than just replacing java class files in the WPILib jar, and actually will have to ship our own version of the HAL to get around this.

jwbonner commented 1 year ago

We added that specifically to inject the timestamp into the half-dozen wpimath Java classes that use WPIUtilJNI for the timestamp instead of Timer.getFPGATimestamp(). We can remove it for now (which will result in non-deterministic behavior in those classes that breaks replay), and hopefully look at shimming wpiutil or wpimath at the Java level in the future.

jwbonner commented 1 year ago

We should confirm if there are native functions that we would need to shim a different way because that's a much harder problem as Cameron said, but otherwise we may just be able to shim more Java classes.

camearle20 commented 1 year ago

I think the goal is that we'd like to minimize the number of classes that we have to shim. We're not entirely happy that we have to do any shimming in the first place, but we understand that it is necessary since WPILib was not designed to work in this way. Leaving in the ability for us to manually mock the timestamp across the entire library makes things a lot easier for us.

Could you expand on the broken behavior this would cause? We are careful to always update the mock time each loop cycle, so I'm not sure what would break here. Is there anything that depends on being able to read a different timestamp within a given loop cycle that isn't using the Timer class to do so? In those cases (i.e. Watchdog) we've already covered this by shimming them to use our "getRealTimestamp" method which doesn't get locked.

PeterJohnson commented 1 year ago

I can understand doing it in simulation replay but it should never be overridden on a real robot. I plan to leave the functionality in for simulation but will almost certainly disable it for code running on the Rio.

jwbonner commented 1 year ago

We need to shim the timestamps, DS data, etc when running on the real robot to make the control logic deterministic --- otherwise we can't recreate the same conditions in replay. What are the specific issues that this is causing in native code? Have you found a direct connection to some of the NT issues that have been coming up recently?

PeterJohnson commented 1 year ago

Yes, there is a direct connection to some of the NT issues, and in general it may cause very unpredictable behavior to override it and only update it when the main loop runs, as it makes intermediate timestamps useless and very low precision. We use the wpi::Now() function for all timestamps in C++, including very low level things like libuv loop time. NetworkTables uses it for everything from timestamp comparisons on received values to detecting network timeouts, as well as synchronizing time between the server and clients. Essentially what you're doing is replacing a high resolution clock (1us precision with high stability and accuracy) with a very low resolution unstable, low accuracy clock (updated approximately, but not always, around 20 ms), which breaks a lot of assumptions in how timestamps are supposed to work.

jwbonner commented 1 year ago

Yes, I understand that the shimmed timestamps change the functionality when reading within a single cycle (that's the whole point). From everything we've seen, that causes no issues for almost all of WPILib's functionality. It seems like in this case we've just cast the net a little too wide. However, we still need a replacement to continue shimming some of the timestamps on the real robot. I think the best options would be:

PeterJohnson commented 1 year ago

Yes the MathShared option is what I was thinking. I just confirmed the current behavior is definitely causing NT issues.

calcmogul commented 1 year ago

If you need to key the data replay off of something, couldn't you use a marker inserted in the log at the loop start instead of the timestamp?

camearle20 commented 1 year ago

This is about being able to replay code exactly as it ran on the robot, not being able to key it for visualization. If we can't control the return value of methods that get the timestamp, they will report different values when running the inputs back through the code in replay in the simulator (for example, time deltas measured this way will not be the same as the code is probably running on a faster machine, threads will be scheduled differently, etc.). By locking them at the beginning of the loop cycle when the code is running on the robot (and logging them) we can ensure that when the code is replayed, we can override the timestamp to be that exact value in the replay. To support the timestamp changing throughout the loop cycle, we'd need to record every time the timestamp was accessed within the loop cycle, and then hope that the code being replayed does all of that access completely single threaded.

calcmogul commented 1 year ago

But if you modify the time getters to return the same timestamp for the whole loop, you're lying to the rest of the robot code. State-space controllers and state estimators care about the true time delta, for example.

camearle20 commented 1 year ago

We are still able to measure dt between loop cycles. We're only losing out on minute changes in time within the loop cycle. Since it's reasonable to assume that controllers get called at roughly the same point within each loop cycle, the dt between these calls is essentially equal to the dt between cycles. Furthermore, we as a team value this deterministic replay far more than we care about a few microseconds of error in the dt being provided to a control loop.

calcmogul commented 1 year ago

The dt can actually vary by several milliseconds between the start and end, depending on what you're doing. State estimators in particular can take a while to update.

camearle20 commented 1 year ago

We could look at making this configurable on our end if that's valuable, and add a disclaimer to the docs that it makes replay potentially non deterministic. However, for our own usage we'd still prefer to operate in the way that we currently are. In terms of the NT issues, we do all of our NT reads synchronously at the start of each loop cycle and log the values, so this is a case where using the correct timestamp should be done as Jonah mentioned. Thinking about it more, we could probably do a similar abstraction for control loops and state estimators, so the code treats them more as a piece of "hardware" that it is interacting with.

jwbonner commented 1 year ago

For the pose estimators in particular, they all have overrides for manually setting the timestamp. If an AdvantageKit user cared about the timing more precisely they could include the "real" timestamps in the set of inputs from the IO layer (hardware abstraction).

calcmogul commented 1 year ago

However, for our own usage we'd still prefer to operate in the way that we currently are.

That's the thing though. More teams than just you are using this framework now, so having a documented escape hatch would be good.

camearle20 commented 1 year ago

@calcmogul Jonah is working on a configuration for this to globally not use locked time stamps, but I was thinking about this a bit more in the mean time: isn't the dt that really matters here the dt between sensor readings? It seems like if you're using the dt between calls to the controller/estimator you're actually lying to it since that time doesn't match the value of the sensor readings that are being used for the calculation. In this case it almost seems better to be using the dt as measured right at the start of the loop since this is where (we, at least) do all of our input reading. Am I thinking about that wrong?

jwbonner commented 1 year ago

I just release AdvantageKit v2.1.3 that disables timestamp mocking on the real robot (with appropriate warnings that it breaks deterministic timestamps and causes log replay to function incorrectly). Once we have a workaround set up I also have the method and associated docs ready to merge for disabling deterministic timestamps globally:

Logger.getInstance().disableDeterministicTimestamps();
calcmogul commented 1 year ago

It seems like if you're using the dt between calls to the controller/estimator you're actually lying to it since that time doesn't match the value of the sensor readings that are being used for the calculation. In this case it almost seems better to be using the dt as measured right at the start of the loop since this is where (we, at least) do all of our input reading.

It gets messy if you assume the loop time isn't negligible.

The state estimator has two steps: Predict and Correct. The Predict duration is expected to be the time between control input updates since we assume the control input has been constant the whole time.

// u is control input; y is measurement
void loop() {
  y = sensor.get();
  loopStartTime = now();

  // May take a while
  observer.predict(u, loopStartTime - loopEndTime);
  observer.correct(y);

  previousU = u;
  // Make take a while
  u = controller.get(observer.getEstimate(), reference);

  motor.set(u);

  loopEndTime = now();
  // May take a while
  observer.predict(previousU, loopEndTime - loopStartTime);
}
jwbonner commented 10 months ago

This issue was addressed, including a section in the docs explaining the purpose of deterministic timestamps with suggested alternatives.