Rework toolhead and move_queue flushing

KevinOConnor commented 7 months ago

This PR fixes "Timer too close" errors in the pwm_tool module when SET_PIN commands are issued while the toolhead is idle. (And also likely fixes similar errors that may occur with "MANUAL_STEPPER SYNC=0" commands.)

Previously, the toolhead code would only manage flushing of the lookahead, kinematic, and mcu move_queues while the toolhead was in an active state. Once the lookahead queue becomes empty, the toolhead would disable its internal flushing timers. This PR reworks the toolhead code so that its flush timer will stay active any time there is data in any of the kinematic queues, regardless of the lookahead state.

@Cirromulus , @dmbutyugin - FYI.

-Kevin

A-Panic commented 7 months ago

I can confirm that with this branch I'm able to manually control the pwm output without errors.

Cirromulus commented 7 months ago

While the changes look sensible, unfortunately I don't feel very confident in my knowledge about this part of the toolhead code and thus can't give a detailed review.

I would personally like an overview about the states and their implications in some sort of documentation, but this is perhaps for a different MR.

At least I could test it sucessfully.

KevinOConnor commented 7 months ago

I would personally like an overview about the states and their implications in some sort of documentation

Indeed. This is a complex (and potentially fragile) area of the code.

At a high-level there are a couple of main goals - we want to be able to buffer multiple seconds worth of movement commands into the micro-controllers, and we want to encourage a modular code system. The first is important to ensure that the micro-controllers can efficiently process precise movements even when there is intermittent communication and processing jitter. The second is important for code maintenance - as if any one part of the code needed to know about all possible code that sends commands to the mcu then it would become nearly impossible to maintain. These two goals are somewhat in conflict.

The challenge is that if some part of the code buffers up a couple of seconds of movement into the mcu, and if some other part of the code wants to buffer some commands at a time prior to that, there is a very real risk that the mcu wont have sufficient storage remaining to store those commands (or there might not be enough remaining bandwidth, or not enough remaining host cpu time). This could result in the mcu not having those commands by the time they are needed - in effect, one producer could starve the other producers, resulting in faults.

To account for this, the code tries to globally track the maximum "planned time" (toolhead.print_time) and to globally flush out any "planned updates" in waves (toolhead._update_move_time()). This is a challenge though, because, as mentioned above, we don't want any one part of the code to know about all other possible parts of the code.

This new code tracks four "queuing states" (toolhead.special_queuing_state): main (denoted by an empty string ("") to indicate no special queuing state), "NeedPrime", "Priming", and "Drip". Roughly these states mean:

main state: This is the state that is used during regular prints. The print_time variable tracks the time of the last move released from the lookahead queue. There are other moves still in the lookahead queue. Commands may be buffered in the mcus up to time print_time. No other code should try to buffer commands prior to print_time.
"NeedPrime" and "Priming" states: These states occur during idle, or at any point that a g-code command request a lookahead flush (eg, a G4 command). The last command processed from the lookahead queue will have decelerated the toolhead to a zero velocity. We don't want to start processing new movements until the lookahead queue has gather around 2 seconds of new movements (hence the name "priming"). The difference between "NeedPrime" and "Priming" is that in a "NeedPrime" state the lookahead queue is empty, while in the "Priming" state, the lookahead has entries in it, just not yet the desired 2 seconds worth of movement.
"Drip" state: This state is only used for homing/probing operations. The toolhead is only in this state during the toolhead.drip_move() call (no other code enters the "Drip" state and that function always exits that state prior to returning). The purpose of this state is to prevent more than about 200ms of commands from being buffered in the mcu. Here we explicitly don't want lots of movement buffered in the mcu, because upon receiving an endstop/probe signal we don't want to have the host/mcu idle around waiting for all those (now unnecessary) buffered movements to drain. That is, we want to be able to rapidly plan, schedule, and buffer new movements upon receiving an endstop/probe signal.

The core change in this PR is that the toolhead can now continue to flush the host buffers even when not in the main queuing state. That is, flushing continues even in the "NeedPrime" and "Priming" states. Previously flushing was occurring in the main and "Drip" state, but not in the "NeedPrime" and "Priming" states. Indeed, as part of this PR, the previous "Flushed" state was renamed to "NeedPrime" to make it more clear that "NeedPrime" is about needing to prime the lookahead queue and is not indicative of a full flush. When in the main state, flushing is done while advancing the print_time. When in the "NeedPrime" and "Priming" states, flushing is done optimistically and is controlled by calls to toolhead.note_kinematics_activity().

Hope that helps, -Kevin

Klipper3d / klipper

Rework toolhead and move_queue flushing #6410