Klipper3d / klipper

Klipper is a 3d-printer firmware
GNU General Public License v3.0
8.99k stars 5.18k forks source link

Rework toolhead and move_queue flushing #6410

Closed KevinOConnor closed 6 months ago

KevinOConnor commented 7 months ago

This PR fixes "Timer too close" errors in the pwm_tool module when SET_PIN commands are issued while the toolhead is idle. (And also likely fixes similar errors that may occur with "MANUAL_STEPPER SYNC=0" commands.)

Previously, the toolhead code would only manage flushing of the lookahead, kinematic, and mcu move_queues while the toolhead was in an active state. Once the lookahead queue becomes empty, the toolhead would disable its internal flushing timers. This PR reworks the toolhead code so that its flush timer will stay active any time there is data in any of the kinematic queues, regardless of the lookahead state.

@Cirromulus , @dmbutyugin - FYI.

-Kevin

A-Panic commented 7 months ago

I can confirm that with this branch I'm able to manually control the pwm output without errors.

Cirromulus commented 7 months ago

While the changes look sensible, unfortunately I don't feel very confident in my knowledge about this part of the toolhead code and thus can't give a detailed review.

I would personally like an overview about the states and their implications in some sort of documentation, but this is perhaps for a different MR.

At least I could test it sucessfully.

KevinOConnor commented 7 months ago

I would personally like an overview about the states and their implications in some sort of documentation

Indeed. This is a complex (and potentially fragile) area of the code.

At a high-level there are a couple of main goals - we want to be able to buffer multiple seconds worth of movement commands into the micro-controllers, and we want to encourage a modular code system. The first is important to ensure that the micro-controllers can efficiently process precise movements even when there is intermittent communication and processing jitter. The second is important for code maintenance - as if any one part of the code needed to know about all possible code that sends commands to the mcu then it would become nearly impossible to maintain. These two goals are somewhat in conflict.

The challenge is that if some part of the code buffers up a couple of seconds of movement into the mcu, and if some other part of the code wants to buffer some commands at a time prior to that, there is a very real risk that the mcu wont have sufficient storage remaining to store those commands (or there might not be enough remaining bandwidth, or not enough remaining host cpu time). This could result in the mcu not having those commands by the time they are needed - in effect, one producer could starve the other producers, resulting in faults.

To account for this, the code tries to globally track the maximum "planned time" (toolhead.print_time) and to globally flush out any "planned updates" in waves (toolhead._update_move_time()). This is a challenge though, because, as mentioned above, we don't want any one part of the code to know about all other possible parts of the code.

This new code tracks four "queuing states" (toolhead.special_queuing_state): main (denoted by an empty string ("") to indicate no special queuing state), "NeedPrime", "Priming", and "Drip". Roughly these states mean:

The core change in this PR is that the toolhead can now continue to flush the host buffers even when not in the main queuing state. That is, flushing continues even in the "NeedPrime" and "Priming" states. Previously flushing was occurring in the main and "Drip" state, but not in the "NeedPrime" and "Priming" states. Indeed, as part of this PR, the previous "Flushed" state was renamed to "NeedPrime" to make it more clear that "NeedPrime" is about needing to prime the lookahead queue and is not indicative of a full flush. When in the main state, flushing is done while advancing the print_time. When in the "NeedPrime" and "Priming" states, flushing is done optimistically and is controlled by calls to toolhead.note_kinematics_activity().

Hope that helps, -Kevin