Homing/Probing fixes for multi_mcu drift

TheFuzzyGiggler commented 2 months ago

Made corrections to homing.py under homing_move to physically move the steppers back to the "trigger position". As far as I can tell from my testing this fixes the issue of the drift from multi_mcu homing and allows multi_mcu homing on a shared multi_mcu axis. Copied movement code from force_move so users did not have to have force_move enabled to benefit. This might not be necessary but my original version was looking up "force_move" in the printer object which, if I understand correctly, wouldn't load unless the user had it configured in their configuration file.

Original discussion and my prelim results are here.

Klipper Discourse - barriers-to-multi-mcu-homing-on-multi-mcu-shared-axis

TheFuzzyGiggler commented 2 months ago

Second commit just fixed line lengths that trigged the build check to fail

TheFuzzyGiggler commented 2 months ago

Self Review Info:

1.) Is the submission free of defects and is it ready to be widely deployed?

I have been using this code for nearly a week now and have noticed a marked INCREASE in my print quality and bed adhesion. I attribute this to the probing not having the unintended offset due to multi_mcu homing as described here:

Multiple Micro-controller Homing and Probing

An experienced Klipper developer will need to validate that my changes do not have any unexpected consequences by adjusting the z steppers after probing/homing. From my testing this doesn't seem to be the case, but it's completely possible it's because the effect is minimal and I'm missing it somewhere. Or that my change might break another setup (unlikely, if there is no difference between trigger position and halt position no adjustment is made).

2.) Does the submission provide a "high impact" benefit to real-world users performing real-world tasks?

I believe so, with the rise in CANBus toolheads and multi-mcu machines, this solves the issue that all the users of such setups would face. I would expect this is currently a good portion of the user base and is likely to become larger as such technology is adopted by the wider 3d printer ecosystem/community.

3.) Is the copyright of the submission clear, non-gratuitous, and compatible?

No new file was made so there is no new copyright. Commit was signed off as per guidelines.

4.) Does the submission follow guidelines specified in the Klipper documentation?

Mostly N/A, Did not add any new files or port to new mcu. Python style mostly follows existing code style and, in my opinion, is reasonably easy to follow with clear variable names ("difference","z_diff","stepper_adjust_move","steppers","step_dist" etc.)

5.) Is the Klipper documentation updated to reflect new changes?

This change is transparent to the users. There is no new configuration settings, it works behind the scenes.

After review if the code is found to be bug free and no unintended consequences I will submit an update of the documentation for multi_mcu homing (if we feel like we still need it).

6.) Are commits well formed, address a single topic per commit, and independent?

Yes

ZeyHex commented 2 months ago

Hello!

This is exactly my case, I have a Creality K1, where the level sensor is located on the MCU, and the motors are on another MCU. I updated Klipper to mainline on it and use the project https://github.com/garethky/klipper/tree/adc-endstop to implement the sensors on existing hardware.

Before applying your change, my first layer was perfect, but after that a small elephant's foot appeared.

TheFuzzyGiggler commented 2 months ago

Something to be aware of is, previously there was the mentioned drift and you (and I) adjusted our Z offset based on that drift to get a good first layer. This eliminates that drift so the probing is now (or should be, hence the need for more reviews) more accurate so now your existing z offset might actually be too low. Does that make sense?

The issue was, The sensor would trigger but due to multi_mcu communication from the toolhead to the steppers they didn't halt immediately (it took time for the mcu to process the signal of the endstop and send the notification to the rest of the printer). So the probe would trigger and the steppers would continue on for a handful of steps until they got the notification, overshooting the actual position the probe triggered at. In my case all the steppers were slightly different in oversteps so the gantry was never quite where it thought it was. Hence first layer issues.

If you babystepped up and/or down to get your first layer dialed in and then saved that, you were essentially compensating for that slight misalignment. Now that there isnt (shouldnt be) any misalignment, your z offset might be off a bit.

Plus I'm not sure how that project is calculating the endstop trigger point either, ADC algorithms for detecting tapping points are notorious for being finicky.

In all honesty, if it's just a small elephants foot that's a good sign to me, means there are no massive unintended consequences for other printers. I have a Voron 2.4

ZeyHex commented 2 months ago

I have not used and do not use z_offset. After removing this change, the defect disappeared. so I'm not sure if it's necessary if you initially use z_offset. I don't use

TheFuzzyGiggler commented 2 months ago

Newest commit streamlined the code and removed code duplication.

I also added in some debugging comments (not part of commit) to validate the code is working as expected. Looks promising, especially as I tried things a different way and my individual steps didn't work out so this helps me be confident the mcu.get_commanded_position() isn't just TELLING me it's right, it actually is.

Examples of debugging report while homing and doing probe moves:

Homing:

G28

Overshoot: stepper_x: 22, stepper_y: 13.
stepper_x: Preadjust: 14355, PostAdjust: 14377, Diff: 22
stepper_y: Preadjust: 14346, PostAdjust: 14359, Diff: 13

Overshoot: stepper_x: 7, stepper_y: 10.
stepper_x: Preadjust: 14394, PostAdjust: 14401, Diff: 7
stepper_y: Preadjust: 14379, PostAdjust: 14389, Diff: 10

Overshoot: stepper_y: -14, stepper_x: 23.
stepper_y: Preadjust: -7, PostAdjust: -21, Diff: -14
stepper_x: Preadjust: 28806, PostAdjust: 28829, Diff: 23

Overshoot: stepper_y: -8, stepper_x: 13.
stepper_y: Preadjust: 8, PostAdjust: 0, Diff: -8
stepper_x: Preadjust: 28805, PostAdjust: 28818, Diff: 13

Overshoot: stepper_z: -13, stepper_z1: -8, stepper_z2: -17, stepper_z3: -11.
stepper_z: Preadjust: -2021, PostAdjust: -2034, Diff: -13
stepper_z1: Preadjust: -2016, PostAdjust: -2024, Diff: -8
stepper_z2: Preadjust: -2025, PostAdjust: -2042, Diff: -17
stepper_z3: Preadjust: -2019, PostAdjust: -2030, Diff: -11

Overshoot: stepper_z: -1, stepper_z1: -2, stepper_z2: -2, stepper_z3: -2.
stepper_z: Preadjust: -2003, PostAdjust: -2004, Diff: -1
stepper_z1: Preadjust: -1994, PostAdjust: -1996, Diff: -2
stepper_z2: Preadjust: -2012, PostAdjust: -2014, Diff: -2
stepper_z3: Preadjust: -2000, PostAdjust: -2002, Diff: -2

Probing:


PROBE_ACCURACY SAMPLES=5

PROBE_ACCURACY at X:175.000 Y:175.000 Z:10.000 (samples=5 retract=5.000 speed=20.0 lift_speed=20.0)

probe at 175.000,175.000 is z=-0.907500

Overshoot: stepper_z: 13, stepper_z1: 6, stepper_z2: 11, stepper_z3: 12.
stepper_z: Preadjust: -2019, PostAdjust: -2006, Diff: 13
stepper_z1: Preadjust: -2004, PostAdjust: -1998, Diff: 6
stepper_z2: Preadjust: -2027, PostAdjust: -2016, Diff: 11
stepper_z3: Preadjust: -2016, PostAdjust: -2004, Diff: 12

probe at 175.000,175.000 is z=-0.895000

Overshoot: stepper_z: 18, stepper_z1: 8, stepper_z2: 13, stepper_z3: 17.
stepper_z: Preadjust: -2019, PostAdjust: -2001, Diff: 18
stepper_z1: Preadjust: -2001, PostAdjust: -1993, Diff: 8
stepper_z2: Preadjust: -2024, PostAdjust: -2011, Diff: 13
stepper_z3: Preadjust: -2016, PostAdjust: -1999, Diff: 17

probe at 175.000,175.000 is z=-0.890000

Overshoot: stepper_z: 9, stepper_z1: 16, stepper_z2: 15, stepper_z3: 14.
stepper_z: Preadjust: -2008, PostAdjust: -1999, Diff: 9
stepper_z1: Preadjust: -2007, PostAdjust: -1991, Diff: 16
stepper_z2: Preadjust: -2024, PostAdjust: -2009, Diff: 15
stepper_z3: Preadjust: -2011, PostAdjust: -1997, Diff: 14

probe at 175.000,175.000 is z=-0.887500

Overshoot: stepper_z: 8, stepper_z1: 16, stepper_z2: 17, stepper_z3: 15.
stepper_z: Preadjust: -2006, PostAdjust: -1998, Diff: 8
stepper_z1: Preadjust: -2006, PostAdjust: -1990, Diff: 16
stepper_z2: Preadjust: -2025, PostAdjust: -2008, Diff: 17
stepper_z3: Preadjust: -2011, PostAdjust: -1996, Diff: 15

probe at 175.000,175.000 is z=-0.885000

Overshoot: stepper_z: 7, stepper_z1: 13, stepper_z2: 18, stepper_z3: 11.
stepper_z: Preadjust: -2004, PostAdjust: -1997, Diff: 7
stepper_z1: Preadjust: -2002, PostAdjust: -1989, Diff: 13
stepper_z2: Preadjust: -2025, PostAdjust: -2007, Diff: 18
stepper_z3: Preadjust: -2006, PostAdjust: -1995, Diff: 11

probe accuracy results: maximum -0.885000, minimum -0.907500, range 0.022500, average -0.893000, median -0.890000, standard deviation 0.007969

It looks like my probe is drifting but I haven't done QGL or anything like that, Just homed and generated a sample of the debug. Probing has extremely repeatable accuracy in actual use.

KevinOConnor commented 2 months ago

Interesting. Thanks. This is a very challenging area of the code to work on. A lot of the interactions are very subtle.

If I understand this PR, it's checking if there has been any overshoot (any additional steps taken since the time of the trigger notification) and then moving all the steppers back to their nominal trigger point. That is an interesting approach.

One thing to note is the timing implications of this change. When there is a toolhead.flush_step_generation() followed by a movement, followed by another flush, it is likely to take around 350ms. These delays are needed to ensure the movement can be generated, queued in the mcu, and then fully flushed afterwards. If a user has a 4 stepper Z, that could potentially add a notable delay to each probe attempt.

I'm also unsure of what may happen if the toolhead time is altered within the homing process itself. (Some of the printer.send_event() handlers have subtle behavior.) It's probably fine, but needs to be looked at closely.

I guess I'll need to think about this a bit. -Kevin

TheFuzzyGiggler commented 2 months ago

I just saw your reply, You're correct Kevin. It looks at the "Trigger pos" vs. the "Halting pos" and moves back to the trigger pos. This code was already in there as part of homing, When I looked into it, it looked like the "Trigger" pos was basically looking at the stepper position at that specific time of the probe/endstop trigger per stepper/mcu, which is essentially exactly what I wanted.

The "stepper_adjust_move" is a direct copy from the force_move.py, the only reason for that being is I didn't want people to have to have "[force_move]" enabled to get the benefit.

I'll go back and check the flush timings to make sure it's all working out but I've been running this for several weeks now successfully with a CAN Bus mcu on each stepper motor (Core XY Voron 2.4, so 7 in total counting the actual toolhead) and have had near perfect prints.

To your point there is a VERY minor delay after probing moves as it adjusts each Z stepper sequentially, but honestly we're talking less than a second, at most 1 second. After the first time or two you don't even notice anymore. If that is a concern I could go back and make it only work for people with multiple mcus or multi homing. Though, for those without it there should be no difference in triggering so no stepper adjustments anyways.

I didn't change any printer.send_events() unless the stepper moves trigger one, I didn't trace it down through the code in huge depth.

Let me know your thoughts.

Thanks. -Nick

github-actions[bot] commented 1 month ago

Thank you for your contribution to Klipper. Unfortunately, a reviewer has not assigned themselves to this GitHub Pull Request. All Pull Requests are reviewed before merging, and a reviewer will need to volunteer. Further information is available at: https://www.klipper3d.org/CONTRIBUTING.html

There are some steps that you can take now:

Perform a self-review of your Pull Request by following the steps at: https://www.klipper3d.org/CONTRIBUTING.html#what-to-expect-in-a-review If you have completed a self-review, be sure to state the results of that self-review explicitly in the Pull Request comments. A reviewer is more likely to participate if the bulk of a review has already been completed.
Consider opening a topic on the Klipper Discourse server to discuss this work. The Discourse server is a good place to discuss development ideas and to engage users interested in testing. Reviewers are more likely to prioritize Pull Requests with an active community of users.
Consider helping out reviewers by reviewing other Klipper Pull Requests. Taking the time to perform a careful and detailed review of others work is appreciated. Regular contributors are more likely to prioritize the contributions of other regular contributors.

Unfortunately, if a reviewer does not assign themselves to this GitHub Pull Request then it will be automatically closed. If this happens, then it is a good idea to move further discussion to the Klipper Discourse server. Reviewers can reach out on that forum to let you know if they are interested and when they are available.

Best regards, ~ Your friendly GitIssueBot

PS: I'm just an automated script, not a human being.

Klipper3d / klipper

Homing/Probing fixes for multi_mcu drift #6548