probe_eddy_current: implement surface scannng

Arksine commented 2 months ago

This is a re-creation of PR #6537, rebased against master now that the ldc work has been merged. The first pull request was against the work-ldc branch, so the PR was closed when the branch was deleted.

KevinOConnor commented 2 months ago

Thanks! I have some high-level comments:

This functionality looks useful and I think it is worthwhile to add to Klipper.
The interaction between bed_mesh.py, probe.py, and probe_eddy_current.py looks very subtle and I wonder if refactoring that interaction would make sense. As I understand this new code, bed_mesh.py instantiates probe.py ProbePointsHelper, but instead of passing it a regular Python list it passes in a bed_mesh.py PathGenerator class which mimics a Python list. Then probe.py's ProbePointsHelper checks for a special HAS_SCANNING flag, and replaces its normal logic with probe_eddy_current.py's ProbeScanHelper class. That class checks for the existence of a rapid_iter method, and then calls that - which causes execution to jump back to bed_mesh.py's PathGenerator class. The code flow and data requirements between bed_mesh.py, probe.py, and probe_eddy_current.py seem complex. I wonder it there is a way to make this interaction more clear, but I'm not sure.
Are the bed_mesh.py changes needed for the main "scanning" code changes to probe_eddy_current.py? Or are those changes an additional enhancement on top of regular "scanning mode"?
I noticed this PR adds a third mechanism for gathering eddy current measurements in probe_eddy_current.py ( EddyCalibration:handle_batch(), EddyEndstopWrapper:_add_measurement(), and StreamingContext:_bulk_callback() ). This looks like something that should be refactored. That could be done after merge though.

Thanks again, -Kevin

Arksine commented 2 months ago

Hi Kevin, I'll try to address each question comment.

The code flow and data requirements between bed_mesh.py, probe.py, and probe_eddy_current.py seem complex. I wonder it there is a way to make this interaction more clear, but I'm not sure.

Bed Mesh could instantiate the ProbeScanHelper rather than go through the ProbePointsHelper. I avoided this since I thought the ProbeScanHelper might be useful for other modules that use the probe. Other than that, I'm open to ideas, but I'm not sure if there is a better way to handle it.

Are the bed_mesh.py changes needed for the main "scanning" code changes to probe_eddy_current.py? Or are those changes an additional enhancement on top of regular "scanning mode"?

They are an enhancement, however its necessary to get the best results for a "rapid" scan. Specifically the rapid mode works best when some overshoot is applied to the points where a change in direction occurs. It also helps to add some curvature to the direction change. In addition its necessary to avoid entering faulty regions if possible.

I had considered refactoring Bed Mesh's "points" into a generator previously. Since point generation is configurable at runtime and bed_mesh always generates a new set of points prior to calibration I believe a generator makes sense.

I noticed this PR adds a third mechanism for gathering eddy current measurements in probe_eddy_current.py ( EddyCalibration:handle_batch(), EddyEndstopWrapper:_add_measurement(), and StreamingContext:_bulk_callback() ). This looks like something that should be refactored. That could be done after merge though.

I agree. FWIW, when performing a traditional probe (typically during QGL) on occasion I get a "Unable to obtain probe_eddy_current sensor readings" error. I suspect this is a timing issue, the samples are still being collected by the bulk interface and the bulk callback has yet to be triggered. The fix may be as simple as increasing the delay to .3 seconds.

KevinOConnor commented 1 month ago

Hi.

Sorry - been super busy the last couple of weeks.

The code flow and data requirements between bed_mesh.py, probe.py, and probe_eddy_current.py seem complex. I wonder it there is a way to make this interaction more clear, but I'm not sure.

Bed Mesh could instantiate the ProbeScanHelper rather than go through the ProbePointsHelper. I avoided this since I thought the ProbeScanHelper might be useful for other modules that use the probe. Other than that, I'm open to ideas, but I'm not sure if there is a better way to handle it.

That makes sense. I'll take another look over the next few days to see if any ideas "pop up" for me.

They are an enhancement, however its necessary to get the best results for a "rapid" scan.

I agree on the utility of the functionality. I was more wondering about merge ordering - possibly merging basic scanning and then advanced scanning on top. Not a big deal though.

FWIW, when performing a traditional probe (typically during QGL) on occasion I get a "Unable to obtain probe_eddy_current sensor readings" error.

One thing to note is that the current probing code calls probe.multi_probe_begin() before starting any probes, and that function calls sensor_helper.add_client(). Thus, we spin up the ldc1612 reporting before any probes, keep it active during all probes, and then only halt measurements after the entire series of probes is done.

I did not see an equivalent mechanism in the new scanning code, but it's possible I missed it. If sensor_helper.add_client() is only called at the start of each probe attempt then it wont be stable. There is a startup and shutdown delay in the bulk_sensor system - so depending on timing of a stop/restart, samples might continue uninterrupted, or samples might have large gaps in them. Symptoms would be seeing lots of "LDC1612 starting '%s' measurements" and "LDC1612 finished '%s' measurements" in the logs (as opposed to seeing them just once during a probe series).

If samples are started once for all probes, then there shouldn't be timing gaps in the reports. So, if you're seeing that then we'll need to track down why it is occurring. If you've got a test case that shows it I'll try to reproduce it here.

Thanks again, -Kevin

Arksine commented 1 month ago

Sorry - been super busy the last couple of weeks.

Completely understand, we'll get it done on your timeline.

I agree on the utility of the functionality. I was more wondering about merge ordering - possibly merging basic scanning and then advanced scanning on top. Not a big deal though.

Ah, I see. I think it would be possible to just pick commit f359e43cbdefb1608e8e39ca7b22be1bdb7a37af if that is what you would prefer.

I did attempt to perform thorough testing of the Bed Mesh changes, although there is always the chance of a regression with this kind of change. In the event that something strange pops up a user can dump the entire mesh state using the new BED_MESH_DUMP command, which should make analysis and debugging easier.

If samples are started once for all probes, then there shouldn't be timing gaps in the reports. So, if you're seeing that then we'll need to track down why it is occurring. If you've got a test case that shows it I'll try to reproduce it here.

I haven't run into the issue with scanning. The StreamingContext only adds a client once, then all samples are collected. Traditional probe attempts (probing that moves the Z axis down) occasionally return with no valid samples, so the probing procedure raises an exception. Generally speaking, we need to use the traditional functionality with calibration methods like Quad Gantry Level, because the surface could be too far away for scanning to work. So the issue is not specifically related to this PR, but I thought I would bring it up in case you wanted me to make an attempt to correct it. For example, something like the following might fix it:

diff --git a/klippy/extras/probe_eddy_current.py b/klippy/extras/probe_eddy_current.py
index 2b78464d0..a1a1c271c 100644
--- a/klippy/extras/probe_eddy_current.py
+++ b/klippy/extras/probe_eddy_current.py
@@ -266,7 +266,7 @@ class EddyEndstopWrapper:
         while 1:
             systime = reactor.monotonic()
             est_print_time = self._mcu.estimated_print_time(systime)
-            need_delay = self._trigger_time + 0.200 - est_print_time
+            need_delay = self._trigger_time + 0.300 - est_print_time
             if need_delay <= 0.:
                 break
             reactor.pause(systime + need_delay)

github-actions[bot] commented 1 month ago

Thank you for your contribution to Klipper. Unfortunately, a reviewer has not assigned themselves to this GitHub Pull Request. All Pull Requests are reviewed before merging, and a reviewer will need to volunteer. Further information is available at: https://www.klipper3d.org/CONTRIBUTING.html

There are some steps that you can take now:

Perform a self-review of your Pull Request by following the steps at: https://www.klipper3d.org/CONTRIBUTING.html#what-to-expect-in-a-review If you have completed a self-review, be sure to state the results of that self-review explicitly in the Pull Request comments. A reviewer is more likely to participate if the bulk of a review has already been completed.
Consider opening a topic on the Klipper Discourse server to discuss this work. The Discourse server is a good place to discuss development ideas and to engage users interested in testing. Reviewers are more likely to prioritize Pull Requests with an active community of users.
Consider helping out reviewers by reviewing other Klipper Pull Requests. Taking the time to perform a careful and detailed review of others work is appreciated. Regular contributors are more likely to prioritize the contributions of other regular contributors.

Unfortunately, if a reviewer does not assign themselves to this GitHub Pull Request then it will be automatically closed. If this happens, then it is a good idea to move further discussion to the Klipper Discourse server. Reviewers can reach out on that forum to let you know if they are interested and when they are available.

Best regards, ~ Your friendly GitIssueBot

PS: I'm just an automated script, not a human being.

KevinOConnor commented 4 weeks ago

Thanks again for working on this. Again, sorry for the delay in responding.

My main feedback is that I think we should refactor the probing interface prior to merging this code. I can do this refactoring work, and I can have something up for review next week. I think we should do the refactoring first, as I fear if we add more code with the existing probe interface, it will be increasingly difficult to perform a refactor.

In the future, it seems there are going to be 4 different probing mechanisms with the ldc1612: "descend_until_trigger", "scan", "rapid_scan", and "descend_until_tap". There are also 4 different "probing classes" in Klipper today: probe.py, bltouch.py, probe_eddy_current.py, and smart_effector.py. I think we can rework the "probing interface" in all of these classes so that we can better support the different probing mechanisms in the future.

Roughly, here's what I'm thinking we can change the probing interface to:

All probing commands start by calling probe.multi_probe_begin() and the gcmd instance is passed to that function. This will allow the low-level probing implementation to get access to the probe parameters.
The actual probing will be done with probe.detect_position() (instead of probe.run_probe()). This function may move the toolhead (but is not required to), it may perform a full kinematic flush (but is not required to), and may consume some amount of toolhead time (but is not required to). This function does not return anything. The calling code may invoke probe.detect_position() multiple times and may move the toolhead to new probing positions between each call.
At the completion of the probing operations, probe.multi_probe_end() will be invoked, and this method will return a list with one entry for every probe.detect_position() call. Each entry will contain a 2-tuple: (nominal_position, detected_position). Where detected_position contains the estimated Z height at the given nominal_position. The nominal_position will be roughly around the XY coordinate of the toolhead during each detect_position() call, but is not required to be any particular exact toolhead position.

I'm hoping that the above interface will scale to future ldc1612 and loadcell requirements.

With this interface, I don't think we would need specific "scan" logic in ProbePointsHelper to implement METHOD=scan requests. In this case, the probe_eddy_current.py code can detect METHOD=scan in its probe.multi_probe_begin() method and then not perform a descending move (but otherwise use the same frequency averaging and height detection that "descend_until_trigger" uses).

What are your thoughts on the interface?

Some other notes:

The timing in the upstream probe_eddy_current.py probing_move() code is indeed busted. We can increase the delay, but I fear even that may risk "timing race conditions". I have an alternate fix (wait for the samples to exceed the end_time), but we may also just want to rework the interface. Not sure.
The upstream probe_eddy_current.py assignment to self._is_from_home is indeed a bug.
The most recent patch series here seems to be missing a @staticmethod on to_str(). Not sure if that's a py2/py3 thing.
I think we should only pass a real list (not a "list generator") to ProbePointsHelper. I fear that will be too subtle for future developers working on ProbePointsHelper, as changes to that class may alter the evaluation of the list resulting in subtle behavior changes that are unlikely to get caught in normal testing (as not all developers are likely to have "rapid_scan" capable hardware).
For performing a "rapid_scan", I wonder if bed_mesh can directly detect a METHOD=rapid_scan request in it's cmd_BED_MESH_CALIBRATE(). It can then directly lookup the PrinterEddyProbe instance (via printer.lookup_object()), and directly invoke it's add_client() method. It can then perform the moves and accumulate the corresponding time,frequency,z samples. I think that may help "flatten" the interactions that this PR has in PathGenerator, MoveSplitter, ProbePointsHelper, ProbeScanHelper, and StreamingContext classes.

This is all up for discussion, so let me know what your thoughts are.

Thanks again, -Kevin

Arksine commented 4 weeks ago

Thanks for the feedback Kevin.

With this interface, I don't think we would need specific "scan" logic in ProbePointsHelper to implement METHOD=scan requests. In this case, the probe_eddy_current.py code can detect METHOD=scan in its probe.multi_probe_begin() method and then not perform a descending move (but otherwise use the same frequency averaging and height detection that "descend_until_trigger" uses).

What are your thoughts on the interface?

This sounds like an improvement. I agree that we need a better way to handle all of the different ways a probe sample can be taken. Presumably we would still keep run_probe() for situations where we just need to probe a single location?

The most recent patch series here seems to be missing a @staticmethod on to_str(). Not sure if that's a py2/py3 thing.

Indeed I left it out. I'm surprised it works without the decorator, that could be a Python 3 thing.

For performing a "rapid_scan", I wonder if bed_mesh can directly detect a METHOD=rapid_scan request in it's cmd_BED_MESH_CALIBRATE(). It can then directly lookup the PrinterEddyProbe instance (via printer.lookup_object()), and directly invoke it's add_client() method. It can then perform the moves and accumulate the corresponding time,frequency,z samples. I think that may help "flatten" the interactions that this PR has in PathGenerator, MoveSplitter, ProbePointsHelper, ProbeScanHelper, and StreamingContext classes.

Sounds good to me. In summary, it looks like we can remove ProbeScanHelper from probe_eddy_current.py, and I will implement something similar, specific to rapid scanning, in bed_mesh. It looks like the changes to bed_mesh can be done independently from the probe refactor, so I'll begin work on that.

deriyalexey commented 2 weeks ago

Hi both, when we can expect to see this changes?

Arksine commented 3 days ago

Closing this PR, as it has been superseded by #6617.

Klipper3d / klipper

probe_eddy_current: implement surface scannng #6558