MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0
16.15k stars 19.21k forks source link

[BUG] SKR v1.3 (or any other LPC1768): problem with servo signals that cause issues with bltouch #16171

Closed mlehnhoff closed 3 years ago

mlehnhoff commented 4 years ago

Bug Description

Recently i have upgraded my 3d printer to a Bigtreetech SKR v1.3. Unfortunatly i had some troubles with my bltouch clone (an older version of triaglelab 3d touch). Every now and than on a G28 or G29, the bltouch does not trigger and the printhead continues to drive down into the bed. At first i tried to find a solution in the inernet, but only found a forum article where someone said, that the SKR v1.3 has a bad 5V supply. Some additional Capacitors or an external 5V Supply should help. I tried both, but nothing helped! The 5V supply was not the problem! I did some investigations myself and with the help of an Oscilloscope i found the actual issue: It seems there is problem in the HAL of the LPC1768 (and maybe others) which can produce a wrong signal on the servo output. When the servo pulse should change from 1472µs (M280 P0 S90; stowed bltouch pin) to 647µs (M280 P0 S10;deployed pin) sometimes for one cycle the pulse is 20650µs long instead. This long pulse seems to confuse the bltouch (clone) and even though the pin is deployed, under these circumstances the sensor will not trigger the endstop when it touches the bed. I believe this happens every time, when the "M280 P0 S10" command is issued right in the small window where the servo pin is high for more than 647µs, but less than 1472µs. Than the falling edge for this cycle is already over and it is not executed until the next 20ms cylce (Just a guess, i have no evidence...). But I found a quick and dirty solution that work's for me:

diff --git a/Marlin/src/HAL/HAL_LPC1768/Servo.h b/Marlin/src/HAL/HAL_LPC1768/Servo.h
index 1bbf84c73..97a7bcb54 100644
--- a/Marlin/src/HAL/HAL_LPC1768/Servo.h
+++ b/Marlin/src/HAL/HAL_LPC1768/Servo.h
@@ -58,6 +58,12 @@ class libServo: public Servo {
     static_assert(COUNT(servo_delay) == NUM_SERVOS, "SERVO_DELAY must be an array NUM_SERVOS long.");

     if (attach(servo_info[servoIndex].Pin.nbr) >= 0) {    // try to reattach
+      /* workaround for too long pulse on the servo pin */
+      if ( (servoIndex == 0) && ( extDigitalRead(SERVO0_PIN) == 1 ) ) {
+        safe_delay(3);
+      }
       write(value);
       safe_delay(servo_delay[servoIndex]); // delay to allow servo to reach position
       #if ENABLED(DEACTIVATE_SERVOS_AFTER_MOVE)

It simply checks the current status of the servo pin. If it is high, the change of the signal is delayed by 3ms. This ensures, that the pin is definitely low when the change is applyed, because the servo pulse can not be longer than 2.4ms.

But this is only a quick and dirty hack and it should be fixed in the HAL. And it should also be checked, if this issue can happen on other Controllers too.

My Configurations

Marlin_Configuration.zip

Steps to Reproduce

  1. Use a SKR v1.3 board (or any other LPC1768) with a servo configured (#define NUM_SERVOS 1)
  2. send M280 P0 S90
  3. send M280 P0 S10
  4. watch the servo signal (SKR v1.3: pin P2_00)

Expected behavior: [What you expect to happen] A clean transition from signals with 1472µs pulse width to 647µs.

Actual behavior: [What actually happens] Every now and than you will see a pulse width of more than 20ms on the first cycle after the command.

Additional Information

Maybe it is more likely to see, if you use "M280 P0 S180" and "M280 P0 S0" instead. (greater difference -> bigger window for the issue)

mlehnhoff commented 4 years ago

2.0.5.4 (2.0.x) and bugfix-2.0.x are two different branches. Please try the latest bugfix-2.0.x and let us know if you're still experiencing this issue.

Yes I know and I did use the latest bugfix-2.0.x! 2.0.5.4 is only what the LCD menu tells me

But this does not matter anyways, because the bug is not inside the MarlinFw Code it is inside the pio-framework-arduino-lpc176x

It is even well known which part of the code is causing the issue: the disabled latching of the HW-PWM.

The problem is to find a way to re-enable the latching without having other problems. This is why the latching has been disabled.

sjasonsmith commented 4 years ago

I was chatting with @p3p about this, since I've been investigating similar problems on other platforms. I don't think there is any chance this behavior has been changed, and it's not really worth the effort to re-test right now, unless as part of actually implementing a more permanent solution.

thisiskeithb commented 4 years ago

2.0.5.4 is only what the LCD menu tells me

Then you are not using the latest bugfix. Your LCD will say bugfix-2.0.x.

p3p commented 4 years ago

I'm aware of the 1 period glitch when setting a shorter duty cycle after already passing the new match value, currently I'm not sure how to mitigate the issue. The hardware does not appear to do what it is supposed to do when the pwm shadow registers are enabled.

mlehnhoff commented 4 years ago

2.0.5.4 is only what the LCD menu tells me

Then you are not using the latest bugfix. Your LCD will say bugfix-2.0.x.

okay, my bad, i accidentially cloned the branch 2.0.x instead of bugfix-2.0.x... now i fixed that -> no difference (of course)

mlehnhoff commented 4 years ago

I'm aware of the 1 period glitch when setting a shorter duty cycle after already passing the new match value, currently I'm not sure how to mitigate the issue. The hardware does not appear to do what it is supposed to do when the pwm shadow registers are enabled.

@p3p Can you please tell me, what issues you experienced, when you had latching enabled?

p3p commented 4 years ago

From what I remember (it's been a long time since I was debugging this), with the shadow registers enabled the pulse width would sporadically not update at all, as you can imagine I chose the 1 period glitch rather than that. It seemed to possibly be caused by updating the pulse width more than once in the same period but I wasn't sure.

AnHardt commented 4 years ago

A commanded servo angle has to stay at least 20 ms to be visible on the output as a changed pulse width. To make the new pulse width recognizable by the device/servo it likely needs to stay several pulses. So changing the angle within at least 20 ms has to be forbidden. From what @sjasonsmith found out for the BL_Touch and pointed out in https://github.com/MarlinFirmware/Marlin/issues/18598#issuecomment-657406598 better about 60 ms.

Updating the shadow register more often does not make sense. The register should not be written more often than the angle value changes. (Shadow variable for the shadow register?) Likely it needs an additional patch at a higher level of the servo library to prevent too frequent changes. We already have something similar withSERVO_DELAY, originally ought for preventing the servo signal shutdown before the servo (mechanically) arrives its target.

p3p commented 4 years ago

@AnHardt I know that changing the shadow registers multiple times per period is pointless but I don't see why changing them multiple times before the values are pushed to the real registers on the period end would cause a problem, I'm not sure what the solution to stopping the client application updating the hardware PWM too often would be (if that really is the issue), without a significant performance hit in one way or another anyway.

AnHardt commented 4 years ago

@AnHardt I know that changing the shadow registers multiple times per period is pointless but I don't see why changing them multiple times before the values are pushed to the real registers on the period end would cause a problem,

I also don't know why or if there is a problem. But if you tell us it causes problems we should try to avoid it. A construct like:

static float last_servo_angle = 0.0f;
if (servo_angle != last_servo_angle) {
  set_servo(sevo_angle);
  last_servo_angle = servo_angle;
}

would at least prevent writing multiple times to the shadow register with the same value.

If i remember correctly, the last time i touched the SERVO_PROBE code, before the BL-Touch appeared, i carefully avoided moving the servo and the steppers at the same time with syncs - but i always tested with DEACTIVATE_SERVOS_AFTER_MOVE, because my servos jittered when the stepper was moving, producing a SERVO_DELAY (a few hundred ms) pause after setting the servo_angle. Compared to that the much shorter delays i suggest now are a win in performance.

If the BL-Touch code tries to move servo and steppers at the same time that does not feel to me like the ball i have to play.

Because the BL-Touches want to have a constantly on servo signal DEACTIVATE_SERVOS_AFTER_MOVE is not possible. For the interrupt driven servo libraries a delayed interrupt became catastrophic, messing up the servo signal. A hardware driven PWM would be immune. Usually we have only one servo.

However, i'm quite sure a set_servo(0); set_servo(180); set_servo(0); without pauses will cause absolutely no reaction - neither on a real servo, nor on a BL-Touch.

AnHardt commented 4 years ago

Sorry. Likely my thoughts are to much focused to hardware PWM, at the moment, where the timers compare register only has to be updatet every now and then.

p3p commented 4 years ago

Sorry. Likely my thoughts are to much focused to hardware PWM, at the moment, where the timers compare register only has to be updatet every now and then.

This problem is with the Hardware PWM, and I agree with everything you said, I was just thinking of the problem at the framework level, as in how to make the hardware work reliably even when the client application is doing crazy amount of updates (Marlin) in the same period. The client will expect the last update to take effect, rather than the first, and I have no way of knowing that there are not going to be subsequent updates before the end of the period.

I'l have to look again at the problem to see if a new solution comes to mind, and if that diagnoses is actually correct and I wasn't just being stupid.

AnHardt commented 4 years ago

I'd try when to short lasting positions can be omitted:

servo_update(angle) only updates a volatile variable lets say inter_angle.

an interrupt, either overflow or compare could be:
{  
  static uint32_t counter = 0;
  static uint16_t last_inter_angle = 0;
  if (counter++ > 3) { // if counter should overflow there is a small risk of delaying another 3*20 ms. Every ~55min if 16 bit.
    if (inter_angle != last_inter_angle) { // if counter above 3 the update will be immediate when inter_angle changed.
      update_shadow_compare_register(inter_angle);
      counter = 0;
      last_inter_angle = inter_angle;
    }
  }
}

running about every 20 ms. That should hold the output puls length constant for at least 3*20 ms and then put in the newest commanded angle only. However it will not know what angle comes next or if an update will follow at all - that's impossible to know - some when you will have to begin. It youst warrants the send pulses can be read and the shadow register is not updated very often.

To warrant each and every update is done for at least that time the volatile variable has to be replaced by a queue.

In RC-flying the intermediate positions can likely be omitted. In case of BL-Touch stowing, deploying, resetting, changing_modes, ... is all equally important. Noting of this can be omitted, every command hast to stay long enough to be recognized. There is no universal right solution for a universal servo library.

Additionally for the BL-Touch all commands have to be in sync to the steppers moves. Retracting the probe while going down for the probe should better be omitted. :-) So from my point of view Marlin must be responsible for not updating the angle to frequent.


Edit: Likely the compare interrupt is the right one to update the shadow compare register. The interrupt could then be delayed by higher priority interrupts by about 17ms and still would be in time to have the update-register ready for the copy to the compare-register when the overflow occurs. Should be possible to stop the interrupt if the counter is above 3. Could the be restarted when inter_angle is updated.

Matheusschmitz commented 4 years ago

I'm having the same problem with the SKR mini 1.1. No matter what position i put, the servo always go to same point.

https://www.youtube.com/watch?v=HVyaKdpJsP0

1 2

sjasonsmith commented 4 years ago

@Matheusschmitz the SKR mini uses a different platform and would be unique from this issue. Some changes went in a little over a week ago to improve BLTouch reliability for STM32F1 boards (like yours). Please try using the bugfix-2.0.x branch to see if it resolves your issue.

If you still have problems please go discuss at one of the support venues such as Discord. This particular issue should remain focused on LPC176x boards.

ManIkWeet commented 4 years ago

I also run into this issue with my SKR1.3 and BLTouch clone. I have managed to capture it on video, right around the 1:54 mark: https://www.youtube.com/watch?v=wF0Mia49ECI&t=114s (video is 1080p YouTube has to process it)

Here's also an image showing what it does when it happens during UBL more points

I have tried, like others, the various settings that could help, they didn't help. I am on the latest bugfix-2.0.x branch

dan-and commented 4 years ago

Same issue on my side. I will test the workaround as well

github-actions[bot] commented 3 years ago

This issue has had no activity in the last 30 days. Please add a reply if you want to keep this issue active, otherwise it will be automatically closed within 7 days.

sjasonsmith commented 3 years ago

I think this is worth keeping open until a permanent solution can be found.

ManIkWeet commented 3 years ago

I agree with that sentiment

github-actions[bot] commented 3 years ago

This issue has had no activity in the last 30 days. Please add a reply if you want to keep this issue active, otherwise it will be automatically closed within 7 days.

ChipCE commented 3 years ago

I think this issue should be keep open.

github-actions[bot] commented 3 years ago

This issue has had no activity in the last 30 days. Please add a reply if you want to keep this issue active, otherwise it will be automatically closed within 7 days.

github-actions[bot] commented 3 years ago

This issue has had no activity in the last 30 days. Please add a reply if you want to keep this issue active, otherwise it will be automatically closed within 7 days.

mlehnhoff commented 3 years ago

This issue is still not fixed in pio-framework-arduino-lpc176x, so it should remain open.

github-actions[bot] commented 3 years ago

This issue has had no activity in the last 30 days. Please add a reply if you want to keep this issue active, otherwise it will be automatically closed within 7 days.

dan-and commented 3 years ago

I switched to the original bltouch and the issue is gone. The timing should be more relaxed to work with a variants, so please keep it open

Lyr3x commented 3 years ago

I switched from a v2.1 (which i damaged) to a v3.1 yesterday and facing the same issue. Both are genuine BLTouch. Hadn't any problems with my old one but facing the now and then failed deploys now.

I've read about the issue and tried to reduce XY Probing Speed, as well as the stated workaround., played with delay and switching off servo while moving etc. Nothing worked for me. Tried GRID sizes from 3x3 to 10x10 and it randomly fails.

ManIkWeet commented 3 years ago

I switched to Klipper firmware and never had a leveling issue again, if you have a raspberry pi it's very much worth it

descipher commented 3 years ago

From HardwarePWM.h of framework-arduino-lpc176x // Set the period using channel 0 before enabling peripheral LPC_PWM1->MR0 = (CLKPWR_GetPCLK(CLKPWR_PCLKSEL_PWM1) / frequency) - 1; LPC_PWM1->LER = util::bit_value(0); // if only latching worked

// Enable PWM mode
// TODO: this **is very unreliable appears to randomly miss latches** thus not changing the duty cycle
// disabling PWM latch mode at least gives reliable (bit 3)
//LPC_PWM1->TCR = util::bitset_value(0, 3);      //  Turn on PWM latch mode and Enable counters
LPC_PWM1->TCR = util::bitset_value(0);

// update the match register for a channel and set the latch to update on next period static inline void set_match(const pin_t pin, const uint32_t value) { //work around for bug if MR1 == MR0 *match_register_ptr(pin) = value == LPC_PWM1->MR0 ? value + 1 : value; // tried to work around latch issue by always setting all bits, was unsuccessful LPC_PWM1->LER = util::bit_value(pin_has_pwm(pin));

// At 0 duty cycle hardware pwm outputs 1 cycle pulses
// Work around it by disabling the pwm output and setting the pin low util the duty cycle is updated
if(value == 0) {
  set_idle(pin);
} else if(util::bit_test(idle_pins, get_pin_id(pin))) {
  pin_enable_function(pin, pwm_function_index(pin));
  util::bit_clear(idle_pins, get_pin_id(pin));
}

}

Obviously this is the reason we have intermittent issues with BLTouch, clones or any servo on the NXP MCU's using Arduino lib. This needs to be worked on, I would but I do not have a board to test it on. From the LPC176x user manual.

PWM mode causes the shadow registers to operate in connection with the Match registers. A program write to a Match register will not have an effect on the Match result until the corresponding bit in PWMLER has been set, followed by the occurrence of a PWM Match 0 event. Note that the PWM Match register that determines the PWM rate (PWM Match Register 0 - MR0) must be set up prior to the PWM being enabled. Otherwise a Match event will not occur to cause shadow register contents to become effective.

descipher commented 3 years ago

Each time the detach call is made the PWM is disabled. e.g. TERN_(DEACTIVATE_SERVOS_AFTER_MOVE, detach()); if(!channel_active(pin)) util::bit_clear(LPC_PWM1->PCR, 8 + pin_has_pwm(pin)); // turn off the PWM output

The issue can occur when attach is called as set_match does not reload reload LER for MR1 which is the duty match reg.

Lyr3x commented 3 years ago

Thats too low level for me, however I am more than happy to test any changes with my skr pro 1.1 👍

MangaValk commented 3 years ago

i have got an GTR here and just got myself a modern scope. if you give some instruction i can test some things.

descipher commented 3 years ago

i have got an GTR here and just got myself a modern scope. if you give some instruction i can test some things.

Thanks, Will consider that option, really hoping we see some manufacturing support for it e.g. they should send a board to work on it. Remote hands and eyes are much more time intensive.

descipher commented 3 years ago

From a code change perspective it would require using MR0 as the period and MR1 as the duty. Then it should all work with latching.

AnHardt commented 3 years ago

Nether the SKR Pro V1.1 nor the GTR have a LPC-processor. They do not use the library under discussion here! =8O

ManuelMcLure commented 3 years ago

Also, p3p indicates above that he attempted to use the latching mode and it resulted in more problems than it solved...

descipher commented 3 years ago

Also, p3p indicates above that he attempted to use the latching mode and it resulted in more problems than it solved...

Yes, the question is why was it inconsistent, I cannot see what was tried but there are elements I see that would be an issue. For example, when a reattach is done we need to reset the count to ensure the common period latch is updated and that is MR0. With multi channel there needs to be a process to update each channel MR. I don’t see one in that code.

AnHardt commented 3 years ago

When using a BL-touch 'DEACTIVATE_SERVOS_AFTER_MOVE' is unwise and switched OFF by default. Please look up the BL-touch manual.

AnHardt commented 3 years ago

'DEACTIVATE_SERVOS_AFTER_MOVE' was made to avoid servo jitter when using the AVRs software timers. For hardware times it makes absolutely NEVER any sense.

EDIT: Sorry hardware PWM

descipher commented 3 years ago

When using a BL-touch 'DEACTIVATE_SERVOS_AFTER_MOVE' is unwise and switched OFF by default. Please look up the BL-touch manual.

Do you know if this issue occurs on a BL-touch without DEACTIVATE_SERVOS_AFTER_MOVE.

descipher commented 3 years ago

Read the manual, there is no mention of DEACTIVATE_SERVOS_AFTER_MOVE.

AnHardt commented 3 years ago

https://5020dafe-17d8-4c4c-bf3b-914a8fdd5140.filesusr.com/ugd/f5a1c8_a8cd142ffd2f42089c78883f2bb172b7.pdf Last page, last line in the box.

Later Marlin had a BLTOUCH option, automatically setting this. So they did not go into that much detail in later manuals.

descipher commented 3 years ago

Ok, thanks, that on v1.0 with Marlin 1.1.x(1.1.9), nothing in the current mentions it (v3.1), also what if you have a mega2560? https://5020dafe-17d8-4c4c-bf3b-914a8fdd5140.filesusr.com/ugd/f5a1c8_d40d077cf5c24918bd25b6524f649f11.pdf Do you know if this is an issue without it? That would be very useful to know.

AnHardt commented 3 years ago

See edited comment above. Since, at that time, there was no alternative to the AVRs i expect Antclabs have tested this on AVRs only and required it by purpose.

A real check for this would only be to look up, the reaction when the signal is missing, in the BL-touch firmware - what is closed source - what is my main problem with that shit. (And the problem to make 'real' clones)

descipher commented 3 years ago

Also, p3p indicates above that he attempted to use the latching mode and it resulted in more problems than it solved

There is no other option when in PWM mode and it is using latching in the current code but only for MR0.

https://www.nxp.com/docs/en/user-guide/UM10360.pdf page 527 PWM Enable PWM mode is enabled (counter resets to 1). PWM mode causes the shadow registers to operate in connection with the Match registers. A program write to a Match register will not have an effect on the Match result until the corresponding bit in PWMLER has been set, followed by the occurrence of a PWM Match 0 event. Note that the PWM Match register that determines the PWM rate (PWM Match Register 0 - MR0) must be set up prior to the PWM being enabled. Otherwise a Match event will not occur to cause shadow register contents to become effective

descipher commented 3 years ago

If you want to try adding LER for MR1 give this a go:

// update the match register for a channel and set the latch to update on next period static inline void set_match(const pin_t pin, const uint32_t value) { //work around for bug if MR1 == MR0 *match_register_ptr(pin) = value == LPC_PWM1->MR0 ? value + 1 : value; LPC_PWM1->LER = util::bitset_value(0, 1); // Enable Latch for MR0 and MR1 // At 0 duty cycle hardware pwm outputs 1 cycle pulses // Work around it by disabling the pwm output and setting the pin low util the duty cycle is updated if(value == 0) { set_idle(pin); } else if(util::bit_test(idle_pins, get_pin_id(pin))) { pin_enable_function(pin, pwm_function_index(pin)); util::bit_clear(idle_pins, get_pin_id(pin)); } }

descipher commented 3 years ago

FYI https://github.com/p3p/pio-framework-arduino-lpc176x/pull/46 is merged

mlehnhoff commented 3 years ago

I have just tested the new Release 0.2.7 of pio-framework-arduino-lpc176x It seems to work fine. Many thanks to @p3p and @descipher.

So as soon as this package is integrated to Marlin this issue should be solved.

descipher commented 3 years ago

platform_packages = framework-arduino-lpc176x@^0.2.6 lpc176x.ini needs updating ... current release is 0.2.8, fixes multiple issues with pwm and baud accuracy. Will do a PR.