EtteGit / EnragedRabbitProject

GNU General Public License v3.0
2.39k stars 338 forks source link

MCU 'ercf' shutdown: Timer too close #75

Open FHOGPHIL opened 2 years ago

FHOGPHIL commented 2 years ago

ok - got this a couple of times now MCU 'ercf' shutdown: Timer too close This often indicates the host computer is overloaded. Check for other processes consuming excessive CPU time, high swap usage, disk errors, overheating, unstable voltage, or similar system problems on the host computer. Once the underlying issue is corrected, use the "FIRMWARE_RESTART" command to reset the firmware, reload the config, and restart the host software. Printer is shutdown

cpiercedev commented 2 years ago

Are you using easybrd?

FHOGPHIL commented 2 years ago

yes

FHOGPHIL commented 2 years ago

Seems similar to #68

FHOGPHIL commented 2 years ago

at least 4 times now and each time on an unload to switch tools mid print

amv101 commented 2 years ago

I am also getting this error on my Delta printer with ERCF. I didnt have this error until I installed the ERCF system on my Klipper powered Delta just this week. I will run the test print again and try to attach the klippy log and gcode.

Specs- Delta printer, Bondtech Extruder, E3D V6 nozzle, AC powered bed via SSR Octopus v1.1 mainboard, 24VDC, Steppers A,B,C = TMC2209 UART, NEMA 17 Raspberry Pi 4B 4GB, connected USB Klipper Octoprint ERCF v1.1, Steppers = TCM2209 UART, NEMA 17 (No EasyBRD, wired direct to Octopus)

FHOGPHIL commented 2 years ago

Thought it might be a camera issue so pulled camera. didnt help, On the unload command, the filament got out of the extruder and when the high speed unload started, it dies

Unload move done, requested = 857.0, measured = 1.3 Correction unload move done, requested = 857.0, measured = 1.3 Too much slippage detected during the unload, please check the ERCF, calling ERCF_PAUSE... MCU 'ercf' shutdown: Timer too close This often indicates the host computer is overloaded. Check for other processes consuming excessive CPU time, high swap usage, disk errors, overheating, unstable voltage, or similar system problems on the host computer. Once the underlying issue is corrected, use the "FIRMWARE_RESTART" command to reset the firmware, reload the config, and restart the host software. Printer is shutdown

FHOGPHIL commented 2 years ago

RPi 4B 2GB/SSD EZBRD and SSD on USB3.0

TravisWilder commented 2 years ago

I had a similar issue and it was caused by bad soldering spots on the EZBRD (done wrong by me)

EtteGit commented 2 years ago

Hi. Just to let you know I'm looking into it. Unfortunately this is a rather vague issue, as this klipper error can be caused by a plethora of different things.

MuddyPuddles4325 commented 2 years ago

I found I could easily reproduce this using the servo in succession. For example if you modify the servo down function to go down, up, then down again with only 200ms between calls it happens every single print. If I make the gap 400ms the problem goes away.

if it helps, I have a Pi4, octopus, klipper expander and easy brd

FHOGPHIL commented 2 years ago

So I found a solution.
every time the command "{% set ercf_params = printer.save_variables.variables %}" comes up I added a line after it that reads "G4 P500". (Obviously there are no quotation marks.) This has stopped the error from coming up.

amv101 commented 2 years ago

Had a chance to work on my new Delta with the ERCF 1.1 and attach the klippy log with the error.

I get the "Timer too close" error consistently if I finish a print, restart firmware, then restart the same print in Octoprint. When the filament is being ejected before the start of a new print, the error occurs right when the filament reaches the reverse bowden tube.

klippy.log

EtteGit commented 2 years ago

After tinkering yesterday, I found that, for whatever reason, recent Klipper updates made the call of ERCF_UNLOAD right after another command crash the MCU (timer too close).

There is no apparant reason for that to occurs, since calling the commands next to each other by hand won't cause any issue, so it's definitely an issue with how klipper is dealing with its command stack // timing. That being said, adding a 200ms dwell right at the beginning of the ERCF_UNLOAD command fixed it. It'll be pushed soon (I need to clean the files still).

EtteGit commented 2 years ago

The fix is live since a few days. Let me know if the issue still pop back

tonylee1918 commented 2 years ago

I am getting the same issue. I increased the dwell to 0.5s but this is still happening. Details -Voron v2+easy brd+lgx+dragon HF+klipper expander -klipper 0.10.0-244 -Issue also happens only on unload klippy.log d

gurusonwheels commented 2 years ago

still here also voron 2.4 skr 1.4 turbo dual boards . easy brd dragon HF

tonylee1918 commented 2 years ago

Quick update to further nail down issue - this happens to me only if the ERCF goes into "PAUSE" If all is well, no crashes. Also I can rule out the presence of the klipper expander board as I have disabled it for a little while now and it still crashes

EtteGit commented 2 years ago

I've found one source of MCU Timer too close, namely when calling _ERCF_SERVO_UP and _ERCF_SERVO_DOWN one after another from a python module command. I'll use this simple exemple to open a Klipper issue.

In the mean time I've pushed a fix for that yesterday

RealHabix commented 2 years ago

Still getting it even after last commit (self.toolhead.wait_moves()) - only on unloading.

rwijnhov commented 2 years ago

Edit: I changed to a different pi 3 exact same hardware version + different PSU and the problem went away.

I am having the same issue while trying _ECRF_CAlIBRATE it seems to happen after de ECRF_PAUSE get's called.

MCU 'ercf' shutdown: Timer too close 09:45:45 MCU 'ercf' shutdown: Timer too close This often indicates the host computer is overloaded. Check for other processes consuming excessive CPU time, high swap usage, disk errors, overheating, unstable voltage, or similar system problems on the host computer. Once the underlying issue is corrected, use the "FIRMWARE_RESTART" command to reset the firmware, reload the config, and restart the host software. Printer is shutdown 09:45:45 Too much slippage detected during the unload, requested = 1183.1, measured = 0.0 - calling _ERCF_PAUSE... 09:45:45 Correction unload move done, requested = 1183.1, measured = 0.0 09:45:45 Unload move done, requested = 1183.1, measured = 0.0

cmroche commented 2 years ago

I'm seeing this problem too, sometimes mid print and fairly reliably when calling ERCF_UNLOAD.

I've been investigating for the last couple of days, and noticed that the "timer too close" only happens for some combinations of microstepping and move lengths on the gear_stepper.

For example, this breaks for me (and all values of MOVE larger than this)

MANUAL_STEPPER STEPPER=gear_stepper ENABLE=1
MANUAL_STEPPER STEPPER=gear_stepper SET_POSITION=0
MANUAL_STEPPER STEPPER=gear_stepper MOVE=1200 SPEED=400 ACCEL=2000

But this will work (and all values of MOVE smaller than this)

MANUAL_STEPPER STEPPER=gear_stepper ENABLE=1
MANUAL_STEPPER STEPPER=gear_stepper SET_POSITION=0
MANUAL_STEPPER STEPPER=gear_stepper MOVE=1000 SPEED=400 ACCEL=2000

As a quick workaround, I set my microstepping to 8 and no longer have the error. With interpolation enabled on the TMC2209 there is little change in noise level fortunately.

I suspect the cause of the error now to be numerical imprecision in the step generation. But, it might also be possible that the seeduino is unable to construct and schedule the steps fast enough. Load analysis only shows about 35-40% MCU load, though there may be other factors such as CPU stalls hiding the problem.

cmroche commented 2 years ago

It dawned on me, the issue might be latency in solving the steps for the given hardware configuration. More microsteps and a long path means more solving to do. I hacked in a 1 second delay to the start of the step schedule and this resolved the problem, likely ruling out numerical inaccuracy and scheduling problems.

I did a quick test

240mm move
Total 0.078829635997, Append 2.79170053545e-05, Generate 0.0787814059877, Finalize 2.03130039154e-05

2400mm move
Total 0.520585677004, Append 2.79689993476e-05, Generate 0.520538333003, Finalize 1.93750020117e-05

Generation scales linearly with length, and that's a big jump to generate steps and 0.5 seconds delay will definitely trip the scheduler.

cmroche commented 2 years ago

I'm working on a proper fix in Klipper, should have something in a few days time: https://klipper.discourse.group/t/long-manual-stepper-moves-cause-timer-too-close/2740/3

RealHabix commented 2 years ago

@cmroche I changed the microstepping to 8 as well, but still got the error.

Does the commit (splitting long moves) fix the issue?

cmroche commented 2 years ago

@RealHabix If your timer too close happens on the long LOAD/UNLOAD move then it's possible the commit will fix your issue.

fastlightning commented 2 years ago

Any general solution for this problem?

cmroche commented 2 years ago

@fastlightning @EtteGit The simplest workaround for this problem would be to avoid singular long moves with MANUAL_STEPPER and instead break them up into smaller sequences of moves so that the motion planner doesn't stall the queue for too long. This implies ERCF_LOAD and ERCF_UNLOAD mainly, breaking the moves in half would probably fix this problem for the vast majority of users based on my testing.

The PR submitted to Klipper needs more work, while it gives a short term fix and probably you won't experience other problems while using it, there are some corner cases where it will not schedule moves correctly if you try to use SYNC=0, the subsequent moves will not start until the MANUAL_STEPPER move is complete.

EtteGit commented 2 years ago

The PR from @cmroche was just merged. Let me know how that went for you guys

gunan commented 2 years ago

I just merged all those changes, but I am still seeing this issue. Here are my logs: https://gist.github.com/gunan/f4cce4e7cb23577da37f670fe48ee576

gunan commented 2 years ago

Maybe #115 as this is happening during a load move? or load correction?

gunan commented 2 years ago

I tried reverting #113 to see if the issue gets better, but there are no improvements. I will try reverting another commit.

One thing I will see if it is related though is, I think before every crash, I see that "Unable to reach the toolhead sensor" message is printed. So, maybe the issue is around 701d38a62d82401237b910eb9194e5e10c37e446. Trying reverting that now.

gunan commented 2 years ago

Not that commit either, will keep looking.

gunan commented 2 years ago

I was able to root cause my problem as the m400 calls added py #102 when I remove those, My timer too close issues go away. Other problems arise, but at least I can pause, debug and resume my prints.

cmroche commented 2 years ago

I was able to root cause my problem as the m400 calls added py #102 when I remove those, My timer too close issues go away. Other problems arise, but at least I can pause, debug and resume my prints.

The M400 is not likely the root cause I'm afraid, all it's doing is adding a delay which causes your RPI's timing to push it over the scheduling threshold by draining the current step queue.

In ercf_software.py there is now a variable variable_num_moves, start by increasing this to 2 to break up the long moves into multiple subsequent moves, this reduces the step planner delay causing the error.