[BUG] Firmware resets while homing

thespooniest commented 2 years ago

Did you test the latest `bugfix-2.1.x` code?

Yes, and the problem still exists.

Bug Description

The firmware resets early in the homing process. If the axes started near their zero point, the firmware actually acts as though the switches weren't there, continuing to grind into the zero point until the firmware resets. But if the axes start far from their zero points, the firmware will reset anyway, a few seconds after they begin to move, even if they haven't hit their endstops yet.

Bug Timeline

This started on 10/24/2022, not long after Input Shaping merged in.

Expected behavior

I expected the homing process to complete normally.

Actual behavior

If the axes start close to the zero point, they will grind against the endstops for a second or two before the firmware resets. If the axes start far from the zero point, the firmware still resets a second or two after movement begins, even if they have not reached the zero point.

Steps to Reproduce

Compile Marlin with Input Shaping enabled.
Upload the firmware and reset the printer.
From the main menu, choose Motion->Homing->Auto Home.
Observe the results.

Version of Marlin Firmware

bugfix-2.1.x, 2022-10-24

Printer model

Creality Ender 3 Pro

Electronics

BigTreeTech SKR Mini e3 v3

Add-ons

BLTouch 3.1

Bed Leveling

UBL Bilinear mesh

Your Slicer

Cura

Host Software

OctoPrint

Don't forget to include

[X] A ZIP file containing your Configuration.h and Configuration_adv.h.

Additional information & file uploads

Marlin.zip

thisiskeithb commented 2 years ago

Using the same firmware build, does recompiling with Input Shaping disabled fix the issue?

thespooniest commented 2 years ago

Yes, it does.

On Tue, Oct 25, 2022, 1:45 AM Keith Bennett @.***> wrote:

Using the same firmware build, does recompiling with Input Shaping disabled fix the issue?

— Reply to this email directly, view it on GitHub https://github.com/MarlinFirmware/Marlin/issues/24927#issuecomment-1290017733, or unsubscribe https://github.com/notifications/unsubscribe-auth/AANCHFOL2A5F4AOHP3TAWZDWE5XYRANCNFSM6AAAAAARNTF3XY . You are receiving this because you authored the thread.Message ID: @.***>

Foxies-CSTL commented 2 years ago

Hi, I had this problem and it was related to the lack of RAM. The end of compilation showed me 97% usage and it was not enough. Now it's up to 85%. Maybe a way forward and look carefully at the conditions regarding the input shaping #24797.

ellensp commented 2 years ago

Already checked that from provided configs

Checking size .pio/build/STM32G0B1RE_btt/firmware.elf
Advanced Memory Usage is available via "PlatformIO Home > Project Inspect"
RAM:   [==        ]  15.2% (used 22420 bytes from 147456 bytes)
Flash: [=====     ]  52.6% (used 275544 bytes from 524288 bytes)

cbagwell commented 2 years ago

I have same board but also have about 10 hours of prints with input shaping enabled and no resets seen. I do see the homing grinding into the edge when it starts close to edge or if I abort a print using menu but I see that with input shaping not compiled in as well and its not 100% reproducible.

The main difference I saw in configs is I do not have UBL enabled so it might help to isolate if you test with bilinear enabled.

Also, just an FYI. I noticed you had a typo in BLTOUCH_HS_MODE define so not sure how that is being treated during compile.

tombrazier commented 2 years ago

I do see the homing grinding into the edge when it starts close to edge or if I abort a print using menu but I see that with input shaping not compiled in as well and its not 100% reproducible.

@cbagwell and @thespooniest are you both using sensorless homing?

@thespooniest Is homing speed the same with and without IS enabled? And when the firmware resets, do you get a message saying "homing failed"?

@thespooniest What default_env do you have in platform.ini? And once we have the answer, @cbagwell do you have the same?

thespooniest commented 2 years ago

I am not using sensorless homing. As part of this process I did discover that I'd forgotten to remove the DIAG jumper on my mainboard (as you're supposed to do with the SKR Mini e3 v3), but removing it does not seem to have fixed the problem: the motor still grinds if homing starts too close to zero, and the board still resets after a second or two either way.

Homing speed seems to be the same with and without IS enabled, at least on X and Y. I don't know about Z, because the firmware resets before reaching that part. I don't get any message, either onscreen or through OctoPrint: the firmware just resets.

My default_env is default_envs = STM32G0B1RE_btt

cbagwell commented 2 years ago

I'm not using sensorless homing as well and also default_envs = STM32G0B1RE_btt.

My wild guess is if I disabled QUICK_HOME then it might resolve the homing issue (but I haven't taken time to test that idea yet). That's because It moves diagonal until it hits the X end stop then Y movement stops while X starts grinding. I haven't reviewed the code yet but I assume that means it has entered some state where its should be backing back up but is confused and maybe its that concurrent Y movement that is causing the confusion.

I see the reference BTT Mini E3 V3 config has QUICK_HOME commented out but I carried it over from my CrealityV422 config file. Maybe its commented out for a reason?

In my case, I eventually get a homing failed message but I do not see a reset and have to power cycle manually to recover.

But mine of course is just an inconvenience since an average home does work and I can use IS.

thespooniest commented 2 years ago

I tried disabling QUICK_HOME. It didn't solve the reset issue.

The motor grinding changes, but isn't fixed. If I start far from the zero point, the board resets a second or so after the homing starts, even though it hasn't hit the zero point, just like before. If I start a medium distance from the zero point, the head eventually hits zero and begins to grind, but the board resets a second or two after homing starts, again like before.

But if the head starts very close to zero, it will grind until it generates a "Homing Failed - Please Reset" error. This halts the printer, which prevents the automatic reset that the other situations have.

thespooniest commented 2 years ago

Possible clue. As I attempted to change the nozzle, I told the printer to move the extruder -10mm. This also caused a spontaneous reset. Perhaps it has something to do with input shaping while moving an axis into the negatives?

(And I also broke my thermistor, so I may not be able to do tests for myself for a couple days. New one arrives Thursday ).

tombrazier commented 2 years ago

I have a SKR Mini v3 coming my way, so will try to replicate.

tombrazier commented 2 years ago

But for now @cbagwell can you build with @thespooniest's exact config and see if it replicates for you?

cbagwell commented 2 years ago

I was able to reproduce the reboot on first try with attached config files. Still the biggest difference between my config is UBL is enabled so that would be my next guess to toggle off.

I'll try to do more testing in a few hours or at least upload a diff of my working config.

cbagwell commented 2 years ago

My printer ended up being used for prints so I wasn't able to debug... so I'll attach diff between my no-reset config and the reset config. In addition to UBL, here are some items that jumped out at me that are X/Y related:

SKEW_CORRECTION
ADAPTIVE_STEP_SMOOTHING (brought up a lot in HOMING failure discussions on this board)

minie3v3.zip

tombrazier commented 2 years ago

Okay, here's a hypothesis: ADAPTIVE_STEP_SMOOTHING causes the stepper ISR to be called so often that when you add extra ISR calls for input shaping, the CPU spends all its time in ISRs. Consequently endstop logic is not processed and the watchdog timer never gets reset, resulting in a reboot.

What happens if you disable USE_WATCHDOG in Configuration_adv.h? (Just temporarily for this test - be sure to switch it back on later!) If I am right you'll still get the grinding but not the reboot.

cbagwell commented 2 years ago

It will probably be tomorrow before I have access to my printer to test but I wanted to add some support to the hypothesis.

ADAPTIVE_STEP_SMOOTHING was recently disabled in config files for this specific board because that appeared to solve some cases with BLTOUCH failing. Others have found another solution to allow ADAPTIVE_STEP_SMOOTHING with BLTOUCH by increase interrupt priority of PWM timer (#24922 ) while others needed to disable both ADAPTIVE_STEP_SMOOTHING and ENDSTOP_INTERRUPTS_FEATURE (mentioned in #24351 ) for BLTOUCH stability.

The common theme in those solutions are to reduce or rearrange interrupts to solve bltouch issues so the board is likely already struggling with the interrupts it has and adding INPUT_SHAPING on top of ADAPTIVE_STEP_SMOOTH may be pushing it into watchdog territory.

I bought a MAX7219 at same time as this board specifically to use with your profiling code with it but it's currently collecting dust in a corner. That would likely be useful info in this case so I'll try to make that my next project.

tombrazier commented 2 years ago

Oh a MAX7219 would really help. I am glad you have one.

cbagwell commented 2 years ago

I was able to run some more tests and you where right.

I commented out watchdog and homed X. The board no longer rebooted but did grind when it got to home.
I reenabled watchdog and commented out ADAPTIVE_STEP_SMOOTH. The board didn't reboot and correctly homed.
I reenabled both items and commented ENDSTOP_INTERRUPTS_FEATURE and board still rebooted.

So now its clear that enabling both ADAPTIVE_STEP_SMOOTH and INPUT_SHAPING causes the board to spend to much time in interrupts.

I think I may have also gotten some clues that my home grinding even when I have both ADAPTIVE_STEP_SMOOTH and INPUT_SHAPING disabled could be from lost interrupts. Perhaps there is a burst of interrupts right when you initiate a home sequence and could explain why it often has issues when you start the home really close to zero.

tombrazier commented 2 years ago

Excellent. Now with both INPUT_SHAPING and ADAPTIVE_STEP_SMOOTING, is the problem resolved by adding

TERN_(INPUT_SHAPING, max_rate <<= 1);

before https://github.com/MarlinFirmware/Marlin/blob/6b4d7b9151d260d3109ca857d084e3ca0cd8b048/Marlin/src/module/stepper.cpp#L2468

cbagwell commented 2 years ago

That did indeed allow me to home X/Y without rebooting! Sample size of 1 though.

tombrazier commented 2 years ago

It halves the ASS ISR rate when IS is enabled. I think we may have solved this one.

thespooniest commented 2 years ago

No sooner did I replace the thermistor than I broke the cable for my BLTouch. But I was able to confirm that the motors don't grind and the firmware doesn't reset, so I think this is fixed.

tombrazier commented 2 years ago

I'm chasing down some other things at present but in due course I'll get this or a similar fix into my is_improvements branch.

Thanks for reporting it and helping find the bug.

tombrazier commented 2 years ago

I think there is probably another bug related to ADAPTIVE_STEP_SMOOTHING: the replay buffer is too small if ASS actually engages (and that depends on your mainbard speed) when printing or moving at high speed. This will result in some crude speed limiting kicking in in the stepper module which will slow your print and if you also have LINEAR_ADVANCE enabled, it will not know about the speed change and will overextrude.

cbagwell commented 2 years ago

I was browsing around stepper.cpp looking for other things that might be contributing to ISR running to much.

I wonder how accurate ISR_S_CURVE_CYCLES needs to be?

The STM32G0 is a cortex-m0 and missing the umull instruction so it's falling back to the unoptimized logic. ISR_S_CURVE_CYCLES is hard coded to the 40-ish cycles of the optimized version though and I think this board is running closer to 200 cycles.

I looked at the assembly generated and the 3 64-bit multiplications are causing _aeabi_lmul() to be called and that function is 40 instructions. I've seen versions of that function done in 25 instructions so probably room for long term improvement.

Short term I'm wondering if its worth updating the define to use a higher value for this board?

tombrazier commented 2 years ago

I think that makes sense (as a separate PR!). Make it depend on the same condition, i.e. the opposite of #if (defined(__arm__) || defined(__thumb__)) && __ARM_ARCH >= 6 && !defined(STM32G0B1xx)

cbagwell commented 2 years ago

I retested with is_improvements branch and INPUT_SHAPING+ADAPTIVE_STEP_SMOOTHING+S_CURVE_ACCELERATION and unfortunately homing still was not working. It was still grinding into end stop and appears to reboot from watchdog after a couple of seconds.

I even tested with and without my update to scurves cycles from 40 to 500; hoping that would back off the CPU usage enough; but no difference.

I did finally hook up my MAX7219 and used it during this time. It appears that the instant the homing started that the display also stopped being updated (7 out of 8 LED's light on profiling and no LED's changed and the alive led stopped blinking).

Once I reinstall without ADAPTIVE_STEP_SMOOTHING then things go back to normal. It stays very idle during homing (7 out of 8) but I get hints of some kind of bursty operations going on (at very end of Z home, I get a blip down to only 1 led for example).

cbagwell commented 2 years ago

In case it helps, here is value of ISR_EXECUTION_CYCLES for these boards (with my +500 cycles for SCURVE instead of +40). I'm still trying to understand the usage of the CPU frequency divided by some fixed # since it will result in a smaller cycle count the slower your CPU. I'd think the value would be used to multiple against some fixed cycle count and not simply added in.

Marlin/src/feature/../module/stepper.h:242:1: note: #pragma message: ISR_EXECUTION_CYCLES: (((792UL + 500UL + (((4UL + _MAX(_MAX(uint32_t((64000000L) / (5000000)), ((64000000L) / 500000UL) * (1UL)), (0UL + 16UL + 16UL + 16UL + 16UL))) * (1 + 1 + 1))) * (1) + 64UL + _MAX(_MAX(uint32_t((64000000L) / (5000000)), ((64000000L) / 500000UL) * (1UL)), 16UL))) / (1))

tombrazier commented 2 years ago

I now have an SKR Mini V3 so I can play around with this myself. How did you connect your MAX7219, i.e. which connector and what config?

cbagwell commented 2 years ago

There is a set of pins labelled SPI1 you can get 5V and GND from but the remaining pins are also used by SD card so you can't use them freely.

https://github.com/bigtreetech/BIGTREETECH-SKR-mini-E3/blob/master/hardware/BTT%20SKR%20MINI%20E3%20V3.0/Hardware/BTT%20E3%20SKR%20MINI%20V3.0_PIN.pdf

There is also a row of 5 pins labeled I/O that are free to use and I used the 3 closest to SPI1 power pins.

#define MAX7219_CLK_PIN   PD3
#define MAX7219_DIN_PIN   PD5
#define MAX7219_LOAD_PIN  PD4

tombrazier commented 2 years ago

Okay, I have Marlin on my BTT SKR Mini E3 V3.0 with INPUT_SHAPING, ADAPTIVE_STEP_SMOOTHING and S_CURVE_ACCELERATION and it runs fine with G28. My MAX7219 is connected and shows that the processor is fairly quiet during homing.

What processor does your board have? Mine has the STM32G0B1RET6.

I do not have a UI. I wonder whether that might make a difference.

Could you post your config?

cbagwell commented 2 years ago

Interesting. Original reporting and I both have Ender 3 Pro's so using that style UI.

The original poster posted a config zip at top and here is link to repo I keep my config in as well. My default config has ADAPTIVE_STEP_SMOOTHING disabled so please enable if you start from that; but it does also have MAX7219 enabled with above pins.

https://github.com/cbagwell/Marlin

tombrazier commented 2 years ago

One other thing I didn't have compiled was a probe. So I plugged in my new CRTouch and it burnt! Damned Z probe connector on the BTT was backwards. So dispiriting.

cbagwell commented 2 years ago

Frustrating! I would be happy to donate a new one as thanks for all the new features.

tombrazier commented 2 years ago

So it turns out the probe was not related to the problem anyway. But the display is, sort of. When I disable CR10_STOCKDISPLAY homing no longer triggers the watchdog timer. The stepper ISR is taking pretty much all the processor time and just updating the display pushes it over the edge.

I have also established that both your changes and mine together change MIN_STEP_ISR_FREQUENCY from 27681 to 17021 which is less than the factor of 2 that is needed to give the CPU spare cycles. Even when I force ASS to run at half the frequency, by moving CodeProfiler from idle() MarlinCore.cpp to Stepper::isr() I can see that too much CPU is being used by the ISR.

So the ASS calculation is far too inaccurate for the STM32G0B1RE. I wonder how much Stepper::calc_timer_interval() contributes to this. It calls __aeabi_uidiv(), I think, from looking at the assembly whereas with a STM32F103RE, there's just a udiv.

I would be happy to donate a new one as thanks for all the new features.

That's very kind, Chris, but it was already a spare which was donated to me and I was happy to have it on that basis for testing purposes. My actual printer uses a different sensor so I don't really need it for myself. It's just frustrating to have a perfectly good piece of hardware destroyed by one incompetent component placement in another piece of hardware.

tombrazier commented 2 years ago

@cbagwell I have measured __aeabi_uidiv() at about 136 cpu cycles vs udiv which takes up to 12 (I think from a brief google). So can you add about 120 for this to #24955?

Also, I measure the call to the SCURVE logic at 544 cycles. So that's another 44.

I am still measuring but there is still quite a lot more time taken in the ISR than is being counted.

cbagwell commented 2 years ago

Will do.

__aeabi_uidiv() is showing up as __udivsi3 in objdump. Let me stare at the code calling that for a bit and find a good place to account for those extra cycles.

I also see some calls to __aeabi_ldivmod in IS paths and those look like they could be at least as expensive as __aeabi_uidiv().

tombrazier commented 2 years ago

I also see some calls to __aeabi_ldivmod in IS paths and those look like they could be at least as expensive as __aeabi_uidiv().

I suspect that will be

        TERN_(HAS_SHAPING_Y, advance_dividend.y = ((steps << 29) + shaping_y.remainder) / step_event_count);

and the same for X. I have a change in the pipeline that will eliminate this code. Just need to resolve the SKR Mini and TMC2208 issues first.

PS these divisions only happens once per segment so I am not overly concerned about them.

tombrazier commented 2 years ago

Hmm, pulse_phase_isr() takes quite a lot more time than is planned for. ISR_STEPPER_CYCLES is 16 but I measure about 90 cycles per axis.

I suspect DelayQueue::peek_x() and DelayQueue::peek_y() are also quite expensive.

SCURVE, on the other hand, has a much lesser effect because it only occurs during accel and decl ramps.

And it turns out using a float as a boolean is really expensive.

I'll fix some of these and see where we end up.

cbagwell commented 2 years ago

Hopefully, this patch will one day land in gcc to be better optimized for cortex-m0: https://www.mail-archive.com/gcc-patches@gcc.gnu.org/msg294320.html

We are just checking off each of those functions one-by-one in this ISR. It's still a long, long way off until that would be visible in platformio's gcc though.

It, for example, optimized __aeabi_lmul from 40-ish cycles down to 30 cycles. __aeabi_uidiv its still between 72 and 385 cycles so may not be any better than today.

tombrazier commented 2 years ago

@cbagwell have a look at https://github.com/tombrazier/Marlin/tree/scurve_stm32g which prompts the compiler to use MULS where possible in the SCURVE fallback function rather than calling __aeabi_lmul. I think it's right but it needs testing.

cbagwell commented 2 years ago

Clever. I was concentrating on how to optimize the original assembly and how small I could make a 32x32=64. I didn't consider optimizing the C path and taking advantage of the 32bit shift. Probably by using assembly, we can also use STM32G0's 32x32=32 instruction and feed it the 16bit values to reduce a bit more.

I'll give it a test.

cbagwell commented 2 years ago

I did not have as much free time as I hoped today so I tested it as-is. I rebased my config branch to latest bugfix-2.1.x then added is_improvements+scruve_stm32g and my scurve-cycles-cortexm0 branch. I also enabled ADAPTIVE_STEP_SMOOTHING+S_CURVE_ACCELERATION+INPUT_SHAPING.

The good news is that I could successfully home with no reboots so I think we are on the right track with increased cycle counts and improving scurve speeds. I notice the MAX7219 has what looks like a 1.5 second freeze right when I initiate homing and also again after X/Y is done/moves to center of bed and is getting ready to lower the probe. Might be something to investigate deeper later on as the freeze might explain some probe failures.

I also successfully printed a hollow calibration cube I use for flow tests. It prints slooow because of Cura defaulting to nothing faster than 10 seconds a layer.

I then printed the ringing print from input shaping pull request (I did all the stuff to get full speed). This is the bad news. That print had some issues. The print would freeze every once in a while for like 60 seconds and then just resume on its own for unknown reasons. I also had severe under extrusion.

I need to do some troubleshooting tomorrow to narrow down the cause since this wasn't a small chance since my last successful print.

cbagwell commented 2 years ago

I reverted the scurve_stm32g change and the occasional 60 second pauses went away.

I still have under extrusion on my ringing tower at 90mm/s but I'm now thinking that might be related to my extruder stepper motor going bad earlier this month and I replaced it with a backup stepper+extruder that I've never been happy with and likely needs more tuning.

I'll switch back to testing at normal speeds until I get that worked out.

cbagwell commented 2 years ago

Shouldn't this:

         #define MUL_F_T do { \
            f = uint16_t(f >> 16) * uint16_t(t >> 16) + \
                ((uint16_t(f >> 16) * uint16_t(t)) >> 16) + \
                ((uint16_t(f) * uint16_t(t >> 16)) >> 16); \
          }while(0)

be this (high part shifted back up to high part)?

         #define MUL_F_T do { \
            f = ((uint16_t(f >> 16) * uint16_t(t >> 16) << 16) + \
                (uint16_t(f >> 16) * uint16_t(t) + \
                (uint16_t(f) * uint16_t(t >> 16); \
          }while(0)

tombrazier commented 2 years ago

I don't think so. You need 2 16 bits shifts to make up the 32 bit shift. I have not tested this logic to see that it gives the same answers. It should be almost exactly the same with an additional tiny rounding error.

cbagwell commented 2 years ago

I see where I went wrong. I was thinking about doing a 32bitx32bit=64 followed by >> 32 like this:

(((uint16_t(f >> 16) * (uint16_t(t >> 16) << 32) + \
(((uint16_t(f >> 16) * uint16_t(t) + \
 (uint16_t(f) * uint16_t(t >> 16)) << 16) + \
(uint16_t(f) * uint16(t))) >> 32;

and since we know we are going to do that shift why not skip the full 32-bit shift in first place. I somehow translated that to still doing a 16 bit shift. For what I was thinking to work, it would reduce to this:

(uint16_t(f >> 16) * (uint16_t(t >> 16)

In my quick gcc test on Intel, all 3 versions do not match original (unit64_t(f) * t) >> 32 as close as I'd assumed they would. So I'll play around a bit more to understand why.

tombrazier commented 2 years ago

My first version throws away the product uint16_t(f) * uint16_t(t) and saves one MULS. This introduces a small rounding error which is of the order of 2^-31 when the 32 bit number is converted back to the range 0 to 1.

cbagwell commented 2 years ago

I got a new stepper motor and now my under extrusions are gone. I'm also sure I did something wrong that caused the long pauses so I started fresh with latest is_improvements+scurve_stm32g branches and my scurve-cycles-cortexm0 branch. I also enabled ADAPTIVE_STEP_SMOOTHING+S_CURVE_ACCELERATION+INPUT_SHAPING.

I do not normally have ASS enabled but this was to stress the board while doing IS ringing tower tests. This test run was the best looking and most stable print of a ringing tower to date on this board.

This board has some homing issues that pre-date IS being merged in (it would sometimes grind into an end stop and that was with ASS disabled). With current is_improvements, I am now unable to reproduce those grinding issues and it used to be pretty easy to reproduce.

And finally, I reviewed the assembly from scurve_stm32g and it dropped from 500 cycles down to 257 cycles. Nice reduction. There are 3 64-bit multiplications left and they are accounting for 170 of those remaining cycles.

cbagwell commented 2 years ago

This was fixed by #24951 .

MarlinFirmware / Marlin