MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0

[BUG] Bugfix-2.0.x v020008 - Random Freezes on BTT E3 Mini V2.0 / Ender 3 V2 with DWIN stock LCD screen (+Hemera extruder) #22390

Closed RipperDrone closed 3 years ago

RipperDrone commented 3 years ago

Bug Description

This might be related to https://github.com/MarlinFirmware/Marlin/issues/21010 and previous freeze issues reported here:

I am getting freezes irrespective of

Freeze can happen anytime, during probing or during printing. Had it once even before the start of axis movements!

It looks like it's getting worse the longer the printer is left on, so I was suspecting a temperature issue. However, the fans are all running, and the steppers feel warm but not superhot even when running at higher speeds.

Bug Timeline

Started after replacing the stock Creality 4.2.2 mainboard with a BTT E3 Mini 2.0

Expected behavior

Complete prints

Actual behavior

Random freezes

Steps to Reproduce

  1. Flash firmware containing the configs uploaded further down to printer
  2. Load xyz cube in Cura
  3. Print over USB OR copy to SD card on PC and run print job from SD card via printer menu
  4. wait and hope
  5. meet the freeze randomly

Version of Marlin Firmware

bugfix-2.0.x

Printer model

Creality Ender 3 V2

Electronics

BTT E3 Mini v2.0

Add-ons

Hemera extruder, BLTouch - not likely to be involved though

Your Slicer

Cura

Host Software

Cura

Additional information & file uploads

Marlin.zip

arminth commented 3 years ago

I also started to get random freezes and TMC driver errors on my SKR 1.4 Turbo with a BTT TFT 3.5 E3 display. I switched both baud rates from 250000 to 115200, and then my print finished. Maybe you can try this. I am on the latest bugfix-2.0.x.

Cheers Armin

RipperDrone commented 3 years ago

@arminth thank you for this hint, but I'm afraid my serial comm is running at 115k baud already and still fails/hangs 😒

CRCinAU commented 3 years ago

Unless you're using the hardware UART, the baud rate means nothing. The USB controller of these boards runs at USB speeds - not any serial speed. That means it tops out at about 1.1Mbit.

When you're talking about freezes, are you talking about pauses in print, controller hangs, or something else?

I run a BTT E3 V2.0 and haven't seen that issue. Which screen are you guys using?
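For a sense of scale on the hardware UART side of this discussion, here is a minimal sketch of the arithmetic. It assumes the common 8N1 framing (one start bit, eight data bits, one stop bit), which nobody in the thread states explicitly, and the function names are made up for illustration.

```cpp
#include <cstdint>

// Rough payload throughput of a hardware UART, assuming 8N1 framing:
// 1 start bit + 8 data bits + 1 stop bit = 10 bits on the wire per byte.
// (The framing is an assumption; the thread does not state it.)
constexpr std::uint32_t kBitsPerByte8N1 = 10;

// Payload bytes per second at a given baud rate (rounded down).
constexpr std::uint32_t uart_bytes_per_second(std::uint32_t baud) {
  return baud / kBitsPerByte8N1;
}

// Milliseconds needed to push `bytes` of G-code through the link (rounded down).
constexpr std::uint32_t uart_transfer_ms(std::uint32_t bytes, std::uint32_t baud) {
  return static_cast<std::uint32_t>(
      (static_cast<std::uint64_t>(bytes) * kBitsPerByte8N1 * 1000u) / baud);
}
```

Under that assumption, 115200 baud carries 11520 bytes per second and 250000 baud carries 25000 bytes per second. This only applies to links that really are UARTs, such as the screen connector.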

RipperDrone commented 3 years ago

@CRCinAU Using the Creality stock Ender 3V2 LCD screen which has a serial comm to the mainboard.

Freeze means the fans continue to spin, but the print stops and the printer no longer responds to inputs via the jog dial wheel... it needs a cold reboot.

CRCinAU commented 3 years ago

So this is using the custom cable to connect the stock DWIN based screen to the SKR board?

RipperDrone commented 3 years ago

Cable pins are re-mapped to correct pinout via breadboard jumper cable. Works beautifully. If printer didn't freeze, that is 😄

CRCinAU commented 3 years ago

That's fine - I'd add to your original report that you're using the DWIN screen etc., as I know that this isn't a very well tested combination.

RipperDrone commented 3 years ago

> Unless you're using the hardware UART, the baud rate means nothing. The USB controller of these boards runs at USB speeds - not any serial speed. That means it tops out at about 1.1Mbit.

SKR E3 2.0 IS using hardware UART afaik....

CRCinAU commented 3 years ago

The USB port is not, the screen connector is a hardware UART.

https://github.com/bigtreetech/BIGTREETECH-SKR-mini-E3/blob/master/hardware/BTT%20SKR%20MINI%20E3%20V2.0/Hardware/BTT%20SKR%20MINI%20E3%20V2.0SCHpdf.PDF

RipperDrone commented 3 years ago

Ruling out a hardware issue next, ordered a replacement mainboard to test. BTT support suspects the display cable, maybe these ribbon cables catch a lot of noise. The jumper cable extension with an extra 200mm length needed to be routed inside the mainboard compartment, maybe this is part of the issue on the hardware side...

thisiskeithb commented 3 years ago

Does your SKR Mini E3 V2 have an STM32F103RCT6 or RET6 processor? Which environment are you compiling with?

RipperDrone commented 3 years ago

103RC, thus env used is STM32F103RC_btt

thisiskeithb commented 3 years ago

The configs you attached are from 2.0.8 and do not build with the latest bugfix. Can you attach the latest version?

Also, please confirm that you flashed & tested with the latest bugfix so we're not chasing old bugs.

RipperDrone commented 3 years ago

No need to test software any further until I have a final answer on hardware vs. software flaw. Since freezes occurred irrespective of the LCD display being connected or not and the USB cable to the PC being connected or not, and became more frequent as the printer got warmer, I am suspecting defective hardware for now and will try a replacement mainboard next. Testing has been with the 2.0.8 bugfix so far; next I can test with the latest version as soon as I have the new mainboard in my hands (probably next week).

blazewicz commented 3 years ago

I'll just add here that I use the same combo i.e. Ender 3 V2, DWIN Display, BTT SKR Mini E3 V2.0 and I haven't seen such issue. My board has GD32F103RET6 MCU. I use STM32F103RE_btt_USB env.

I have other issues though:

This combo doesn't seem that stable after all.

CRCinAU commented 3 years ago

@blazewicz I seem to remember a while back, there were issues with having both USB data and serial multiplexed over the one link.

I would be very interested in you testing to see if you get the same problems with the STM32F103RE_btt env.

marcwingduck commented 3 years ago

I am experiencing similar issues during probing and printing using a similar setup (Ender 3 V2 with stock LCD, BTT SKR mini E3 V2.0 with RC MCU). Though in my case the board does not freeze, it soft resets. The fans turn off, the progress bar appears on the screen and after it has disappeared I can control the printer again. In the serial output the same notifications appear as if I had switched the printer on normally. There are no error messages.

I am using target STM32F103RC_btt according to my MCU. Next to the latest bugfix_2.0.x (02000901), I also compiled and tested the official BTT fork (02000801). I have also tried both versions of the firmware with disconnected screen, initializing prints from the SD card via serial connection. Also, I am already using the second board, a different SD card, and unfortunately, the issue persists.

With the second board not working, I am running out of ideas on how to further isolate the problem. So if anyone has any other ideas, I am happy to try them out.

RipperDrone commented 3 years ago

Hmm, has anyone tried to supply 24V DC to the mainboard from an external power source, just to check if we can rule out a PSU issue / voltage drop?

RipperDrone commented 3 years ago

Alternatively, I am starting to worry that silent fan mods (including replacing the metal mainboard and PSU casings with 3D printed plastic ones) might have a detrimental EMI effect on the circuit boards, which could pick up too much EMI noise from e.g. PWM-operated fans or heaters.

CRCinAU commented 3 years ago

I can't say as to causes, but I run a BTT E3 Mini v2.0 board - but with the stock text screen as well as the TFT35. I haven't seen anything like the problems stated.

RipperDrone commented 3 years ago

@CRCinAU So it does work if done correctly and hardware is ok, it seems 👍. Can you pls post some details about your config?

CRCinAU commented 3 years ago

I use the latest bugfix-2.0.x - the STM32F103RC_btt env, everything else is mostly defaults.

There is a ton of changes to my hardware, but that's all stuff like thermistor types and enabling PID etc along with MIN_X_POS etc etc - which I wouldn't think would make any difference on this at all...

RipperDrone commented 3 years ago

Got it. Does your 'stock text screen' refer to the (stock Ender 3 V2) Creality DWIN screen or to an Ender 3 stock screen (non-DWIN)? Is your printer an Ender 3 (non-V2) then?

CRCinAU commented 3 years ago

Yeah - the Ender 3 Pro screen - I forget its definition at the moment....

RipperDrone commented 3 years ago

@marcwingduck The more users post their setups, the more I'm getting the impression that the Ender 3 V2's Creality DWIN LCD display is messing things up (serial ports, USB communication, timer/IRQ racing, whatever it is).

To narrow things further down, could you test latest bugfix Marlin version compiled with all settings like in your current config but without setting the serial port up for the DWIN display (serial2 left at default instead of setting it to 1 for DWIN display, plus comment out CREALITY_DWIN_LCD) and serving g-code print commands over USB? Just unplugging the display cable might still leave the serial communication faulty...

I will pbly get my replacement E3 Mini within next 2-3 days, can then re-engage into testing as well...

marcwingduck commented 3 years ago

@RipperDrone The BTT firmware I used had the SERIAL_PORT_2 set to -1, which is default I assume. But I can check if the issue persists with the latest bugfix-2.0.x on Sunday or Monday!

A little more info about my setup:

I also have the silent mod, and thus printed main board and PSU covers using Noctua fans. The voltage regulator I used is a MP1584EN. I am using a Hydra direct drive mod, also with a Noctua fan and two Winsinn blower fans instead of the original one. From day one I had issues with the LCD, where the rotary encoder (or some interferences along the wires) issued ghost enter presses during prints. This problem disappeared at a completely random point.

Further observations regarding the issue:

Since a soft reset screams for undervoltage, I thought it may occur when the bed or nozzle start heating, or even when the blower fans kick in, but as you already mentioned, the stops seem completely random. What speaks against randomness though: Using the BTT firmware, the printer stopped two out of three times at the same point in a print (the third time it stopped just 10 mm before that point). I am going to test this print again using the latest bugfix-2.0.x within the next days.

RipperDrone commented 3 years ago

@marcwingduck thank you, any detail may help us to troubleshoot.

My fan setup is 24V 80mm Sunon / 12V Noctua on mainboard/PSU, thus no stepdown used (at least we can exclude stepdown noise as a contributor to the issues 😄).

Issues are completely random in occurrence here, xyz cube stops at 3 different points of progress using same setup and g-code. Only persistent trend seems to be 'the longer the printer is running, the earlier the problem occurs', pointing towards thermal issue / bad components or soldering on PCB. This I will verify with replacement board from another vendor than 1st time purchase, to make it more likely it's not from same batch / production run as first one.

I NEVER observed ghost button presses on the encoder wheel; it always stops printing, the fans keep running, and the wheel/button no longer reacts. Sometimes after letting it sit there for a certain watchdog/idle time, it soft reboots (display progress bar ramping up again, fans keep running), sometimes it stays frozen forever.

When doing the re-coding of display pins to make the Ender 3V2 stock DWIN LCD work, I had to stuff some 200mm of ribbon cable excessive length into the mainboard slot compartment on the right side looking from underneath, however I have sticky taped all pins/sockets on both ends securely. Maybe it's still prone to EMI noise - I will try to use a shorter extension next time and route the cable differently, not folding it into loops anymore.

What still makes me nervous is that there are so many changes going on in the Marlin HAL (maple deprecated, timer/IRQ handling still somewhat suspicious in the STM32 libs, one workaround for the GD microcontroller needing a dedicated single-buffer USB setting instead of the stock double buffer to work (@looxonline, @blazewicz etc.)) that I don't know how far to trust the framework / HAL / firmware at the current point, particularly when it comes to testing the latest bugfix-2.0.x versions... 🤔

looxonline commented 3 years ago

> @RipperDrone The BTT firmware I used had the SERIAL_PORT_2 set to -1, which is default I assume. But I can check if the issue persists with the latest bugfix-2.0.x on Sunday or Monday!

> A little more info about my setup:

> I also have the silent mod, and thus printed main board and PSU covers using Noctua fans. The voltage regulator I used is a MP1584EN. I am using a Hydra direct drive mod, also with a Noctua fan and two Winsinn blower fans instead of the original one. From day one I had issues with the LCD, where the rotary encoder (or some interferences along the wires) issued ghost enter presses during prints. This problem disappeared at a completely random point.

> Further observations regarding the issue:

> Since a soft reset screams for undervoltage, I thought it may occur when the bed or nozzle start heating, or even when the blower fans kick in, but as you already mentioned, the stops seem completely random. What speaks against randomness though: Using the BTT firmware, the printer stopped two out of three times at the same point in a print (the third time it stopped just 10 mm before that point). I am going to test this print again using the latest bugfix-2.0.x within the next days.

A soft reset is often caused by a timeout when reading from the SD card. Try starting a print and then manually removing the SD card and you should be able to replicate it fairly easily. With this in mind you may want to try a variety of different cards and also be sure to format them using the official SD card formatting tool (google that phrase and it should be the first link). Since your board also had ghost presses that disappeared without a reason I would say that you may have an EMI issue. Check that your machine has a good Earth connection and check that you are Earthing the chassis. Also check that your wires to and from the PSU are well secured.

There are HAL changes going on at the moment and more than that there is a major shift from the maple framework to the STM core framework but I have been running dev builds which have been working great regardless of the changes (aside from the lingering USB issues with the GD variant).

blazewicz commented 3 years ago

> @blazewicz I seem to remember a while back, there were issues with having both USB data and serial multiplexed over the one link.

> I would be very interested in you testing to see if you get the same problems with the STM32F103RE_btt env.

@CRCinAU I changed my env to STM32F103RE_btt (with Arduino fix for USB single buffer) and it seems to be stable now - no issues with ~5h print, I'll test it on something longer tonight.

> The BTT firmware I used had the SERIAL_PORT_2 set to -1, which is default I assume. But I can check if the issue persists with the latest bugfix-2.0.x on Sunday or Monday!

@marcwingduck SERIAL_PORT_2 won't work with DWIN display due to #22299. You need to set SERIAL_PORT to -1 and leave SERIAL_PORT_2 undefined.
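As a rough sketch, the serial setup described above would look something like this in Configuration.h. Only the two serial options named in the comment are shown. The value on the commented-out line is just a placeholder, and everything else in the file is assumed to stay as it was.

```cpp
// Configuration.h (excerpt) - sketch only, all other options omitted.

// Primary serial port. -1 selects the USB emulated serial port.
#define SERIAL_PORT -1

// Second serial port left undefined: per the comment above, it does not
// work together with the DWIN display (see #22299).
//#define SERIAL_PORT_2 1
```

With this in place, host software talks to the board over USB only, and no second port is set up.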

RipperDrone commented 3 years ago

Alrighty, I'm back in the game. Just installed my replacement BTT E3 Mini 2.0 board, flashed the current bugfix Marlin version with the Ender 3V2 configs from the examples directory, configs adapted for the SKR board according to the changes by @blazewicz: https://github.com/MarlinFirmware/Configurations/issues/535.

(Then manually edited for some Hemera extruder magic which is not the point here)

Mainboard-to-display cable pins re-mapped to match CREALITY_DWIN_LCD display - display works ok. My replacement board has the STM32F103RC chip affirmatively - so says the chip ink print at least ;-).

env = STM32F103RC_btt_USB: Did compile with a minor warning (auto assignment not suitable in context)
env = STM32F103RC_btt: Same

This is what it throws up:

Compiling .pio\build\STM32F103RC_btt_USB\src\src\libs\stopwatch.cpp.o
Marlin\src\lcd\dwin\e3v2\dwin.cpp:497:71: warning: use of 'auto' in parameter declaration only available with '-fconcepts'
  497 | inline bool Apply_Encoder(const ENCODER_DiffState &encoder_diffState, auto &valref) {
      |                                                                       ^~~~
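For context on that warning: `auto` as a function parameter type is a C++20 feature (an abbreviated function template). GCC accepts it under older standards only as an extension, hence the hint about '-fconcepts'. The sketch below shows the equivalent explicit-template spelling, which does not trigger the warning. The enum and the function body are hypothetical stand-ins for illustration, not the actual code from dwin.cpp.

```cpp
// Hypothetical stand-in for Marlin's encoder state type; the enumerator
// names and the function body below are illustrative, not copied from dwin.cpp.
enum ENCODER_DiffState {
  ENCODER_DIFF_NO,
  ENCODER_DIFF_CW,
  ENCODER_DIFF_CCW,
  ENCODER_DIFF_ENTER
};

// Explicit template parameter instead of 'auto &valref'.
// This form is valid in C++11 and later and needs no -fconcepts.
template <typename T>
inline bool Apply_Encoder(const ENCODER_DiffState &encoder_diffState, T &valref) {
  if (encoder_diffState == ENCODER_DIFF_CW)
    valref += 1;   // clockwise turn: increase the edited value
  else if (encoder_diffState == ENCODER_DIFF_CCW)
    valref -= 1;   // counter-clockwise turn: decrease it
  return encoder_diffState == ENCODER_DIFF_ENTER;  // true when the knob is pressed
}
```

Both spellings generate one function per argument type, so the warning in the build log should not change behavior.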

env = *_USB: Firmware seems fully functional, bedsize and offset needed some trimming. Have to get into testing some prints next... :-)

Is there anything I should consider in the current state of Marlin? I saw that many fragments of Jyers' enhancements are being implemented right now, remixed with two more devs' DWIN enhancements, as it seems. As a coding noob, I find it confusing to guess whether, in the current transition phase, there is a stable enough Marlin bugfix version available that is worth stress testing for the USB / serial freezes (due to IRQ racing or other timing issues in software) OR the hardware issues with the BTT mainboard we had observed. As things stand right now, some of you (particularly @blazewicz) seem to be getting more and more happy with the Ender 3V2 + DWIN LCD + SKR E3 MINI board - maybe we have overcome the teething issues now!? :-)

RipperDrone commented 3 years ago

Configurations.zip

Here's my config files, just for ref :-)

RipperDrone commented 3 years ago

> The configs you attached are from 2.0.8 and do not build with the latest bugfix. Can you attach the latest version?

> Also, please confirm that you flashed & tested with the latest bugfix so we're not chasing old bugs.

@thisiskeithb catching up now - so answer to both your questions is YES now:

will have to find time and test longer prints now - very short tests have successfully finished. If it stays like this, it must have been a faulty mainboard - the replacement board seems to be good so far.

blazewicz commented 3 years ago

My printer just completed an 8.5h print from Octopi. In total I did over 20h over USB on this setup without any issues.

Once again, my setup:

RipperDrone commented 3 years ago

Have done ~20 prints now, never seen a single freeze anymore. Seems fixed with replacement mainboard - latest Marlin bugfix running stable, staying with STM*_btt environment for now. :-)

Thank you for all the good hints here! Closing this issue now...
