MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0

[BUG] Bigtreetech SKR 1.3 (and others 1768 chips) + Delta Options + Any BIG option = Not boot #14006

Closed neox3 closed 4 years ago

neox3 commented 5 years ago

Today I found this terrible bug, which makes my machine inoperative. For some strange reason, if you add the delta config options to the firmware and enable large features that use a lot of memory, such as auto-leveling or bed PID, the machine hangs and the SD reader and USB connection become inoperative.

Steps to Reproduce

  1. Copy the config files from the delta Mini Kossel examples directory
  2. Modify them to select the BigTreeTech SKR board
  3. Add some basic options and compile. With that build, the machine responds to connections. ALL OK
  4. Enable the bed PID option (PIDTEMPBED) and compile. ERROR

Expected behavior: Machine is OK.

Actual behavior: Machine inoperative. The USB connection is broken and the internal SD card disappears from the file explorer in Windows.

Additional Information

I tried another board with the LPC1768 chip, the Re-ARM, and the result is the same, so the problem is in the firmware, not in the hardware.

I include two config files that show the point at which enabling or disabling bed PID switches the board between OK and FAIL.

files.zip

VanessaE commented 5 years ago

Have not been able to test anything for the last couple of days, sorry, sometimes real life gets in the way.

No worries.

However in the case of your V1.3 board it looks like you have configured TMC2208 devices, but do not have them plugged in. Is that correct?

Technically, yes, that's right. But... while I was using the v1.3 for the PIDTEMPBED check, and I do have 2208's configured-in (since that board has a full, standard configuration that I would have otherwise used in production), I only took the hardware as far as connecting the thermistors to make sure they worked (i.e. I did not install drivers nor connect anything else, not even 12v).

On the other hand, the full hardware setup was only done on the v1.1, hence where the bulk of my troubleshooting was aimed. Speed and reliability of the M122 output, as I mentioned in #14047, is good on the v1.1 with 2513f6b and TMCStepper v0.3.1 (I literally could not wish for a better behavior). On the v1.1 with 7b4c3bd and TMCStepper v0.3.4, it's ... bad.

My guess is that in your case the startup period (while it tries to talk to various non-existent devices) is pretty close to the watchdog timeout, and that adding the PIDTEMPBED option is pushing the startup time over the watchdog period and triggering a reboot.

I could see that being the case running 7b4c3bd / v0.3.4 given how slow the 2208 comms are there.

Note that I have not yet tried the 2513f6b / v0.3.1 configuration on the v1.3, but I know others have, or with a substantially similar firmware config, as that's the commit I recommend in my SKR UART Instructable, which a few people have reported using to get their v1.3's up and running. I didn't put in a specific TMCStepper version recommendation to go with that commit, though... until now. :wink:

When you try that configuration do you see first the Marlin USB device appear and then the smoothieboard one?

You mean when I first bootup the board? Er, that's in my logs from my earlier posts in this Issue, but ok. In the following, I powered up the printer (with my SKR v1.1 running 2513f6b / v0.3.1), waited for the SD card to mount and Thunar to pop up a window showing it, then unmounted the SD and shut the printer off:

log

```
[May22 14:55] usb 3-3: new full-speed USB device number 2 using ohci-pci
[ +0.203077] usb 3-3: New USB device found, idVendor=1d50, idProduct=6029, bcdDevice= 1.00
[ +0.000008] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ +0.000005] usb 3-3: Product: Marlin USB Device
[ +0.000003] usb 3-3: Manufacturer: marlinfw.org
[ +0.000003] usb 3-3: SerialNumber: 1800700AAF2919225AA718AEF50020C2
[ +0.002399] usb-storage 3-3:1.2: USB Mass Storage device detected
[ +0.000356] scsi host9: usb-storage 3-3:1.2
[ +0.024963] cdc_acm 3-3:1.0: ttyACM0: USB ACM device
[ +0.000415] usbcore: registered new interface driver cdc_acm
[ +0.000002] cdc_acm: USB Abstract Control Model driver for USB modems and ISDN adapters
[ +1.003985] scsi 9:0:0:0: Direct-Access Marlin SDCard 01 1.0 PQ: 0 ANSI: 0 CCS
[ +0.000861] sd 9:0:0:0: Attached scsi generic sg2 type 0
[ +0.009104] sd 9:0:0:0: [sdc] 245760 512-byte logical blocks: (126 MB/120 MiB)
[ +0.008008] sd 9:0:0:0: [sdc] Write Protect is off
[ +0.000007] sd 9:0:0:0: [sdc] Mode Sense: 00 00 00 00
[ +0.007987] sd 9:0:0:0: [sdc] Asking for cache data failed
[ +0.000013] sd 9:0:0:0: [sdc] Assuming drive cache: write through
[ +5.612429] sdc: sdc1
[ +0.041987] sd 9:0:0:0: [sdc] Attached SCSI removable disk
[May22 14:56] usb 3-3: USB disconnect, device number 2
```

That was while writing this post, and it looks to me like it goes right into Marlin, and is exactly what I'd expect to see. Last night I ran a Benchy as a sanity check (print time 1h 19m), with zero firmware or SKR problems. Powered it back up a second time (that is, after copying the above log), and did a quick axis and heaters check. Seems fine.

Quick update on the USB connections being dropped. Marlin has two primary ways of handling serious errors. It either calls kill or the watchdog timer is triggered.

So in this case you will see the Marlin USB stack disconnect (and be replaced by a smoothieboard device).

Right. Makes sense. But just to make it clear, while M122 causes the firmware to crash and trigger the watchdog under 7b4c3bd / 0.3.4, all other USB comms problems that I have mentioned in this Issue, on both the v1.1 and v1.3, appeared to be a hard crash, without calling kill(), and without ending up in Smoothie's bootloader.

(assuming you do not have WATCHDOG_RESET_MANUAL defined [...] )

USE_WATCHDOG is of course enabled, but WATCHDOG_RESET_MANUAL is not.
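For reference, the watchdog state described above corresponds to a configuration fragment along these lines. This is a minimal excerpt, assuming the standard Marlin layout in which both options live in Configuration_adv.h; it is not copied from the configuration files attached to this issue.

```cpp
// Configuration_adv.h (excerpt, illustrative)
#define USE_WATCHDOG              // hardware watchdog enabled
//#define WATCHDOG_RESET_MANUAL   // left disabled, as stated above
```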

teemuatlut commented 5 years ago

@VanessaE Could I ask you to make a separate issue on the library page as this thread is a bit laborious to follow and decipher.

VanessaE commented 5 years ago

@VanessaE Could I ask you to make a separate issue on the library page as this thread is a bit laborious to follow and decipher

And here I was trying not to be unclear in my posts. :frowning_face:

As far as TMCStepper v0.3.4 being slow/crashing, I think @Bob-the-Kuhn has a better grip on that than me, see https://github.com/MarlinFirmware/Marlin/issues/14047#issuecomment-493682160 ... Better I shouldn't stick my nose any further into it. :stuck_out_tongue:

gloomyandy commented 5 years ago

@VanessaE Right, so I have just tried the V1.3 configuration files you posted earlier with no modifications. I had no driver boards installed and no thermistors plugged in. The configuration has PIDTEMPBED enabled. The board boots fine, the SD card is available to my Windows system, and I can connect to the board using both Pronterface and Repetier Host. I've even tried using the small/slow "free" SD card that comes with the board. Again, everything works as I'd expect. The only difference that I can see is that I'm not running on USB power and I have things like endstops connected. Not sure I can do much more, sorry.

VanessaE commented 5 years ago

But are you running the same git commit of Marlin and the same stepper and software serial libraries (0.3.4 and 0.1.3)? (and yeah, I was on USB power in that test).

There must be a bug, if both @neox3 and I have to turn that option off to get the board to boot.

ghost commented 5 years ago

@VanessaE I did find a number of bad bugs in the temperature code that have been causing Marlin to kill itself due to incorrect temperature error flags, which would cause the USB link to fail. It's connected to you seeing the PIDTEMPBED problem etc.

The BED_MINTEMP/BED_MAXTEMP/HEATER/CHAMBER settings found in Configuration.h currently totally don't work in Marlin and also create false kills. It's been like that for quite some time (weeks).

The bug fixes are not being pulled into Marlin though :(

ghost commented 5 years ago

I also altered the TMC2208 driver slightly to not cause the board to kill itself with an M122 command if it gets no response from the TMC stepper drivers (board on USB power), but I've only done so in my own code (the TMC driver guy doesn't do pull-requests).

VanessaE commented 5 years ago

I believe @teemuatlut is the "driver guy" you're looking for. :smiley:

ghost commented 5 years ago

Only just woke up (8am here in the UK) so my brain isn't yet awake ;)

VanessaE commented 5 years ago

here :coffee: now wake the hell up. :smiley:

gloomyandy commented 5 years ago

@VanessaE I tested with 7b4c3bd and with a pull from two days ago (The one I'm using all of the time). No problems seen with either. Yes I'm using 0.3.4 and 0.1.3. I also tested with the master branch of TMCStepper as of 2 days ago.

BTW In the case of the various boot time problems how do you know that kill is not being called? How are you monitoring the state of the machine at that time?

@doggyfan What changes have you made to avoid the M122 problem? I think @teemuatlut has a PR (https://github.com/MarlinFirmware/Marlin/pull/14074), which I think is also intended to fix this problem.

As to your temperature PR, I'm sure it will get picked up eventually (assuming it is correct, I've not checked all of the details). The thing is that it is quite complicated, with lots of different changes. It is often better to have just one item in a PR. Yours contains at least 4 (return max temp on short, buzz on temp error, only error if heating, and the min/max fixes). Having all of those together can make it hard to review the changes. It also helps if your PR and commits contain details of what problem the change fixes and why the change you have made is the right way to fix the problem. But anyway, this is just my view and is all off topic.

ghost commented 5 years ago

Yes, I've put too many changes into one pull request. I still struggle with github, I don't quite know how to do what with it at times @gloomyandy.

github is my arch enemy ;)

teemuatlut commented 5 years ago

What changes have you made to avoid the M122 problem? I think @teemuatlut has a PR (#14074), which I think is also intended to fix this problem.

That PR doesn't do anything about M122 causing a watchdog reset, but it should fix the mismatch between LCD variables and gcode commands. I got the board to reset on an SKR v1.3 with M122 and no drivers (x5 configured) attached. In response I reduced the number of retries, at least until I rewrite the whole M122 command. The current problem is that if there is no driver connected, the read command will write the read request, wait for 2 ms and then try reading the sync nibbles for 5 ms. If this fails, it tries again 3 times. This means that each (failed) read command can take around 30 ms (4×2 + 4×5 = 28 ms). Multiply that by the number of configured drivers and the number of read commands (~35?) in the command and it starts to take a while. The commit to address this can be found in the master branch until I make the next release. I'd like to get at least some feedback first.
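As a rough sketch of that estimate (not code from TMCStepper or Marlin), the timings quoted in the comment above can be multiplied out as follows. The constant and function names are made up for illustration, and the number of reads per driver is only the guess given in the comment.

```cpp
#include <cassert>

// Illustrative names only; these are not TMCStepper or Marlin identifiers.
const int kWaitAfterRequestMs = 2;  // wait after writing the read request
const int kSyncReadWindowMs   = 5;  // time spent trying to read the sync nibbles
const int kRetries            = 3;  // retries after the first failed attempt

// Worst-case time for one read that never gets an answer:
// (1 first attempt + 3 retries) * (2 ms + 5 ms).
int failed_read_ms() {
  return (1 + kRetries) * (kWaitAfterRequestMs + kSyncReadWindowMs);
}

// Worst-case time for a whole report when no configured driver answers.
int m122_worst_case_ms(int drivers, int reads_per_driver) {
  assert(drivers >= 0 && reads_per_driver >= 0);
  return drivers * reads_per_driver * failed_read_ms();
}
```

Under these assumptions, `failed_read_ms()` is 28, and `m122_worst_case_ms(5, 35)` comes to 4900 ms, so a single M122 with five configured but absent drivers could block for several seconds.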

VanessaE commented 5 years ago

I tested with [...]

Then being on USB-supplied power must be the key, in the case of the PIDTEMPBED glitch? I really have no clue now.

BTW In the case of the various boot time problems how do you know that kill is not being called? How are you monitoring the state of the machine at that time?

By watching the firmware output as it comes in over Pronterface's console, if I connect quickly enough that is. If Marlin calls kill(), it has always clearly said so in the past. Not so when USB drops out. Then, it's just gone, with the previously pasted dmesg errors, and zero warning from Marlin. Hence it must be a crash the watchdog doesn't catch.

ghost commented 5 years ago

@VanessaE, you'll have to get one of the cheap LCD displays to plug onto your board ;)

VanessaE commented 5 years ago

Some day. Not now though.

gloomyandy commented 5 years ago

@VanessaE I really don't know what the problem is you are seeing. Maybe it is USB power, maybe it is something like noise being picked up by your board, but not mine (all of those empty connectors will not be helping!). All I know is that I can't reproduce it!

@teemuatlut My mistake I thought that your code that maintains the variables would also reduce the number of read requests during an M122.

VanessaE commented 5 years ago

I really don't know what the problem is you are seeing.

Recall, @neox3 was first to notice it. :smiley:

hudja commented 5 years ago

I have the same problem here. If I use the dummy #define TEMP_SENSOR_BED 998 for the bed sensor, then it works fine. I use external power, with bed and hotend power and thermistors connected, and the X driver in UART mode. If I use #define TEMP_SENSOR_BED 1 and TMC2208 in UART, I lose communication, but it works with TMC2208_STANDALONE and #define TEMP_SENSOR_BED 1. SKR 1.3, Ender 3.

https://github.com/MarlinFirmware/Marlin/issues/14136

hudja commented 5 years ago

Update. If I increase #define BED_MAXTEMP to 275, then I am able to boot and communicate with #define TEMP_SENSOR_BED 1 and #define X_DRIVER_TYPE TMC2208. It seems to me that during boot it reads the temperature too high for a second, and the protection kicks in. After boot, the thermistor readings are correct (at least they show room temperature).
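For reference, the combination described in the update above corresponds to settings along these lines. This is an excerpt assembled only from the values mentioned in these comments, not a complete configuration, and the raised limit is shown as the diagnostic workaround reported here rather than as a recommended value.

```cpp
// Configuration.h (excerpt, illustrative)
#define X_DRIVER_TYPE   TMC2208   // X driver in UART mode
#define TEMP_SENSOR_BED 1         // real thermistor (998 is the dummy sensor)
#define BED_MAXTEMP     275       // raised so the brief high reading at boot does not trip the limit
```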

ghost commented 5 years ago

The min/max temp warning at boot-up problem has just been fixed by using a grace period https://github.com/MarlinFirmware/Marlin/issues/14139
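To illustrate the general idea of a boot-time grace period (a hypothetical sketch, not the actual change made in Marlin): out-of-range readings are ignored until an assumed settle time after power-up has passed. The struct name, its fields, and all values are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical sketch: suppress min/max temperature faults for a short
// period after boot, while the sensor reading is still settling.
struct TempRangeGuard {
  float    min_c;     // lowest plausible reading, in degrees C
  float    max_c;     // highest allowed reading, in degrees C
  uint32_t grace_ms;  // assumed settle time after power-up

  // Returns true when the reading should be treated as a fault.
  bool is_fault(float reading_c, uint32_t ms_since_boot) const {
    assert(min_c < max_c);
    if (ms_since_boot < grace_ms) return false;  // still settling: do not kill
    return reading_c < min_c || reading_c > max_c;
  }
};
```

For example, with `TempRangeGuard g{5.0f, 150.0f, 1000};` a spurious 260 °C reading 200 ms after boot is ignored, while the same reading at 1500 ms is reported as a fault.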

hudja commented 5 years ago

Thanks! It is working now!

ghost commented 5 years ago

Is it working for you now @neox3 ?

ghost commented 5 years ago

And for you now @VanessaE ?

VanessaE commented 5 years ago

I'll pass. I'm kinda paranoid about screwing around with newer versions now, especially when the older build I'm using works (once I got that damaged thermistor input out of the way).

TheNitek commented 5 years ago

Seems like I had the same problem (thermistor broken => Marlin kill()ing on the SKR 1.3). Isn't there an LED or something to indicate the killing? With the SKR's USB-UART I was running totally blind because it got killed before I could see anything. If I hadn't found this bug by coincidence, I'd probably have thrown my SKR into the trash for no reason.

ghost commented 5 years ago

I'm afraid there is no heartbeat LED on the SKR 1.3 (silly idea for BTT to have removed it), though you can add your own if you want and specify the LED_PIN in the pins file.

To be honest, with a dangerous machine like a 3D printer (high temperatures used, high currents used, high powers used) that can and does burn down houses, it's probably a good idea to invest in a £10 LCD display and leave it connected to the board as it does contain valuable real-time information/warnings.

TheNitek commented 5 years ago

Maybe it is a good idea for me to add one.

I have a TFT32 but unfortunately it won't cover the kill case. The LCD I still had in stock didn't work at all. So my debugging situation was kind of hard ;-)

ghost commented 5 years ago

The cheap full graphic LCD display also has a buzzer that now sounds an audible alarm if there is a thermal protection error (as you had). The audible alarm may (or may not) grab your attention if a fire risk does ever occur, though it's a good idea to also invest in a smoke alarm of some sort and fix it near the printer.

TheNitek commented 5 years ago

I have a smoke alarm on the ceiling above the printers since this seemed like the most reliable solution to me, so no need from that perspective. BTW, one of the MOSFETs on the SKR already seems defective, so the hotend kept heating all the time without Marlin being able to do anything about it ...

Thank you all for this thread, really saved me a lot of time and from going crazy.

peppekerstens commented 5 years ago

I have an (almost) working setup with BigTreeTech SKR v1.3 & TMC2208's (UART mode) & Creality stock display. You can check my fork for version & current config. https://github.com/peppekerstens/Marlin (bugfix-2.0.x branch of course :))

I cannot test SDCard (standalone) access from within Marlin firmware/via LCD screen (because that is the issue I am facing....) The rest seems to work. I have no automatic bed-leveling or other expansions; it is a vanilla/stock Creality Ender-5 machine.

What do you guys and gals need confirmed/tested? If I need to produce some extra info; please say so and at least provide a hint on to where/how to get it please :)

gloomyandy commented 5 years ago

@peppekerstens You have posted on a couple of issues/PRs, but it is still not very clear exactly what your problem is. It looks like you are having some sort of problem getting your SD card reader working. It is unlikely that a bug in Marlin is causing the problem you are having (many, many people use the board and display you have, and we would have seen other reports), so you may get more help by posting your questions on the BigTreeTech Facebook page (https://www.facebook.com/groups/505736576548648/). There are a lot of folks over there who have a similar setup to yours and can probably help. If you can't find help there, try the Marlin Facebook page or Discord channel (see the pinned posts at the top of the issue list).

If after trying that you still have a problem then you should create a new issue and provide all of the details requested in the issue template.

peppekerstens commented 5 years ago

Maybe I was not clear enough in my previous post in this thread; I am willing to offer help/time to confirm and/or test the issue discussed in this thread. Others seem reluctant to do so. I do have a working setup with similar hardware which seems to work with the latest build.

It seemed only prudent to inform you of my issue; as I am new to Marlin and do not know if it may be of influence on this one. I feel like it is not, but you may have a different perspective on that. It would be a waste of time if it was and I did not mention it first. That's it.

If that is not appreciated or wanted, so be it.

I actually have posted exactly one earlier message in this repo..... I deemed it to be polite to search in/for related existing issues instead of just dumbly creating new ones and thus overwhelming developers with doubles. That can sometimes be 'a hit and miss' exercise.

I do not do Facebook, thank you very much. I will have a look on Discord if I can't fix it myself. Thank you for pointing that out.

gloomyandy commented 5 years ago

@peppekerstens I'm sure help on this issue would be appreciated, however I do think given the nature of the problem discussed in this issue it makes sense to resolve any other problems you are having first to avoid any (further) confusion. I just happened to see your other post and could see that you are new to Marlin so wanted to offer some guidance as to the best way to resolve the problems you are having, no offence was intended.

The other resources I mentioned are typically more interactive and have a wider audience than items posted here and tend to be a better way to resolve problems, and identify actual issues. It is a pity you do not use Facebook as there has been a lot of activity and useful information about the hardware you are using on the groups there, but obviously that is your choice.

peppekerstens commented 5 years ago

Fixed (https://github.com/MarlinFirmware/Marlin/issues/14320) Still available for any testing/confirmation...

coleledger commented 5 years ago

Hey, I think I have had a similar problem to the original post. I have an SKR 1.3 that I can't re-flash over USB anymore after I held the reset button instead of just using the unplug/replug method. It won't show in Finder, even if I remove the micro SD from the SKR board and put it in an SD card reader. When I run "ls /dev/" I can still see that my USBModem14101 or USBModem14201 is active. I can connect with Pronterface but have no way to change the firmware. I also see in PIO with VSC that USBmodem1401 is in my devices, but I can't flash it. I am running VSC with PIO on macOS. I can add more information, but I am not sure what else to add. Thanks.

boelle commented 4 years ago

@neox3 is the issue still there?

boelle commented 4 years ago

Lack of Activity This issue is being closed due to lack of activity. If you have solved the issue, please let us know how you solved it. If you haven't, please tell us what else you've tried in the meantime, and possibly this issue will be reopened.

github-actions[bot] commented 4 years ago

This issue has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.