MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0

[BUG] Bigtreetech SKR 1.3 (and others 1768 chips) + Delta Options + Any BIG option = Not boot #14006

Closed neox3 closed 4 years ago

neox3 commented 5 years ago

Today I found this terrible bug, which makes my machine inoperative. For some strange reason, if you add the config options of a delta to the firmware and enable big features that use a lot of memory, like auto leveling or bed PID, the machine hangs and the SD reader and USB connection become inoperative.

Steps to Reproduce

  1. Copy the config files from the delta mini kossel examples directory
  2. Modify them to set the board to the Bigtreetech SKR
  3. Add some basic options and compile. With that build, the machine responds to connections. ALL OK
  4. Enable the BED PID option and compile. ERROR

Expected behavior: Machine is OK.

Actual behavior: Machine inoperative. The USB connection is broken and the internal SD disappears from the file explorer in Windows.

Additional Information

I tried another board with the LPC1768 chip, the Re-ARM, and the result is the same, so the problem is in the firmware, not in the hardware.

I include two config files that show the point where enabling or disabling BED PID switches the board between OK and FAIL.

files.zip
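Comparing a working and a failing config by eye is error-prone, so the difference between the two files can be listed mechanically. The following is a minimal sketch, assuming both files use the usual Marlin convention of `#define NAME` for enabled options and `//#define NAME` for disabled ones; the helper names `enabled_defines` and `diff_configs` are hypothetical, and values containing `//` (such as URLs) are not handled.

```python
import re

# Matches an active "#define NAME [value]" line. Commented-out options
# such as "//#define PIDTEMPBED" do not match because of the leading "//".
DEFINE_RE = re.compile(r"^\s*#define\s+(\w+)\s*(.*?)\s*(?://.*)?$")


def enabled_defines(text):
    """Return {name: value} for every uncommented #define in a config file."""
    defines = {}
    for line in text.splitlines():
        match = DEFINE_RE.match(line)
        if match:
            defines[match.group(1)] = match.group(2)
    return defines


def diff_configs(good_text, bad_text):
    """Return (only_in_good, only_in_bad, changed_values) as sorted name lists."""
    good = enabled_defines(good_text)
    bad = enabled_defines(bad_text)
    only_good = sorted(set(good) - set(bad))
    only_bad = sorted(set(bad) - set(good))
    changed = sorted(n for n in set(good) & set(bad) if good[n] != bad[n])
    return only_good, only_bad, changed


if __name__ == "__main__":
    # Tiny stand-ins for the two Configuration.h files in the report.
    good = "#define PIDTEMP\n//#define PIDTEMPBED\n"
    bad = "#define PIDTEMP\n#define PIDTEMPBED\n"
    print(diff_configs(good, bad))
```

Reading the real files with `open(path).read()` and passing the text to `diff_configs` would show whether the bed PID option really is the only difference between the two builds.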

p3p commented 5 years ago

I can't reproduce the problem with the supplied files (which are a little out of date relative to current configs) and a new clone of bugfix-2.0.x on an MKS SBase. Are you testing without thermistors connected? That will cause the safeguards to trigger, locking up the board.

neox3 commented 5 years ago

Some more info:

When it hangs, the only thing you see on the screen is the electronics resetting. And it would be very strange for the files to be outdated, because I downloaded the latest.

The same machine, with Arduino + RAMPS 1.6 on firmware 1.1.9, runs well with all the same parameters. It is very strange; the only explanation I can think of is some combination of things that the 32-bit firmware doesn't like.

p3p commented 5 years ago

Please provide the full configs that cause the issue so it can be reproduced.

gloomyandy commented 5 years ago

Your supplied config files have the TMC drivers set in stand alone mode. They also have the BLTouch disabled. Are you sure you have uploaded the correct files?

neox3 commented 5 years ago

Yes, they are correct, because I was going from more to less, disabling things to find the point where disabling or enabling something causes the error.

Now, at this point in the configuration files (Marlin-bugfix-2.0.x.zip), if you enable PIDTEMP for the bed, the printer hangs, with no USB and no internal SD. If you disable it, all OK.

I am uploading all the files now. Thanks in advance!

eikeime commented 5 years ago

I think this is similar to the problem when enabling FAST_PWM_FAN.

gloomyandy commented 5 years ago

@eikeime if you have reported this as an issue please post a reference to it here. It saves people having to go search for whatever it is you are referring to.

VanessaE commented 5 years ago

I can confirm this on my end, to a certain degree: I'm not using a delta, but instead, my beat up old Prusa i3 clone. I'm using bugfix-2.0.x at commit 7b4c3bd92.

It looked like something went ... wrong ... on my system, something which caused my SKR v1.1 to stop working, but I couldn't immediately see a cause, so I started investigating.

List of troubleshooting attempts...

* tried three different USB cables
* tried four different USB ports, one of which is part of a card reader mounted in a 3½" slot
* checked all of the connections on the SKR board
* turned the computer and printer off for several minutes, by their mains switches (as suggested by a few google results)
* re-seated some cables inside the computer
* did several firmware swaps/reflashes while fiddling with the configs
* blew away Marlin, re-cloned, and re-created its config by hand (starting from the supplied default files, using a backup copy as a reference, leaving out what I could)
* booted my PC into an older install copied over from a drive I had just upgraded from, where I *know* the SKR v1.1 worked at one point
* reinstalled my OS (onto my usual drive, natch; Debian stable 9.9, with standard-issue 4.9.0 kernel)

In short, I tried everything I could think of.

I was inclined to just write off my SKR v1.1 as broken, so I tried my v1.3, with no better results initially. Then I ran across this issue/report, and figured it was worth a closer look.

Here's a log from a bootup of my SKR v1.3 with a normal build/config that by all accounts should work:

dmesg log, SKR v1.3, broken install

```
[ +0.000060] cdc_acm 5-2:1.0: ttyACM0: USB ACM device
[ +0.235764] usb 5-2: new full-speed USB device number 24 using ohci-pci
[ +0.180038] usb 5-2: device descriptor read/64, error -62
[ +1.564171] usb 5-2: new full-speed USB device number 25 using ohci-pci
[ +0.195041] usb 5-2: New USB device found, idVendor=1d50, idProduct=6029
[ +0.000008] usb 5-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ +0.000004] usb 5-2: Product: Marlin USB Device
[ +0.000004] usb 5-2: Manufacturer: marlinfw.org
[ +0.000003] usb 5-2: SerialNumber: 13006002AF2E94025B51801AF50020C1
[ +0.002195] cdc_acm 5-2:1.0: ttyACM0: USB ACM device
[ +0.002325] usb-storage 5-2:1.2: USB Mass Storage device detected
[ +0.000289] scsi host9: usb-storage 5-2:1.2
[ +1.015336] scsi 9:0:0:0: Direct-Access Marlin SDCard 01 1.0 PQ: 0 ANSI: 0 CCS
[ +0.001309] sd 9:0:0:0: Attached scsi generic sg2 type 0
[ +0.009619] sd 9:0:0:0: [sdc] 245760 512-byte logical blocks: (126 MB/120 MiB)
[ +0.008036] sd 9:0:0:0: [sdc] Write Protect is off
[ +0.000009] sd 9:0:0:0: [sdc] Mode Sense: 00 00 00 00
[ +0.007988] sd 9:0:0:0: [sdc] Asking for cache data failed
[ +0.000013] sd 9:0:0:0: [sdc] Assuming drive cache: write through
[ +1.713156] sdc: sdc1
[ +0.041979] sd 9:0:0:0: [sdc] Attached SCSI removable disk
[ +33.090356] usb 5-2: reset full-speed USB device number 25 using ohci-pci
[ +15.577540] usb 5-2: device descriptor read/64, error -110
[May20 00:22] usb 5-2: device descriptor read/64, error -110
[ +0.288055] usb 5-2: reset full-speed USB device number 25 using ohci-pci
[ +15.585518] usb 5-2: device descriptor read/64, error -110
[ +15.617535] usb 5-2: device descriptor read/64, error -110
[ +0.256035] usb 5-2: reset full-speed USB device number 25 using ohci-pci
[ +10.813020] usb 5-2: device not accepting address 25, error -110
[ +0.176023] usb 5-2: reset full-speed USB device number 25 using ohci-pci
[May20 00:23] usb 5-2: device not accepting address 25, error -110
[ +0.000107] usb 5-2: USB disconnect, device number 25
[ +0.000058] cdc_acm 5-2:1.0: ttyACM0: USB ACM device
[ +0.015880] sd 9:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ +0.000006] sd 9:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 00 03 bb 20 00 00 f0 00
[ +0.000003] blk_update_request: I/O error, dev sdc, sector 244512
[ +0.000033] sd 9:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ +0.000003] sd 9:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 00 03 bc 10 00 00 10 00
[ +0.000002] blk_update_request: I/O error, dev sdc, sector 244752
[ +0.167961] usb 5-2: new full-speed USB device number 26 using ohci-pci
[ +15.629491] usb 5-2: device descriptor read/64, error -110
[ +15.617487] usb 5-2: device descriptor read/64, error -110
[ +0.255984] usb 5-2: new full-speed USB device number 27 using ohci-pci
[ +15.617496] usb 5-2: device descriptor read/64, error -110
[May20 00:24] usb 5-2: device descriptor read/64, error -110
[ +0.108103] usb usb5-port2: attempt power cycle
[ +0.492011] usb 5-2: new full-speed USB device number 28 using ohci-pci
[ +10.720960] usb 5-2: device not accepting address 28, error -110
[ +0.172021] usb 5-2: new full-speed USB device number 29 using ohci-pci
[ +10.837029] usb 5-2: device not accepting address 29, error -110
[ +0.000063] usb usb5-port2: unable to enumerate USB device
```

As you can see, USB craps out some seconds after the SKR boots, and gets the ACM driver into some kind of slow loop. Once Linux decides it can't enumerate the board, it gives up. There's no further related output unless I reset the SKR, or disconnect and reconnect the USB cable (thus also power-cycling it, since that board's set to take its 5v power from USB).
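To make the failure pattern in a log like this easier to see, the error lines can be tallied by kind. This is a rough sketch, not part of the original report: it assumes the `usb <port>: <message>, error -<n>` line shape seen in the log, and it maps only the two codes that appear there (on Linux, errno 62 is `ETIME` and errno 110 is `ETIMEDOUT`). The function name `summarize_usb_errors` is hypothetical.

```python
import re
from collections import Counter

# The kernel reports negated Linux errno values; only the two codes
# seen in the log are named here.
ERRNO_NAMES = {62: "ETIME", 110: "ETIMEDOUT"}

# e.g. "usb 5-2: device descriptor read/64, error -110"
ERROR_RE = re.compile(r"usb (\S+): (.+?), error -(\d+)")


def summarize_usb_errors(dmesg_text):
    """Count USB error lines, grouped by (message, errno name)."""
    counts = Counter()
    for _port, message, code in ERROR_RE.findall(dmesg_text):
        # Fold the changing device address into a placeholder so that
        # "address 25" and "address 28" land in the same bucket.
        message = re.sub(r"address \d+", "address N", message)
        name = ERRNO_NAMES.get(int(code), "errno " + code)
        counts[(message, name)] += 1
    return counts


if __name__ == "__main__":
    sample = "\n".join([
        "[ +0.180038] usb 5-2: device descriptor read/64, error -62",
        "[ +15.577540] usb 5-2: device descriptor read/64, error -110",
        "[ +10.813020] usb 5-2: device not accepting address 25, error -110",
    ])
    for (message, name), n in summarize_usb_errors(sample).items():
        print(n, name, message)
```

A run of timeouts after a clean enumeration, as in the log above, is consistent with the board having stopped answering the host rather than with a cable or port fault, though the tally alone does not prove that.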

Here are the config files that produced that output: Marlin-SKR-v1.3-20190520.zip

Now, if I go with @neox3's implied suggestion, and simply comment-out PIDTEMPBED from the above, leaving the rest untouched, then recompile, flash, and reboot the board, USB starts working properly:

dmesg log, SKR v1.3 working install

```
[ +1.168039] usb 5-2: new full-speed USB device number 35 using ohci-pci
[ +0.195088] usb 5-2: New USB device found, idVendor=1d50, idProduct=6029
[ +0.000008] usb 5-2: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ +0.000005] usb 5-2: Product: Marlin USB Device
[ +0.000003] usb 5-2: Manufacturer: marlinfw.org
[ +0.000004] usb 5-2: SerialNumber: 13006002AF2E94025B51801AF50020C1
[ +0.002186] cdc_acm 5-2:1.0: ttyACM0: USB ACM device
[ +0.002323] usb-storage 5-2:1.2: USB Mass Storage device detected
[ +0.000478] scsi host9: usb-storage 5-2:1.2
[ +1.039167] scsi 9:0:0:0: Direct-Access Marlin SDCard 01 1.0 PQ: 0 ANSI: 0 CCS
[ +0.001325] sd 9:0:0:0: Attached scsi generic sg2 type 0
[ +0.008639] sd 9:0:0:0: [sdc] 245760 512-byte logical blocks: (126 MB/120 MiB)
[ +0.007998] sd 9:0:0:0: [sdc] Write Protect is off
[ +0.000010] sd 9:0:0:0: [sdc] Mode Sense: 00 00 00 00
[ +0.007998] sd 9:0:0:0: [sdc] Asking for cache data failed
[ +0.000013] sd 9:0:0:0: [sdc] Assuming drive cache: write through
[ +1.694149] sdc: sdc1
[ +0.041949] sd 9:0:0:0: [sdc] Attached SCSI removable disk
```

Incidentally, within a few seconds, my desktop auto-mounted the on-board SD card and popped open a file manager, as it normally would.

Now, on my v1.1, same branch/commit, here's what I get from what should be a good config:

dmesg log, SKR v1.1 broken install

```
[May20 01:23] usb 3-3: new full-speed USB device number 19 using ohci-pci
[ +0.195017] usb 3-3: New USB device found, idVendor=1d50, idProduct=6029
[ +0.000008] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ +0.000004] usb 3-3: Product: Marlin USB Device
[ +0.000004] usb 3-3: Manufacturer: marlinfw.org
[ +0.000003] usb 3-3: SerialNumber: 1800700AAF2919225AA718AEF50020C2
[ +0.002159] cdc_acm 3-3:1.0: ttyACM0: USB ACM device
[ +0.002370] usb-storage 3-3:1.2: USB Mass Storage device detected
[ +0.000276] scsi host9: usb-storage 3-3:1.2
[ +1.011329] scsi 9:0:0:0: Direct-Access Marlin SDCard 01 1.0 PQ: 0 ANSI: 0 CCS
[ +0.001279] sd 9:0:0:0: Attached scsi generic sg2 type 0
[ +0.008680] sd 9:0:0:0: [sdc] 245760 512-byte logical blocks: (126 MB/120 MiB)
[ +0.008008] sd 9:0:0:0: [sdc] Write Protect is off
[ +0.000009] sd 9:0:0:0: [sdc] Mode Sense: 00 00 00 00
[ +0.007967] sd 9:0:0:0: [sdc] Asking for cache data failed
[ +0.000013] sd 9:0:0:0: [sdc] Assuming drive cache: write through
[May20 01:24] usb 3-3: reset full-speed USB device number 19 using ohci-pci
[ +15.577167] usb 3-3: device descriptor read/64, error -110
[May20 01:25] usb 3-3: device descriptor read/64, error -110
[ +0.284050] usb 3-3: reset full-speed USB device number 19 using ohci-pci
[ +15.561204] usb 3-3: device descriptor read/64, error -110
[ +15.617137] usb 3-3: device descriptor read/64, error -110
[ +0.256059] usb 3-3: reset full-speed USB device number 19 using ohci-pci
[ +10.812837] usb 3-3: device not accepting address 19, error -110
[ +0.147992] usb 3-3: reset full-speed USB device number 19 using ohci-pci
[ +10.608835] usb 3-3: device not accepting address 19, error -110
[ +0.000150] usb 3-3: USB disconnect, device number 19
[ +0.000088] cdc_acm 3-3:1.0: ttyACM0: USB ACM device
[ +0.019761] sd 9:0:0:0: [sdc] tag#0 FAILED Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ +0.000004] sd 9:0:0:0: [sdc] tag#0 CDB: Read(10) 28 00 00 00 00 00 00 00 08 00
[ +0.000002] blk_update_request: I/O error, dev sdc, sector 0
[ +0.000005] Buffer I/O error on dev sdc, logical block 0, async page read
[ +0.000057] ldm_validate_partition_table(): Disk read failed.
[ +0.000021] Dev sdc: unable to read RDB block 0
[ +0.000029] sdc: unable to read partition table
[ +0.000406] sd 9:0:0:0: [sdc] Read Capacity(10) failed: Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK
[ +0.000004] sd 9:0:0:0: [sdc] Sense not available.
[ +0.000057] sd 9:0:0:0: [sdc] Attached SCSI removable disk
[ +0.219430] usb 3-3: new full-speed USB device number 20 using ohci-pci
[May20 01:26] usb 3-3: device descriptor read/64, error -110
[ +15.645209] usb 3-3: device descriptor read/64, error -110
[ +0.284038] usb 3-3: new full-speed USB device number 21 using ohci-pci
[ +15.561191] usb 3-3: device descriptor read/64, error -110
[ +15.617190] usb 3-3: device descriptor read/64, error -110
[ +0.108020] usb usb3-port3: attempt power cycle
[ +0.492038] usb 3-3: new full-speed USB device number 22 using ohci-pci
[May20 01:27] usb 3-3: device not accepting address 22, error -110
[ +0.180002] usb 3-3: new full-speed USB device number 23 using ohci-pci
[ +10.832830] usb 3-3: device not accepting address 23, error -110
[ +0.000064] usb usb3-port3: unable to enumerate USB device
```

Here are the config files that produced that output: Marlin-SKR-v1.1-20190520.zip

As of this writing, I couldn't come up with a working config; messing with the PIDTEMPBED setting is not enough to make things work. I also tried turning off the hotend PID, but no good, not that I'd have wanted to print with that disabled.

But, here's where it gets weird:

This used to work, up till at least April 14 (the last time I printed something... and yeah I know, I should be ashamed for leaving the printer idle for so long, bored and unloved :stuck_out_tongue_winking_eye: ).

According to git log -g and my own Instructable, I had commit 2513f6b5501b6f9fcc9f0fcfabae0b119cb1634d loaded onto my SKR v1.1 at that time (plus a few commits after that from tweaking my settings). If I roll back to that point in a duplicate copy of Marlin, and build/flash from there, it still doesn't work (same errors as above). Even if I dig into my backup drive and try a few copies of firmware.bin from various times when it was working before, none work now (they all give the same errors).

BUT, the SKR v1.1 works if I load the originally-supplied copy of Smoothie onto it! That firmware is unmodified, and unconfigured, so I can't print with it, of course, but the important part is that the USB connection was completely stable, and I could see/mount the SD card and add files to it, and I could connect via Pronterface and move the motors.

The SKR v1.3 works, at least enough to mount/read/write the SD card and connect with Pronterface and run commands like M115, M503, and so on, if I load the originally-supplied copy of Marlin onto it. No idea how that one is configured - no configs/sources were included on the SD card (just a firmware.bin and some volume information; I find it odd that the v1.3 did not come with Smoothie).

My Arduino/RAMPS also still works, though I only gave it a few-minutes' runtime just as a quick sanity check. That kit runs bugfix-2.0.x commit f6ab62bc1, and has a bunch of A4988's installed.

The above suggests that there's nothing wrong with the OS, my PC, or any of my other hardware.

I've found two more oddities: According to a udev rule I have had in place since forever, the SKR v1.1 used to show up as USB ID 1d50:6015. Now both it and the SKR v1.3 show up as USB ID 1d50:6029. Oddly, an earlier copy of my dmesg output shown in another Marlin issue, from a time when it was working, gives the v1.1 this latter ID, as well. If I load the original Smoothie onto the 1.1, it comes up with USB ID 1d50:6015.

Also, in all of my latest builds, the SD is identified in dmesg as Direct-Access Marlin SDCard 01, while old builds, and the stock firmware, all show(ed) Direct-Access Marlin Re-ARM SDCard 01.

What does this mean? What caused these changes?

VanessaE commented 5 years ago

> @eikeime if you have reported this as an issue please post a reference to it here. It saves people having to go search for whatever it is you are referring to.

@gloomyandy, I believe he's referring to https://github.com/MarlinFirmware/Marlin/issues/13861

p3p commented 5 years ago

> What does this mean? What caused these changes?

The framework was updated to remove the Re-ARM string when I added a unique serial number to the USB device descriptor as it didn't make sense.

> According to a udev rule I have had in place since forever, the SKR v1.1 used to show up as USB ID 1d50:6015

The vendor ID 0x1d50 belongs to OpenMoko Inc., which supplies projects with free USB product IDs; product ID 0x6029 is registered with them as Marlin 2.0 (USB Serial). Marlin has used that ID since just after I created the LPC1768 platform (before Marlin 2 was officially a thing). Product ID 0x6015 is registered to Smoothieware, so if that shows up it means you are stuck in the bootloader before Marlin starts (we don't replace their bootloader). This is usually caused by the watchdog timing out, which leaves the board in a 'safe' state.

I'm not sure how to reproduce this issue on my boards (SBase and Re-ARM) atm.. or how anything I changed recently in the framework could have any effect.
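The two product IDs discussed above give a quick way to tell which firmware actually answered the host. Here is a small sketch, assuming the `idVendor=..., idProduct=...` line format shown in the dmesg logs in this thread; the table holds only the two IDs named here, and `identify` is a hypothetical helper name.

```python
import re

# Only the IDs named in this thread; both sit under the OpenMoko vendor ID.
KNOWN_IDS = {
    ("1d50", "6029"): "Marlin 2.0 (USB Serial)",
    ("1d50", "6015"): "Smoothieware (stock firmware or its bootloader)",
}

ID_RE = re.compile(r"idVendor=([0-9a-f]{4}), idProduct=([0-9a-f]{4})")


def identify(dmesg_line):
    """Return a label for the USB device announced on a dmesg line, or None."""
    match = ID_RE.search(dmesg_line)
    if match is None:
        return None  # not a "New USB device found" line
    return KNOWN_IDS.get(match.groups(), "unknown device")


if __name__ == "__main__":
    line = "[ +0.195041] usb 5-2: New USB device found, idVendor=1d50, idProduct=6029"
    print(identify(line))
```

Seeing the Smoothieware ID on a board that was flashed with Marlin would point to the board sitting in the bootloader, as described above.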

gloomyandy commented 5 years ago

@VanessaE A couple of things. Firstly, I have no idea why your old firmware.bin files don't work; that is really odd unless something in your hardware setup has changed since the files originally worked. Are you using the same SD card, for instance? The only thing I can think of is that it is perhaps picking up some setting (from a later version of Marlin) from flash that is causing the problem (in theory this should not happen as the "eeprom" should be versioned, but you never know).

As to older builds not working, that is perhaps not so surprising. Remember that Marlin uses other libs (like @p3p's framework and the TMC stepper library); both of these have changed since that old version of Marlin, so you will be using an old version of Marlin with a new framework and stepper lib. This is not ideal, but is what you will get. It could be that the problem you are seeing is a combination of old Marlin and new libs, or possibly the bug is in one of the libs?

On your 1.3 board it looks like something goes wrong as Marlin boots and you are seeing a mixture of the Marlin USB stack and the bootloader. Unfortunately there is a lot going on when Marlin boots so it is hard to be sure what is causing the problem. It could just be that something is taking too long and triggering a watchdog timeout or something.

If you want to investigate further, you could try the "bad" configuration (on both boards) with the SD card removed. You will need to boot twice: once to get the new firmware installed, then remove the SD card and boot again. That way the very large number of reads that take place when a host first starts to mount a USB drive will not happen. These reads generate a lot of activity and may be slowing the boot process down enough to cause a watchdog timeout. This is just speculation really, but it is easy to test if you have the time.

VanessaE commented 5 years ago

@p3p

> The vendor ID 0x1d50 is OpenMoko inc that supply projects with free USB product IDs, Product ID 0x6029 being registered with them as Marlin 2.0 (USB Serial). It's used that ID since just after I created the LPC1768 platform

This ID shows as unregistered on the few sites I tried looking it up on. But ok. I probably just used the USB ID from when I was first getting connectivity set up, and forgot about it. Red herring I guess.

@gloomyandy

> unless something in your hardware setup has changed since the file originally worked

The only thing that has changed in any part of my hardware is the new drive in my PC that I mentioned (in fact, one drive installed, two removed). Unless you also count taking my mouse apart to fix a bad switch. :smiley:

> Are you using the same SD card for instance?

Indeed so.

> perhaps picking up some setting (from a later version of Marlin) from flash that is causing the problem

I hadn't considered this, but the few times I was able to catch the last couple of messages from the bootup spew before USB would crap out, there was usually a message about the EEPROM version being wrong and hardcoded defaults being loaded.

> It could be that the problem you are seeing is a combination of old Marlin and new libs, or possibly the bug is in one of the libs?

This is my assumption as well -- at least for the older builds. Could such bugs affect new builds, too?

> On your 1.3 board it looks like something goes wrong as Marlin boots and you are seeing a mixture of the Marlin USB stack and the bootloader

I suppose it's possible. I never mess with bootloaders if I can avoid it, on any platform. Hell, even just update-grub makes me nervous.

> If you want to investigate further you could try the "bad" configuration (on both boards) with the SD card removed?

Ok. Something's going sideways here.

I literally turned the printer on, watched dmesg waiting for the SKR to boot (with Smoothie, which I loaded last night as a precaution), then pulled the SD card, loaded Marlin via my PC's card reader, put that back in the SKR, and reset it.

That made it work - at least for a while. And with the card still inserted, too.

It booted up clean, and stayed up, with Pronterface connected, for about 37 minutes.

However, when I tried to move the motors, it... went weird again. Motors acting like they're being driven WAY too fast, USB disconnects... back to the string of errors as with the bad startups, and no USB connectivity. Resetting and cycling power are no use. Pulled the card and reset, also no good. Here are some logs of that mess:

Various logs

Initial power-up, with Smoothie still installed:

```
[May20 14:16] usb 3-3: new full-speed USB device number 2 using ohci-pci
[ +4.319993] usb 3-3: new full-speed USB device number 3 using ohci-pci
[May20 14:17] usb 3-3: New USB device found, idVendor=1d50, idProduct=6015
[ +0.000008] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ +0.000004] usb 3-3: Product: Smoothieboard
[ +0.000004] usb 3-3: Manufacturer: Uberclock
[ +0.000004] usb 3-3: SerialNumber: 1800700AAF2919225AA718AEF50020C2
[ +0.008078] usb-storage 3-3:1.2: USB Mass Storage device detected
[ +0.000303] scsi host9: usb-storage 3-3:1.2
[ +0.018514] cdc_acm 3-3:1.0: ttyACM0: USB ACM device
[ +0.001294] usbcore: registered new interface driver cdc_acm
[ +0.000001] cdc_acm: USB Abstract Control Model driver for USB modems and ISDN adapters
[ +0.999865] scsi 9:0:0:0: Direct-Access MBED.ORG MBED USB DISK 1.0 PQ: 1 ANSI: 0 CCS
[ +0.001022] scsi 9:0:0:0: Attached scsi generic sg2 type 0
[ +21.428280] sd 8:0:0:0: [sdb] 245760 512-byte logical blocks: (126 MB/120 MiB)
[ +0.009000] sdb: sdb1
```

After loading Marlin and resetting the SKR:

```
[May20 14:18] usb 3-3: USB disconnect, device number 3
[ +1.473836] usb 3-3: new full-speed USB device number 4 using ohci-pci
[ +0.195065] usb 3-3: New USB device found, idVendor=1d50, idProduct=6029
[ +0.000008] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ +0.000004] usb 3-3: Product: Marlin USB Device
[ +0.000004] usb 3-3: Manufacturer: marlinfw.org
[ +0.000004] usb 3-3: SerialNumber: 1800700AAF2919225AA718AEF50020C2
[ +0.002190] cdc_acm 3-3:1.0: ttyACM0: USB ACM device
[ +0.002323] usb-storage 3-3:1.2: USB Mass Storage device detected
[ +0.000293] scsi host9: usb-storage 3-3:1.2
[ +1.011245] scsi 9:0:0:0: Direct-Access Marlin SDCard 01 1.0 PQ: 0 ANSI: 0 CCS
[ +0.001291] sd 9:0:0:0: Attached scsi generic sg2 type 0
[ +0.008673] sd 9:0:0:0: [sdc] 245760 512-byte logical blocks: (126 MB/120 MiB)
[ +0.008007] sd 9:0:0:0: [sdc] Write Protect is off
[ +0.000009] sd 9:0:0:0: [sdc] Mode Sense: 00 00 00 00
[ +0.007984] sd 9:0:0:0: [sdc] Asking for cache data failed
[ +0.000013] sd 9:0:0:0: [sdc] Assuming drive cache: write through
[ +1.579021] sdc: sdc1
[ +0.041988] sd 9:0:0:0: [sdc] Attached SCSI removable disk
```

After trying to move the motors failed, when I pulled the SD card and reset the SKR...

```
[ +1.495927] usb 3-3: new full-speed USB device number 18 using ohci-pci
[ +0.195070] usb 3-3: New USB device found, idVendor=1d50, idProduct=6029
[ +0.000008] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ +0.000005] usb 3-3: Product: Marlin USB Device
[ +0.000003] usb 3-3: Manufacturer: marlinfw.org
[ +0.000004] usb 3-3: SerialNumber: 1800700AAF2919225AA718AEF50020C2
[ +0.002194] cdc_acm 3-3:1.0: ttyACM0: USB ACM device
[ +0.002333] usb-storage 3-3:1.2: USB Mass Storage device detected
[ +0.000302] scsi host9: usb-storage 3-3:1.2
[ +1.023222] scsi 9:0:0:0: Direct-Access Marlin SDCard 01 1.0 PQ: 0 ANSI: 0 CCS
[ +0.001269] sd 9:0:0:0: Attached scsi generic sg2 type 0
[ +0.018723] sd 9:0:0:0: [sdc] Attached SCSI removable disk
```

I tried to connect via Pronterface, without doing anything else...

```
[ +21.029845] cdc_acm 3-3:1.0: failed to set dtr/rts
[ +5.119975] cdc_acm 3-3:1.0: failed to set dtr/rts
[May20 15:08] usb 3-3: reset full-speed USB device number 18 using ohci-pci
```

Didn't work, so I shut the printer off...

```
[ +1.084104] usb 3-3: USB disconnect, device number 18
[ +0.000038] cdc_acm 3-3:1.0: ttyACM0: USB ACM device
```

Turned it back on...

```
[May20 15:09] usb 3-3: new full-speed USB device number 19 using ohci-pci
[ +0.195082] usb 3-3: New USB device found, idVendor=1d50, idProduct=6029
[ +0.000008] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ +0.000005] usb 3-3: Product: Marlin USB Device
[ +0.000003] usb 3-3: Manufacturer: marlinfw.org
[ +0.000004] usb 3-3: SerialNumber: 1800700AAF2919225AA718AEF50020C2
[ +0.002230] cdc_acm 3-3:1.0: ttyACM0: USB ACM device
[ +0.002280] usb-storage 3-3:1.2: USB Mass Storage device detected
[ +0.001003] scsi host9: usb-storage 3-3:1.2
[ +1.030548] scsi 9:0:0:0: Direct-Access Marlin SDCard 01 1.0 PQ: 0 ANSI: 0 CCS
[ +0.000995] sd 9:0:0:0: Attached scsi generic sg2 type 0
[ +0.018974] sd 9:0:0:0: [sdc] Attached SCSI removable disk
```

Tried to connect with Pronterface again...

```
[May20 15:10] cdc_acm 3-3:1.0: failed to set dtr/rts
[ +5.119980] cdc_acm 3-3:1.0: failed to set dtr/rts
[ +0.164055] usb 3-3: reset full-speed USB device number 19 using ohci-pci
[ +15.599847] usb 3-3: device descriptor read/64, error -110
[ +15.615935] usb 3-3: device descriptor read/64, error -110
[ +0.256027] usb 3-3: reset full-speed USB device number 19 using ohci-pci
```

No dice, shut it off.

```
[ +14.916213] usb 3-3: USB disconnect, device number 19
[ +0.000069] cdc_acm 3-3:1.0: ttyACM0: USB ACM device
```

The printer's still turned off, with the SD card sitting next to it. I have not tried the v1.3 yet, and now I'm kinda paranoid about even getting a valid test.

> These reads generate a lot of activity and may be slowing the boot process down enough to cause a watchdog timeout

I had noticed something weird about the speed when I was switching back and forth between builds, yeah. Figured it was just Linux being slow.

gloomyandy commented 5 years ago

Hmm

> 245760 512-byte logical blocks: (126 MB/120 MiB)

That looks suspiciously like the "free" disk that Bigtreetech supply with the SKR board. I have seen at least a dozen or more folks have problems with USB access to those cards. I'm not sure exactly what the problem is, but I suspect that Marlin is accessing the card at too high a speed.
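For anyone wanting to check whether their own card matches that size, the figures in the kernel line follow directly from the block count. A minimal worked example (the function name `capacity` is just for illustration):

```python
def capacity(blocks, block_size=512):
    """Return (bytes, decimal megabytes, binary mebibytes) for a block device."""
    total_bytes = blocks * block_size
    return total_bytes, total_bytes / 10**6, total_bytes / 2**20


if __name__ == "__main__":
    total, mb, mib = capacity(245760)
    # 245760 * 512 = 125829120 bytes, i.e. about 126 MB or exactly 120 MiB,
    # matching "(126 MB/120 MiB)" in the dmesg output.
    print(total, round(mb), mib)
```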

Note: I'm not saying that the SD card is causing all of the problems you are seeing (especially the ones when you are moving the motors etc.), but it could be causing some of the USB disk problems.

Are all of the above tests with the V1.1 board or the V1.3?

I assume this is with a new build of Marlin (or at least one compiled recently) as it seems to be showing the new USB serial number? If so, you might want to check that you have the very recent TMC libs (0.3.4); there have been a number of problems reported about some of the default settings (like microsteps etc.) that might be causing some of what you are seeing. If things are still not working you could try going back to an older TMC lib (I think 0.3.1 was pretty stable). Oh, and it is always worth reloading default settings etc.

All of the above are just guesses though, what you are seeing is pretty odd!
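Checking an installed library version against the one suggested above is a simple tuple comparison. This is an illustrative sketch only: it assumes plain dotted numeric versions such as 0.3.1 or 0.3.4, the helper names are hypothetical, and the installed version still has to be read by hand from the PlatformIO library listing.

```python
def parse_version(text):
    """Turn a dotted version string like "0.3.4" into a tuple (0, 3, 4)."""
    return tuple(int(part) for part in text.strip().split("."))


def needs_update(installed, recommended="0.3.4"):
    """True if the installed version is older than the recommended one."""
    return parse_version(installed) < parse_version(recommended)


if __name__ == "__main__":
    for version in ("0.3.1", "0.3.3", "0.3.4"):
        print(version, "older than 0.3.4:", needs_update(version))
```

Comparing tuples rather than raw strings avoids the usual trap where "0.3.10" would sort before "0.3.4" as text.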

blazer1 commented 5 years ago

I'm not as technical/knowledgeable as y'all, but I'm having this issue. I got around to fiddling with my newly purchased SKR 1.3 last night. At first, my board booted, I could read the info from my LCD, and my PC recognized the SD when plugged in. I then decided to use PlatformIO and Atom to configure and build a new Marlin 2.0 firmware.bin file, but when I reset the unit it flashed, and that's when I started getting this problem. Smoothieware works fine, but when I flash back to Marlin and reset the unit, Windows says the last USB device malfunctioned. It shows up under COM ports briefly, then disconnects. At that point, I get a new device in Device Manager under Universal Serial Bus controllers: Unknown USB Device (Device Descriptor Request Failed).

neox3 commented 5 years ago

> I'm not as technical/knowledgeable as y'all, but I'm having this issue. I got around to fiddling with my newly purchased SKR 1.3 last night. At first, my board booted, I could read the info from my LCD, and my PC recognized the SD when plugged in. I then decided to use PlatformIO and Atom to configure and build a new Marlin 2.0 firmware.bin file, but when I reset the unit it flashed, and that's when I started getting this problem. Smoothieware works fine, but when I flash back to Marlin and reset the unit, Windows says the last USB device malfunctioned. It shows up under COM ports briefly, then disconnects. At that point, I get a new device in Device Manager under Universal Serial Bus controllers: Unknown USB Device (Device Descriptor Request Failed).

Exactly my problem! It is very strange. Are you using TMC2208 drivers too?

blazer1 commented 5 years ago

> > I'm not as technical/knowledgeable as y'all, but I'm having this issue. I got around to fiddling with my newly purchased SKR 1.3 last night. At first, my board booted, I could read the info from my LCD, and my PC recognized the SD when plugged in. I then decided to use PlatformIO and Atom to configure and build a new Marlin 2.0 firmware.bin file, but when I reset the unit it flashed, and that's when I started getting this problem. Smoothieware works fine, but when I flash back to Marlin and reset the unit, Windows says the last USB device malfunctioned. It shows up under COM ports briefly, then disconnects. At that point, I get a new device in Device Manager under Universal Serial Bus controllers: Unknown USB Device (Device Descriptor Request Failed).

> Exactly my problem! It is very strange. Are you using TMC2208 drivers too?

No. I had A4988s in a couple of sockets to test, and it worked. Then I put in the TMC2130s, made changes to the firmware, and here we are.

p3p commented 5 years ago

Have you made sure you're using the newest version of the TMC lib? There were some issues that were fixed recently. If you are, can you try with all the same settings except with the drivers in standalone mode, and see if that works?

VanessaE commented 5 years ago

> That looks suspiciously like the "free" disk that Bigtreetech supply with the SKR board.

It is, but if you look at the log, there's one instance where I tried without that SD card inserted.

> Are all of the above tests with the V1.1 board or the V1.3?

All with the v1.1, because it's already installed in my printer, and has been in reliable service for several weeks.

I assume this is with a new build of Marlin (or at least one compiled recently) as it seems to be showing the new USB serial number?

Yep, brand new build, or as new as Atom/PlatformIO are willing to make it. It's commit 7b4c3bd, compiled last night (no changes since, so Atom/PIO simply copied the firmware.bin to the SD card when I tried today).

If so you might want to check that you have the very recent TMC libs 0.3.4

I assume you mean TMCStepper. Okay. From within the Marlin tree, platformio lib list shows that I have 0.3.1. Not sure if it came with Marlin, or if I upgraded to that at some point. But, no matter.

Saved my configs, blew away ALL copies of Marlin, re-cloned, git checkout bugfix-2.0.x, copied my configs back in, then did platformio lib list again. Now it shows TMCStepper is at 0.3.3:

List of installed/recognized libraries

```
vanessa@rainbird:~/RepRap/Marlin-bugfix-2.0.x-SKR-v1.1$ platformio lib -g list
Library Storage: /home/vanessa/.platformio/lib
vanessa@rainbird:~/RepRap/Marlin-bugfix-2.0.x-SKR-v1.1$ platformio lib list
Library Storage: /home/vanessa/RepRap/Marlin-bugfix-2.0.x-SKR-v1.1/.piolibdeps
30aa480
=======
Version: 0.0.0
Keywords: uncategorized
Source: https://github.com/lincomatic/LiquidTWI2/archive/30aa480.zip

Adafruit NeoPixel
=================
#ID: 28
Arduino library for controlling single-wire-based LED pixels and strip.
Version: 1.1.3
Keywords: display
Compatible frameworks: arduino
Compatible platforms: atmelavr, atmelsam, espressif8266, intel_arc32, microchippic32, nordicnrf51, teensy, timsp430
Authors: Adafruit

Arduino-L6470
=============
L6470 stepper driver library
Version: 0.7.0
Keywords: l6470, stepper, driver
Compatible frameworks: *
Compatible platforms: avr, sam
Authors: Adam Meyer, Scott Lahteine
Source: https://github.com/ameyer/Arduino-L6470/archive/dev.zip

LiquidCrystal
=============
#ID: 136
LiquidCrystal Library is faster and extensable, compatible with the original LiquidCrystal library
Version: 1.3.4
Keywords: lcd, hd44780
Compatible frameworks: arduino
Compatible platforms: atmelavr, espressif8266
Authors: F Malpartida

SailfishLCD
===========
Version: c8ac22f
Keywords: uncategorized
Source: git+https://github.com/mikeshub/SailfishLCD.git

SailfishRGB_LED
===============
Version: 2426fa2
Keywords: uncategorized
Source: git+https://github.com/mikeshub/SailfishRGB_LED.git

SlowSoftI2CMaster
=================
Version: 3a18be5
Keywords: uncategorized
Source: git+https://github.com/mikeshub/SlowSoftI2CMaster.git

TMCStepper
==========
#ID: 5513
Arduino library for configuring Trinamic stepper drivers.
Version: 0.3.3
Keywords: tmc, trinamic, stepper, driver, spi, uart, tmc2130, tmc2160, tmc2208, tmc2224, tmc2660, tmc5130, tmc5160, tmc5161
Compatible frameworks: arduino
Compatible platforms: atmelavr, atmelsam, espressif32, espressif8266, infineonxmc, intel_arc32, kendryte210, microchippic32, nordicnrf51, nordicnrf52, ststm32, ststm8, teensy, timsp430
Authors: teemuatlut

U8glib-HAL
==========
#ID: 1932
Unofficial repository for combined U8G and U8Glib-ARM with HAL extensions
Version: 0.4
Keywords: u8g, u8glib, arm, hal
Compatible frameworks: *
Compatible platforms: *
Authors: Oliver Kraus
Source: https://github.com/MarlinFirmware/U8glib-HAL/archive/dev.zip

c1921b4
=======
Version: 0.0.0
Keywords: uncategorized
Source: https://github.com/trinamic/TMC26XStepper/archive/c1921b4.zip
```

The PIO Home library manager also shows 0.3.3 to be the latest version available (through whatever the official channel is). I did download the 0.3.4 release from the TMCStepper Github repo just now, but I can find no clear instructions anywhere on how to install it.

Keeping the 0.3.3 library for now, I just compiled and flashed. Again, still using my v1.1.

It started up clean, and Pronterface could connect, but the bed thermistor was reading in excess of 270°C (jittering up and down by a few degrees, so Pronterface was actively communicating; the hotend thermistor read 23°C, which is normal). I slowly pulled the Y table forward to look at the wiring.

As soon as I did that, it started throwing USB errors. Card inserted or not, once it's in this state, it wants to stay that way. There's no obvious way to start over/reset everything back to that "sorta worked" state.

I know this sounds like a bad ground or something, but if that were the case, the errors would continue if I "interrupt" them, load Smoothie, and reset.

EDIT:

I also tried running the drivers in legacy mode (I.e. comment-out the driver defines), and tried TMC2208_STANDALONE. I even tried reducing my SKR v1.1 down to nothing but the power, hotend, and bed connected. No thermistors, no end stops, no driver modules. Same problem.

gloomyandy commented 5 years ago

You really want to have version 0.3.4. I'm not sure why it is not being used by PlatformIO. @p3p @teemuatlut any idea why the latest TMCStepper lib is not showing up?

teemuatlut commented 5 years ago

Because PIO has always been slow to pick up a new version. You can pull a specific version using a git/zip link and bypass the whole PIO library manager.

VanessaE commented 5 years ago

Like I said, I can find no clear instructions on how to do that.

teemuatlut commented 5 years ago

```diff
#
# NXP LPC176x ARM Cortex-M3
#
[env:LPC1768]
platform          = https://github.com/p3p/pio-nxplpc-arduino-lpc176x/archive/master.zip
framework         = arduino
board             = nxp_lpc1768
build_flags       = -DTARGET_LPC1768 -DU8G_HAL_LINKS -IMarlin/src/HAL/HAL_LPC1768/include -IMarlin/src/HAL/HAL_LPC1768/u8g ${common.build_flags}
# debug options for backtrace
#  -funwind-tables
#  -mpoke-function-name
lib_ldf_mode      = off
lib_compat_mode   = strict
extra_scripts     = Marlin/src/HAL/HAL_LPC1768/upload_extra_script.py
src_filter        = ${common.default_src_filter} +<src/HAL/HAL_LPC1768>
monitor_speed     = 250000
lib_deps          = Servo
  LiquidCrystal
  U8glib-HAL=https://github.com/MarlinFirmware/U8glib-HAL/archive/dev.zip
- TMCStepper@<1.0.0
+ https://github.com/teemuatlut/TMCStepper/archive/v0.3.4.zip
  Adafruit NeoPixel=https://github.com/p3p/Adafruit_NeoPixel/archive/master.zip
```

gloomyandy commented 5 years ago

Hmm, I'm really not sure. I have just got my SKR V1.1 board up and running again as a test rig, with no thermistors, endstops or heaters connected. Five TMC2208 drivers with UART control. Seems to be working fine. All drivers connect, USB access to the SD card works, I can connect using Repetier-Host, motors move, LCD display working. Using the latest Marlin (from 10 minutes ago) and 0.3.4 of the TMCStepper lib.

VanessaE commented 5 years ago

Ok. Made that change to 0.3.4, did PIO Clean (just for paranoia sake), compiled, flashed the SKR and then reset it a couple of times, then pulled the card out and reset again.

The dmesg output from that last bootup looks good, but Pronterface won't connect (it just sits there, no errors in its console nor in dmesg). Tried unplugging/reconnecting the USB cable. Still can't connect, still no errors. The SD card is not inserted.

Log of that last boot

```
[ +1.620188] usb 3-3: new full-speed USB device number 39 using ohci-pci
[ +0.194605] usb 3-3: New USB device found, idVendor=1d50, idProduct=6029
[ +0.000008] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ +0.000004] usb 3-3: Product: Marlin USB Device
[ +0.000004] usb 3-3: Manufacturer: marlinfw.org
[ +0.000003] usb 3-3: SerialNumber: 1800700AAF2919225AA718AEF50020C2
[ +0.002197] cdc_acm 3-3:1.0: ttyACM1: USB ACM device
[ +0.002365] usb-storage 3-3:1.2: USB Mass Storage device detected
[ +0.000323] scsi host9: usb-storage 3-3:1.2
[ +1.019295] scsi 9:0:0:0: Direct-Access Marlin SDCard 01 1.0 PQ: 0 ANSI: 0 CCS
[ +0.001378] sd 9:0:0:0: Attached scsi generic sg2 type 0
[ +0.018602] sd 9:0:0:0: [sdc] Attached SCSI removable disk
```

disconnect/reconnect USB...

```
[May20 18:35] usb 3-3: USB disconnect, device number 39
[ +6.760941] usb 3-3: new full-speed USB device number 40 using ohci-pci
[ +0.195098] usb 3-3: New USB device found, idVendor=1d50, idProduct=6029
[ +0.000007] usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
[ +0.000005] usb 3-3: Product: Marlin USB Device
[ +0.000004] usb 3-3: Manufacturer: marlinfw.org
[ +0.000003] usb 3-3: SerialNumber: 1800700AAF2919225AA718AEF50020C2
[ +0.002190] cdc_acm 3-3:1.0: ttyACM0: USB ACM device
[ +0.002322] usb-storage 3-3:1.2: USB Mass Storage device detected
[ +0.000352] scsi host9: usb-storage 3-3:1.2
[ +1.027321] scsi 9:0:0:0: Direct-Access Marlin SDCard 01 1.0 PQ: 0 ANSI: 0 CCS
[ +0.001406] sd 9:0:0:0: Attached scsi generic sg2 type 0
[ +0.018558] sd 9:0:0:0: [sdc] Attached SCSI removable disk
```

So... while it just won't connect, having no errors pop up is a good sign, right?

VanessaE commented 5 years ago

Welp, I decided to try one last desperate fix: Upgraded my OS (via a fresh install), to Debian testing from an install image fetched today. I figured there must be some incompatible mixture of packages, between Stable, PlatformIO, Atom, Marlin, etc etc. But nope, that did no good. It worked at first, for all of a minute, then crapped out.

But, that time it ran long enough to move the motors, even home X, and Pronterface showed the thermistor readings, then oddly, Marlin kill()'d itself, citing thermal runaway -- but I hadn't even tried to turn either heater on.

One other thing I discovered: if the SKR is in that screwed-up state, I don't even have to have the USB connected to trigger it. Unplug USB, hit reset, wait a bit for the SKR to boot up, connect the cable, and "new full-speed USB device number 25 using ohci-pci" pops up in dmesg... and then nothing. No other info is printed describing the board, though if I wait a bit, it starts throwing those "-110" errors, eventually failing to enumerate, as before.

I wonder if that bogus thermal runaway error is meaningful?

p3p commented 5 years ago

You mentioned earlier that the bed temp was reporting as a high value, so I'm not sure it is bogus, Marlin will kill itself if a temperature reading is bad. I presume if you either disable or set bed thermistor to a dummy that everything will work okay at this point now the main issue with the TMC communications has been fixed.

If Marlin gets into a watchdog reset state the usb log will be all over the place as the device resets and comes back and gets stuck in the smoothieware bootloader.

VanessaE commented 5 years ago

You mentioned earlier that the bed temp was reporting as a high value, so I'm not sure it is bogus, Marlin will kill itself if a temperature reading is bad.

Sure, I get that. But normally that's only the case if one of the heaters is actually turned on, no? Otherwise you wouldn't be able to boot up a board for testing without thermistors connected (because they'd read 0, while the minimum default is 5°C).

I presume if you either disable or set bed thermistor to a dummy that everything will work okay at this point now the main issue with the TMC communications has been fixed.

I gave it a try, using a dummy setting of 999 for the hotend, and 998 for the bed. Didn't help.

At the moment, with that setting in place, the board's still getting itself into this stuck state, but just like before, if I load the OEM Smoothie build onto the card, pop it in, and reset, it wakes up and I can connect, move the motors, issue commands, etc. as much as I want, within the limits imposed by that build being unconfigured, of course.

I just now tried setting the hotend back to 5, and set the bed to 0, to disable it. That did something! Now I can mount/read/write the SD card, connect, move the motors around, home, turn the layer fan on/off, and even turn on the hotend heater.

Something of note: For normal printing, I have bed thermal runaway protection turned off, as my bed heater is wimpy and takes forever to heat up (it's only 85W or thereabouts), and with a maximum of only about 120°C, it has no chance of ever starting a fire. :stuck_out_tongue:

I find it interesting that I have to disable the bed heater entirely, while @neox3 could just turn BED PID mode off (of course, he's using a v1.3, and I'm using my v1.1). Or more to the point, that both his and my glitches both center on the bed heating part of the system.

If Marlin gets into a watchdog reset state the usb log will be all over the place as the device resets and comes back and gets stuck in the smoothieware bootloader.

I take it, then, that none of my previous logs indicated this situation?

VanessaE commented 5 years ago

Just for grins, I went ahead and loaded up Slic3r, chose one of my faster profiles, then passed a test model through it and on to Pronterface to be "printed", with the printer still using the above-mentioned "bed heater disabled" Marlin build (and with literally every other setting being completely normal for the machine).

By "printed", I mean I let it go through the motions, but without any filament, and without setting the Z offset (so the hotend started just a bit higher above the bed than normal, for safety). A dry run, so to speak. Ran the printer like that for about an hour.

There were no USB errors or faults of any kind.

I did hear what sounded like a few stalls along X during the test, but that's just a minor mechanical issue, and I have the current set too low.

Speeds along X, Y, and Z seemed normal, but the extruder was turning far slower than it should... I have the extruder set to 16x (see my previous Configuration_adv.h roughly line 1592), but M122 reports that it's being set to 256x (and shows all four 2208's are communicating normally).

ghost commented 5 years ago

The dmesg output from that last bootup looks good, but Pronterface won't connect (it just sits there, no errors in its console nor in dmesg). Tried unplugging/reconnecting the USB cable. Still can't connect, still no errors. The SD card is not inserted. Log of that last boot

So... while it just won't connect, having no errors pop up is a good sign, right?

@VanessaE, on the SKR, you need to have the DTR option ticked in Pronterface menu/options, else it won't connect using the USB serial connection .. I've found, in Windows at least.

VanessaE commented 5 years ago

@doggyfan, that's already set and hasn't been touched in.. I dunno how long; see also my later posts in this thread.

ghost commented 5 years ago

yep @VanessaE, bit slow this morning (7:35am here), just catching up ;)

But the info might help others trying to use Pronterface with the SKR's I guess.

ghost commented 5 years ago

I keep seeing messages about Marlin locking up with no user feedback at all :(

It seems to be normally due to invalid temperature detections. Marlin shouldn't be bothering with what the temperature is (bed, hotend, finger tips or whatever) unless the user specifically wants a set temperature. It's a pain.

[ continues reading thread to see if I can help ]

ghost commented 5 years ago

For now @VanessaE, add this to your SKR 1.1 pins file (anywhere in the file will do) ..

```cpp
// Ignore temp readings during development.
#define BOGUS_TEMPERATURE_FAILSAFE_OVERRIDE
```

ghost commented 5 years ago

Using @neox3 (OP) config files on my SKR 1.3 all seems OK, no reboots or anything.

@neox3 you're using old config files. Are you using the current up-to-date Marlin?

ghost commented 5 years ago

Have compiled using your last posted config files @VanessaE, though I had to change the board type to the SKR 1.3 as that's the only 32-bit board I have.

It runs OK, SD card shows up in windows no problem at all. Tried a few commands using the serial USB connection and that's OK. Difficult to find the problem when I can't reproduce it :(

I don't have an SKR 1.1 board to try.

VanessaE commented 5 years ago

OK, that override, plus setting the thermistor type back to 13 (its normal setting) at least allows it to remain online. It keeps repeating Error:MAXTEMP triggered, system stopped! Heater_ID: bed endlessly, but it otherwise remains online (since obviously it doesn't call kill()). I assume this is normal.

This in turn allowed me to see the thermistor reading and to fiddle with the bed. It's reading 0 most of the time, but pressing down on the edge near the thermistor wires' connection does two things: it makes the thermistor reading go nuts, and it causes the firmware to turn the bed on without my explicit order. "Okay, it's a mechanical/electrical problem that Marlin can't compensate for", I'm thinking.

And then I thought to take the glass off of the bed. Success! Once I took it off, the thermistor reading returned to normal and the "MAXTEMP" messages mentioned above ceased. Apparently the aluminum tape on the bottom of the glass (used for my Z sensor) was short-circuiting the thermistor's wires/solder joints.

Evidently, a certain four-legged feline must have stepped/sat on the bed and pushed it down enough at some point to make contact, and it stayed that way. Lesson learned. Insulate the thermistor wires better, and do a better job of keeping the cat away. :grin:

BUT:

When I say the heater turns on, I mean the firmware turns the bed MOSFET on: its LED and the bed's LED turn on, and the power supply fan increases speed in response to the load. It doesn't stay on. It varies off/on in response to the short circuit in the thermistor connection (when I was first pressing on the edge of the bed by varying amounts). Now mind you: at no point did I order Marlin to switch the bed on, so this electrical fault aside, there's still a bug to be dealt with. I would have expected there to simply be some screwball reading from the shorted thermistor and just leave it at that, without crashing the firmware. :smiley:

gloomyandy commented 5 years ago

Hmm, cat problems eh... I have no idea what is going on with the heater turning on; that sounds very odd.

So out of curiosity I arranged things so I could wire a dead short over the bed temp thermistor. With my standard configuration I get a "Err: MAXTEMP BED PRINTER HALTED" message on the display. In Repetier Host I get the following message "Error:Thermal Runaway, system stopped! Heater_ID: bed Error:Printer halted. kill() called!" which is pretty much as expected. The board then just sits there waiting for a reset. I don't see any sign of the bed heater turning on.

@VanessaE can you confirm exactly what thermal settings you have when you see the heated bed being powered and I'll set my board up to match and give it a try (may be later today though as I have to go out). As explained below I have pretty much all of the standard thermal protection enabled and do not have BOGUS_TEMPERATURE_FAILSAFE_OVERRIDE set.

Other than that I can't really contribute much to this, as everything I try seems to work fine. I am building with Marlin as of last night and the 0.3.4 TMCStepper libs. I have a V1.1 board sat here in front of me, basically using the same configuration it had when I used it to control my printer (I now use a V1.3 board). So I have thermistors defined but not connected, and no bed or hotend connected. I do have all of the drivers and the display. When I boot the board all of the drivers are working, and an M122 shows the correct number of microsteps (16 for X, Y, Z1, Z2; 4 for E) and the correct current readings. The temperature readings (with no thermistors connected) are 0 for the hotend (type 5 thermistor) and -14 (type 1 thermistor). I get no reboots and no thermal errors (though I do have thermal safety checks enabled). My SD card is mounted and I can copy files to/from it. I can happily connect to the USB serial port and use it from Repetier Host. I have also tried enabling PIDTEMPBED and everything still works fine. All of this is with a conventional Cartesian type machine. My settings (for what it is worth) are here: https://github.com/gloomyandy/Marlin/tree/myskr

I also have a V1.3 board attached to my printer. I've updated this to the same build as above and again it works fine, no printer resets etc. I've tried a couple of test prints with no issues at all.

VanessaE commented 5 years ago

With my standard configuration I get a "Err: MAXTEMP BED PRINTER HALTED" message on the display. In Repetier Host I get the following message "Error:Thermal Runaway, system stopped! Heater_ID: bed Error:Printer halted. kill() called!" which is pretty much as expected.

But these are NOT normal. There was a time when kill() was intentionally being called on any suspect thermistor reading, but then the code was made to error-out only if the firmware is also being ordered to turn the corresponding heater on -- i.e. only fail if it gets bad readings WITH heat applied. That it's failing without heat applied is a regression.
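To make the distinction concrete, here is a minimal sketch of the two policies being argued about. `HeaterState` and both function names are made up for illustration; this is not Marlin's temperature code, only the logic as described in this thread.

```cpp
// Hypothetical model of one heater channel (bed or hotend).
struct HeaterState {
  bool reading_plausible;  // thermistor value is inside the configured min/max window
  bool heat_requested;     // the firmware has been told to drive this heater
};

// Older behaviour as described above: a bad reading is only fatal
// while heat is actually being applied.
inline bool should_kill_when_heating_only(const HeaterState &h) {
  return !h.reading_plausible && h.heat_requested;
}

// Behaviour being reported now: any bad reading halts the printer,
// whether or not the heater was ever switched on.
inline bool should_kill_always(const HeaterState &h) {
  return !h.reading_plausible;
}
```

With a shorted thermistor and no heat requested, the first policy keeps the board alive and the second halts it, which is the difference being reported here.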

@VanessaE can you confirm exactly what thermal settings you have when you see the heated bed being powered and I'll set my board up to match and give it a try (may be later today though as I have to go out).

See my previously-linked "broken on 1.1" config, just with that override in place. Git commit 7b4c3bd92a5f327c4118483fe2eeb45d5f1416a1.

No sooner do I say I've found the problem, does it happen again, because I triggered a thermistor fault while servicing it (things were looking pretty ragged, it needed a good cleanup). Yeah, yeah, I know I should do that kind of thing with the bot powered off, but I wanted to see the temperature readings live. Despite having metered the thermistor connections and wiring all the way from the thermistor to the SKR connector once the work was finished (with it unplugged from the SKR, of course, and finding it to be 100% good the whole way), it's reading 0° (with type 13) and scrolling "MAXTEMP" errors.

If I keep the "override", and set the bed thermistor to type 998, with a dummy value of 45, it shows 45° in Pronterface AND has continuously scrolling "MAXTEMP" errors. Clearly, even if the thermistor circuit is still screwed up, it can't possibly matter at that point.
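For anyone wanting to repeat that test, a dummy bed sensor of this kind would look roughly like the fragment below in Configuration.h. This assumes the stock `TEMP_SENSOR_BED` and `DUMMY_THERMISTOR_998_VALUE` option names; only the values 998 and 45 come from the test described above.

```cpp
// Configuration.h fragment (sketch): swap the real bed thermistor for a fixed reading.
#define TEMP_SENSOR_BED 998            // 998 selects a dummy sensor instead of a real table
#define DUMMY_THERMISTOR_998_VALUE 45  // constant temperature it reports, in degrees C
```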

Recall that I had a dummy setting in place earlier as a test, and it still barfed, because I hadn't yet been told to try that "override".

So at this point, the only thing failing is the bed thermistor code (well, and the extruder microsteps setting mentioned before, but that's unrelated).

As long as I have that "override" in place, or if I set the bed thermistor to 0, I have full access to the rest of the printer - I can mount/read/write the SD card, connect, move motors, heat up the hot end, turn fans on/off, issue commands and see their responses, etc. I could even print something over USB, not that much of anything will stick to cold Printbite. I imagine I could initiate a print directly from the SD card, also.

Since it DID work for a bit there, and since it all tends to work initially if I leave the board powered-off for a good while (at least an hour, it seems), getting itself stuck in an error state once it thinks the fault happens (until I power-off again for a while), my guess is that there could be an uninitialized variable being referenced in the bed heater code. Something which gets set to something meaningful when the fault is detected, thus triggering the fault code, but which doesn't get reset to something sane at the next bootup or when the fault has been cleared.

ghost commented 5 years ago

That pussy cat's going to end up sticking itself to the bed if it's nicely heated up at the time it struts over it!

But yes, loose/broken/shorting connections need sorting. At least you've found it, means you can fix that, but yes Marlin needs to error out nicely.

VanessaE commented 5 years ago

That pussy cat's going to end up sticking itself to the bed if it's nicely heated up at the time it struts over it!

Oh no worries there, they figured out long ago not to mess with the printer when it's active. When it's idle, apparently it's fair game. :stuck_out_tongue:

ghost commented 5 years ago

I'll have a play with shorting the thermistor pins together too, and see what happens where.

p3p commented 5 years ago

Sure, I get that. But normally that's only the case if one of the heaters is actually turned on, no?

That was recently changed. I'm presuming it was intentional, as we had to add a time delay before the protection kicked in, to give the ADC filters time to normalise on the LPC176x platforms and fix a few issues that instantly showed up.

I would have expected there to simply be some screwball reading from the shorted thermistor and just leave it at that, without crashing the firmware

Marlin needs to error out nicely

It's supposed to put an error on any connected display, transmit the error over any connected serial ports, and then put itself into a safe state requiring a reset to continue. It doesn't crash.

I can't think of any code path that would lead to the bed becoming active because of a thermistor reading, whatever it is. If this is the case it's very worrying.

ghost commented 5 years ago

I found a bug in SCAN_THERMISTOR_TABLE: it returns the minimum temperature if the thermistor is reading a very high temperature (shorted or close to shorted).

So I have made a pull request to fix that. It now reports a MAX TEMP error rather than a MIN TEMP error when the thermistor temperature is too high.
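As a rough illustration of the kind of lookup involved, here is a small table scan that clamps at both ends. It is not the actual SCAN_THERMISTOR_TABLE macro and not the code from the pull request. It assumes the common pull-up wiring where a shorted NTC thermistor reads as a very low ADC count, and the table values, `TableEntry`, `kTable` and `raw_to_celsius` are all invented for the example.

```cpp
#include <cstddef>

struct TableEntry {
  int raw;        // ADC count
  float celsius;  // temperature at that count
};

// Hypothetical 4-point table, sorted by rising ADC count (falling temperature).
// Real tables are much longer; these numbers are made up.
constexpr TableEntry kTable[] = {
  {   50, 300.0f },
  {  200, 150.0f },
  {  600,  50.0f },
  { 1000,   0.0f },
};
constexpr std::size_t kTableLen = sizeof(kTable) / sizeof(kTable[0]);

float raw_to_celsius(int raw) {
  // At or below the first entry: shorted or nearly shorted sensor.
  // Report the hottest value so the caller sees a max-temp condition.
  if (raw <= kTable[0].raw) return kTable[0].celsius;

  for (std::size_t i = 1; i < kTableLen; ++i) {
    if (raw <= kTable[i].raw) {
      // Linear interpolation between the two surrounding entries.
      const TableEntry &lo = kTable[i - 1];
      const TableEntry &hi = kTable[i];
      const float t = float(raw - lo.raw) / float(hi.raw - lo.raw);
      return lo.celsius + t * (hi.celsius - lo.celsius);
    }
  }

  // Past the last entry: open circuit. Report the coldest value.
  return kTable[kTableLen - 1].celsius;
}
```

The point of the sketch is only that the two out-of-range cases need to map to opposite ends of the table; if both fall through to the same end, a short looks like a cold sensor.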

gloomyandy commented 5 years ago

It's supposed to put an error on any connected display, transmit the error over any connected serial ports, and then put itself into a safe state requiring a reset to continue. It doesn't crash.

Which is what it does for me.

VanessaE commented 5 years ago

Ok, I'm just plain lost now. My theory of "works after powering-off for a while" proved untrue.

@gloomyandy can you try git commit 7b4c3bd92a5f327c4118483fe2eeb45d5f1416a1 on your SKR v1.1, with this config set: Marlin-SKR-v1.1-20190521.zip

What does the bed thermistor read? do you get the scrolling MAXTEMP messages?

That was recently changed. I'm presuming it was intentional, as we had to add a time delay before the protection kicked in, to give the ADC filters time to normalise on the LPC176x platforms and fix a few issues that instantly showed up.

Making it throw an error and trigger a kill(), when that error can't possibly cause any harm (because no one told the board to turn the heater on in the first place) is a bad idea. It confuses the user, serves no purpose, and makes troubleshooting 10x harder than it needs to be. If there's some delay before allowing that code to kick in, fine, but it still shouldn't error if the heater isn't on (or meant to be).

ManuelMcLure commented 5 years ago

Making it throw an error and trigger a kill(), when that error can't possibly cause any harm (because no one told the board to turn the heater on in the first place) is a bad idea. It confuses the user, serves no purpose, and makes troubleshooting 10x harder than it needs to be. If there's some delay before allowing that code to kick in, fine, but it still shouldn't error if the heater isn't on (or meant to be).

The change was made to allow Marlin to turn off the power supply (through PS_ON_PIN) in case a hardware issue ended up causing the bed or hotend MOSFET to be stuck "on". For example, on RAMPS it's easy for the tab of the fan MOSFET to short against the tab of the bed MOSFET, causing the bed to turn on whenever the fan is turned on (I've seen at least three instances of this). Also, one of the most common failure modes for MOSFETs is for them to get stuck "on" regardless of the input signal.

Patag commented 5 years ago

@VanessaE I also encountered the unexpected bed heating bug, but I was not able to perform any further tests or reproduce the issue: I fried my v1.3 board by plugging the bed connector back in shifted by 1 pin, which applied 24V to an analog input that tolerates 3.3V at most. I'll give you some news soon, at the end of this week if all goes well.

VanessaE commented 5 years ago

The change was made to allow Marlin to turn off the power supply (through PS_ON_PIN) in case a hardware issue ended up causing the bed or hotend MOSFET to be stuck "on".

Ok, but that's something I should think would be easily detected by the normal watchdog i.e. either the temperature reading exceeds the configured limit, or it is not rising/falling consistent with the configured timings. THEN you can kill() and shut off the power supply.

I can accept the notion that my v1.1 is toast, but I have a hard time with that in the face of these contradictory, nonsensical issues, especially when the v1.3 obviously isn't behaving 100% properly either.

On a hunch, I swapped thermistor pins 0 and 2 in the v1.1's pins file, thus reassigning the unused "HE1" input to serve the bed, and disabled the "override"... that is, I've gone back to a 100% standard config, aside from those swapped pins. That works fine! No problems at all. No errors until I unplug the bed thermistor (at which point it loses USB comms)... this means my v1.1 is behaving as others here say it should, so long as I use that secondary thermistor port for the bed.

I also pulled an old copy of Marlin from my backup drive, from the middle of April (not just a firmware.bin, but the entire tree as it existed at the time, commit 2513f6b5). Loaded it into Atom, swapped those pins, PIO Clean/compile/flash. That works, too. At least enough that I can print completely normally with it.

That confirms that my v1.1 just has a bad thermistor port, probably damaged by the combo of the shorted thermistor earlier, and the thing where the bed turned itself on. I can only theorize that the short temporarily connected one of the thermistor's leads into the heater "coils", and one thing back-fed into the other, causing the MOSFET to turn on briefly.

That still leaves lots of unanswered questions:

On the v1.1:

On the v1.3:

ghost commented 5 years ago

I've found more bugs in the temperature code @VanessaE, as soon as I work out a fix I'll put a pull request in (today sometime).

gloomyandy commented 5 years ago

Have not been able to test anything for the last couple of days, sorry, sometimes real life gets in the way.

Quick update on the USB connections being dropped. Marlin has two primary ways of handling serious errors. It either calls kill or the watchdog timer is triggered.

In the case of kill being called it will display a message on any attached display and also sends a message to any active serial ports. It then disables all interrupts and waits in a tight loop for the reset button to be pressed. With all interrupts disabled the USB stack can not function, so no surprise that you get a USB disconnect.

If the watchdog timer is triggered, what is supposed to happen is that Marlin reboots and, as part of the boot process, detects that the reset was caused by the watchdog and displays a message. However, LPC176x-based boards do not boot straight into Marlin; they go via the bootloader (typically the Smoothieboard bootloader). Unfortunately, this intercepts the watchdog timer state and enters so-called "dfu" mode. This starts the loader's own USB stack (which identifies itself as vendor 0x1D50, product 0x6015, the OpenMoko Smoothieboard ID) and waits for commands. So in this case you will see the Marlin USB stack disconnect and be replaced by a Smoothieboard device.
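The two devices can be told apart by their USB IDs. Below is a small host-side sketch, assuming the IDs quoted in this thread: 0x1D50:0x6029 for Marlin (taken from the dmesg logs earlier) and 0x1D50:0x6015 for the bootloader. The enum and function names are made up for the example.

```cpp
#include <cstdint>

enum class UsbIdentity { MarlinFirmware, SmoothieBootloader, Unknown };

// Map a vendor/product pair to what is currently running on the board.
constexpr UsbIdentity identify(std::uint16_t vid, std::uint16_t pid) {
  return (vid != 0x1D50) ? UsbIdentity::Unknown
       : (pid == 0x6029) ? UsbIdentity::MarlinFirmware       // seen in the dmesg logs
       : (pid == 0x6015) ? UsbIdentity::SmoothieBootloader   // bootloader in "dfu" mode
       : UsbIdentity::Unknown;
}

// Marlin disappearing and the bootloader taking its place is the
// pattern described above for a watchdog reset.
constexpr bool looks_like_watchdog_reset(UsbIdentity before, UsbIdentity after) {
  return before == UsbIdentity::MarlinFirmware &&
         after == UsbIdentity::SmoothieBootloader;
}
```

If the host only ever sees the Marlin ID and then nothing, that would point at kill() rather than the watchdog, going by the description above.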

@VanessaE I did try my V1.3 board with a configuration based on yours, and I can happily add the PIDTEMPBED setting without getting any sort of error or reboot. However, in the case of your V1.3 board it looks like you have configured TMC2208 devices but do not have them plugged in. Is that correct? Unfortunately my V1.3 board is in my working printer, and to test this I'd need to pull it apart to remove the driver modules, which I'd rather not do. My guess is that in your case the startup period (while it tries to talk to various non-existent devices) is pretty close to the watchdog timeout, and that adding the PIDTEMPBED option pushes the startup time over the watchdog period and triggers a reboot. The time spent trying to communicate with TMC2208 devices has been increased in recent versions of the TMCStepper library (with good reason), but false triggering of the watchdog may be an unexpected side effect of that. When you try that configuration, do you see the Marlin USB device appear first and then the Smoothieboard one? That would indicate that you are getting a watchdog reset (assuming you do not have WATCHDOG_RESET_MANUAL defined; I can't check your config at the moment, sorry).
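A back-of-the-envelope way to picture that guess: treat start-up as a list of steps that run back to back without the watchdog being refreshed, and compare the total with the timeout. The function name is hypothetical and any durations fed into it are placeholders; this is not measured data and not how Marlin schedules its start-up.

```cpp
#include <numeric>
#include <vector>

// Returns true when the start-up steps, run one after another with no
// watchdog refresh in between, take longer than the watchdog allows.
bool startup_trips_watchdog(const std::vector<unsigned> &step_ms,
                            unsigned watchdog_ms) {
  const unsigned total = std::accumulate(step_ms.begin(), step_ms.end(), 0u);
  return total > watchdog_ms;
}
```

For example, with invented figures of 1500 ms and 2200 ms against a 4000 ms timeout the board boots, while one extra 400 ms step tips it over. That is the shape of the failure being suggested: a small addition matters only because the budget was already nearly used up.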