MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0

sBase Serial Printing Pauses has to be force continued. #11315

Closed forkoz closed 6 years ago

forkoz commented 6 years ago

Description

Prints freeze in place after G92 E0. Prints have to be force continued from there. This occurs randomly, usually on prints >20 minutes.

Steps to Reproduce

Start a print; if it's long enough, the issue will occur.

Expected behavior: Prints continue as normal.

Actual behavior: Print freezes, with "ok" being the last message.

Additional Information

I've tried commenting out the serial responses from report position. I've tried enabling Xon/Xoff and a 1024b serial buffer.

Also, a serious issue with the kill command: the board locks up with all lights on and requires a hard power-off, and the extruder fan stops. This means that if I don't catch it, plastic will solidify in the nozzle.

Unsure what else to do. Try a different toolchain?

Platformio:
tool-scons @ 2.20501.4
tool-unity @ 1.20403.0
contrib-pysite @ 0.3.0
tool-pioplus @ 1.3.3
nxplpc @ 3.3.1
toolchain-gccarmnoneeabi @ 1.70201.0

error:

SENT: G92 E0
READ: ok
SENT: G1 X119.790 Y128.800 E0.1143 F4200
READ: ok
READ: ok
Emergency stop issued!
Attempting to reconnect...
SENT: M112
READ: Error:Printer halted. kill() called!
Total build time: 12.58 minutes
Disconnected.
Attempting connection at \\.\COM8...
Testing plaintext communication protocol...
Permissions error connecting to port
Is there another application already connected to this port?
Testing binary communication protocol...
Testing alternate communication protocols...
Connection failed.
Attempting connection at \\.\COM9...
Error opening selected port.

*I hit kill and not force continue on this log...

Update: Had to continue 6x in a 1.5 hour print. Another strange thing is I had to disable TX and RX buffer, otherwise the board crashes (lights on) when Y axis moves.

Marlin.zip Gcode: Resetbutton.zip

forkoz commented 6 years ago

So the issue seems to have gone away after I installed my gLCD which required a few changes.

  1. I moved X + Y to the aux connector. Pins 0,2 0,3
  2. I was forced to enable the second serial port #3 to use these pins.
  3. I commented out RX buffer and Xon/Xoff but I enabled TX_BUFFER_SIZE of 128.

I tried #3 before with no change + the endstops clashing with the LCD might have been the cause of the crash. I can speculate but had no way to tell. So far I've printed 1.5hr without issue.

gloomyandy commented 6 years ago

Could you provide a little more detail of your changes? I am currently setting up an Sbase system with discount full graphics lcd. What pins (if any) clash with this?

forkoz commented 6 years ago

These are my new configs. Not sure if it was just pins 0,17-0,15 clashing since it did it with no screen hooked up and LCD2004 configured. When they actually fought it out the board would crash and endstop didn't work. So the serial flakiness could be from not having a secondary serial port configured or no TX buffer too. The other issue not mentioned is having to plug/unplug the serial port when you reboot... but I had desperately tried smoothieware and it was doing the same thing so I don't think marlin can fix that part.

What also remains is the kill command stuffing up the board and causing my fan to go off.. I have to clean my hotend already because of it. Even if one were to say the fan shouldn't be wired to PWM but to 12V instead, there is no connector for that on this board. So if you get a thermal issue like you mentioned in your other bug report, it will kill your fans. The switch from the screen at least kills properly.

Another funny quirk is that M500 ejects the SD card from PC, printer crashes when it reconnects. You guessed it, hotend fan goes off. This I think will be easily fixed by using the real eeprom and not the SD based one.

Marlin.zip

On a positive note, printing is pretty much working just like the AVR board after these hiccups. I was able to boost infill and motion speeds like I had planned. LV steppers were a bit quieter but I could always go back if I get the breakout boards.

p3p commented 6 years ago

As usual there are a few parts of this I'm confused with

So the serial flakiness could be from not having a secondary serial port configured or no TX buffer too.

USB serial doesn't use the Xon/Xoff or the RX/TX buffer settings (the buffers are internal to the USB stack, with some more buffering in a FIFO). Also, the hardware ports should in no way interact with the USB port; they are completely independent. But if you do enable a hardware port, those pins are disabled for use elsewhere.

The other issue not mentioned is having to plug/unplug the serial port when you reboot..

I haven't seen this, your OS may be slow at realising the USB device was disconnected .. try holding the reset button for a second or 2? (I may be able to work around that in the usb stack)

What also remains is the kill command stuffing up the board and causing my fan to go off..

Isn't that just how kill works? on AVR too? It disables everything even the power supply if it has control of it.

Another funny quirk is that M500 ejects the SD card from PC, printer crashes when it reconnects

Another thing I've never seen on my Re-ARM (with GLCD) or MKS SBase (without a display). It could be linked to the shared SPI. I have never had a problem with the EEPROM emulation myself.. I wish I did. I have seen the USB stack crash once during boot but can't reproduce it. I'm in the process of replacing the whole thing.

On a positive note, printing is pretty much working just like the AVR board after these hiccups. I was able to boost infill and motion speeds like I had planned

At least it's printing, so I can feel slightly less guilty about your experience.

forkoz commented 6 years ago

Don't feel guilty. I knew there would be hiccups going in. In a way it's half the fun.

Didn't know buffers don't affect the USB. It was suggested to enable these things in other bug reports by thinkyhead. I guess it's logical and the slow USB release makes sense. I did not have this problem whatsoever with AVR. Here whenever the printer restarts I have to plug and unplug it.

Also kill on AVR resets the printer, it doesn't lock it up to where it has to be manually reset with the kill/reset switch. At least it didn't on my MKS GenL. Thermal runaway required the printer be restarted, but the fans kept spinning as far as I remember (it's been 3 weeks and I didn't have that happen too often). M112 just reset the printer as if you hit the kill button; that feature I did use. FWIW I don't have power supply control set up, it's just that I have to fully power down the board and disconnect USB if I ever use the emergency stop or if there is some error. It's slightly easier now that I have the kill button.

What's happening with the eeprom is this:

  1. Save eeprom or load a mesh (M503 works, though).
  2. USB shared sd drive unmounts... things go on as usual for a while
  3. USB shared drive mounts again in about a minute, then printer locks up with the alarm going off and has to be reset.

It could be the fault of windows 7 too, who knows.

p3p commented 6 years ago

I did not have this problem whatsoever with AVR.

AVR are not USB devices, the boards have a USB to UART bridge chip that handles USB serial and is not reset when the AVR is, meaning the USB device doesn't disconnect.

Also kill on AVR resets the printer, it doesn't lock it up to where it has to be manually reset with the kill/reset switch

I'm pretty sure that's not standard, kill (M112) is supposed to put the board into a safe state until user interaction not restart.. http://marlinfw.org/docs/gcode/M112.html

forkoz commented 6 years ago

I kept hitting Emergency stop in S3d and that just fired off the M112 command, printer would immediately reboot. If it has to lock up the board it needs to at least leave the fans spinning. All fans on sbase are PWM fans.

Thermal runaway makes sense to halt the whole thing since you don't know what's going on. With a regular emergency stop I'm not so sure. It would suck to end up with a blocked nozzle because of an eeprom save or some other software error, would it not?

forkoz commented 6 years ago

I turned off the buffer and the printer still works. Pausing is pointing at the LCD more and more.

However a new crash occurs. When .Gcode file is finished in S3d the SD card disconnects and printer crashes after a few. Also saving "eeprom" with USB disconnected = no crash. Then when you plug the USB in after a while it crashes too. Ditto for leaving the printer plugged in with nothing going on for a few hours. I almost did it before I went to sleep then hour or 2 later... beeeeeeeep! No usb plugged in, printer is fine idling all night.

So there is definitely some issue with the USB drive being shared to the OS. I even turned off USB power saving on all hubs in win7. I should try different PC and different OS too because why not.

gloomyandy commented 6 years ago

I have my board connected up to a display (including external and internal sd card working) and a PC but not yet to my printer. If there is a sequence of operations that can be used with my setup I'm happy to try and reproduce the problems you are seeing. I'm running Windows 10 so it may provide some more data points. Let me know if there is something I can test.

forkoz commented 6 years ago

If you can fool the board into "printing" somehow. It all looks related to exposing SD to the OS.

  1. Crash on save eeprom.
  2. Crash on load/save mesh.
  3. Crash after finish of .gcode file.
  4. Crash if printer left plugged in a while. S3d sends M105s and reads temperature at idle.

I have a relatively long USB cable but that wasn't a problem with AVR. Got an extra Windows 10 laptop too, so I should load S3d on there and see what happens since I'm all set up.

gloomyandy commented 6 years ago

Thanks, I can probably try some of those even without a printer. I'll let you know how I get on!

p3p commented 6 years ago

Help in isolating what is happening is always appreciated: OS, USB signal integrity, etc. I do use this hardware myself but for some reason do not have these issues (or any you seem to find ^^), which makes it hard to debug. My Re-ARM has a considerable amount of printing over the USB serial (with EEPROM and lots of USB Mass Storage use). There has been less testing on the SBase, but it's the same hardware (and code paths). From previously submitted issues the SBase does seem to be more affected by unstable USB; mine does not share that trait, having been stable for a week or so, saving meshes and other EEPROM data along the way. The EEPROM code also seems fairly resistant to error: even on forced failure and induced SPI interference it fails gracefully without crashing.

Think I've said before that I am working on replacing the USB stack (it was supposed to be only a temporary implementation), and probably moving the EEPROM data into the flash while splitting out the framework, just need to find enough time.

forkoz commented 6 years ago

Any way to dump the error that happens to the LCD or a file on the SD card? I mean, the temporary fix would just be to disable the SD drive being visible from the PC, which I'm going to look for as soon as I can verify it's not my PC.

Wire something to J7 and you can probably reproduce the original error with the pausing. That one is totally on me which is why I made the PR.

gloomyandy commented 6 years ago

You could try "ejecting" the sd card from the PC? I think after that the PC should not be trying to talk to it, for the rest of that session?

forkoz commented 6 years ago

Worth trying that first before looking to comment it out in the code somewhere. M500 makes it eject already but it comes back and then crashes. Win10 laptop with short usb cable hooked to different power source crashed too.

p3p commented 6 years ago

On Linux, USB drives are not mounted automatically. Although I have left it mounted a few times, it usually isn't, so this could be the difference. M500 auto-disconnects the host because two systems cannot access a block-level device at the same time. It does act as expected for me: the host gets kicked off, Marlin saves the data, and the host is allowed to mount again. I'm not sure how it is causing the MCU to stall for you (flashing light stops and it shows up in device manager as smoothieware?), but it doesn't surprise me. It needs to be redone correctly: if the USB drive is mounted, then just disallow all access from Marlin (I think that is how smoothie does it). Windows doesn't follow the MSC protocol correctly, so the lock/unlock built into it can't be used, which complicates matters.
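The handover described above (host gets kicked off, Marlin saves, host may mount again) can be sketched as a small ownership arbiter. This is only an illustration under stated assumptions: `DiskArbiter`, `Owner`, and `FirmwareAccess` are hypothetical names, not Marlin's or the LPC1768 USB stack's API, and a real implementation would also have to report media removal to the host.

```cpp
#include <cassert>

// Hypothetical sketch: who currently owns the shared SD card.
enum class Owner { None, Host, Firmware };

class DiskArbiter {
public:
  // Host (USB mass storage) asks to mount; refused while firmware holds the card.
  bool host_mount() {
    if (owner_ == Owner::Firmware) return false;
    owner_ = Owner::Host;
    return true;
  }
  void host_unmount() { if (owner_ == Owner::Host) owner_ = Owner::None; }

  // Firmware takes the card for itself, evicting a mounted host.
  // Returns true if the host had the card mounted beforehand.
  bool firmware_acquire() {
    assert(owner_ != Owner::Firmware && "nested acquire is not supported");
    const bool host_was_mounted = (owner_ == Owner::Host);
    owner_ = Owner::Firmware;
    return host_was_mounted;
  }
  void firmware_release() { if (owner_ == Owner::Firmware) owner_ = Owner::None; }

  Owner owner() const { return owner_; }

private:
  Owner owner_ = Owner::None;
};

// RAII guard so the card is always released, even on an early return.
class FirmwareAccess {
public:
  explicit FirmwareAccess(DiskArbiter& a)
      : arbiter_(a), evicted_host_(a.firmware_acquire()) {}
  ~FirmwareAccess() { arbiter_.firmware_release(); }
  FirmwareAccess(const FirmwareAccess&) = delete;
  FirmwareAccess& operator=(const FirmwareAccess&) = delete;

  bool evicted_host() const { return evicted_host_; }

private:
  DiskArbiter& arbiter_;
  bool evicted_host_;
};
```

The guard mirrors the acquire/release pairing: a save routine would construct a `FirmwareAccess`, write its data, and let the destructor hand the card back so the host can remount.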

forkoz commented 6 years ago

I commented out: MSC_Release_Lock() and lose access to the SD card. But still get a crash after a while regardless :(

The first comment:

  FRESULT res = f_open(&eeprom_file, "eeprom.dat", FA_OPEN_ALWAYS | FA_WRITE | FA_READ);
  //if (res) MSC_Release_Lock();

seemed to help with crashes on G29 T but not G29 L0 or saving.

p3p commented 6 years ago

I'm not sure what you were aiming for there, commenting out that line will only cause Marlin not to Release the disk lock on error opening the EEPROM data file. G29 T does not access the EEPROM as far as I can see.

Your best bet at this point is making sure not to use your printer with the usb drive mounted in windows and see if that helps, until I can figure out why you have this issue and I don't, or I finish the USB refactor and it all goes away (hopefully)

forkoz commented 6 years ago

I was aiming for marlin to leave the usb drive locked and not remount the drive.

I tried using eject, but then I lose communication until I restart the printer, even though the COM port is still visible. My workaround thus far has been to complete all operations before issuing M500 and then wait for the inevitable crash :)

gloomyandy commented 6 years ago

Interesting I've now (sort of) got my printer working with the Sbase board. I use UBL with a mesh saved in "eeprom" at the start of every print (I print from the sd card attached to the display), my gcode loads the mesh. This means that whenever I start a print I see the internal sd card "disappear" from Windows and then shortly afterwards it is remounted (as Marlin disconnects it to read the eeprom file I assume). So far this has all worked fine, with no crashes or anything. I'm using Windows 10.

forkoz commented 6 years ago

Could something be wrong with my binary? I tried both 7 and 10 so it's not the PC. The only other things I can think of are that I have files on the SD card besides marlin, or the card itself?

Marlin-configs.zip firmware.zip firmware-elf.zip

Even moved my endstops now off of AUX1 and onto J8 in case serial3 conflicts with SD. Still crashing. When I edited HW serial to not assign uart, pins or IRQ to serial_port_2 I get disconnect from PC after loading mesh the first time with G29 and then crash the second time.

gloomyandy commented 6 years ago

Ok so I think I've just seen my first crash with this. I was printing from the LCD SD card but with a laptop hooked up via USB and Repetier host connected (mainly to monitor the temperatures). The print was going fine but I left it for a few minutes and when I came back it had stopped mid print. I couldn't do anything using the LCD controller to wake it up other than hit reset. I think that what happened was that my laptop had decided to power off (as I hadn't been using it) and so the USB connection was dropped and I think this caused the stop.

Not really a big issue for me, more just another data point. Looking forward to trying the new USB stack when it is available!

forkoz commented 6 years ago

So here is some more weirdness.

  1. I made modifications to stop serial2 being taken automatically in HAL, pinmapping &hardwareserial.cpp. After that when serial_port2 is left undefined, pins in aux1 still won't work for endstops. Something with the bootloader or maybe something I missed?
  2. P2_13 doesn't work for end stop, won't detect open/close. P2_8 right next to it works great
  3. P2_12 needs endstop closed at boot or board is stuck frozen with 4 leds on, p2_11 right next to it works great. Nothing on these pins in the schematic.

gloomyandy commented 6 years ago

Which schematic are you using? On the one I was able to find (which is for the 1.2 board and is here: https://github.com/pixel3design-hub/MKS-SBASE-FULL-DOCUMENTED), pin P2_13 is shown as the E1_DIR pin and P2_8 as E1_STEP. Have you disabled E1? P2_12 has been working fine for me as part of the workaround to provide access to the SD reader in the LCD.

Is there a reason for using alternate pins for the endstops (so far the standard ones although not interrupt based seem to be working fine for me)?

forkoz commented 6 years ago

Yes, I'm going off that too. I don't have an E1 so it was never enabled and yes I am hunting for interrupt capable pins for end-stops. But turning that feature off didn't stop the crashes so I don't think they are related.

I thought about hooking up my TTL usb to the serial port but realized marlin only outputs whatever is in the code to the console. Maybe hooking up a JTAG would help me find out WTF is going on, I'm at a loss.

p3p commented 6 years ago

P2_12 needs endstop closed at boot or board is stuck frozen with 4 leds on, p2_11 right next to it works great. Nothing on these pins in the schematic.

P2_12 is the bootloader ISP pin, it puts the bootloader directly into DFU mode, the same thing that happens when a watchdog reset happens.

There is nothing special about P2_13 though, not sure why that wouldn't work for interrupts.
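A minimal sketch of how a pin choice like this could be checked before wiring an endstop. It assumes only what is stated in this thread (P2_12 is sampled by the bootloader at reset, P2_11 works fine); `Pin`, `safe_for_endstop`, and `require_safe_endstop_pin` are hypothetical helpers, not part of Marlin's pin definitions.

```cpp
#include <cassert>

// Hypothetical pin descriptor: LPC176x pins are named P<port>_<bit>.
struct Pin {
  int port;
  int bit;
};

constexpr bool same_pin(Pin a, Pin b) {
  return a.port == b.port && a.bit == b.bit;
}

// Per the comment above, P2_12 is the pin the bootloader checks at reset.
constexpr Pin kBootloaderIspPin{2, 12};

// An endstop that reads "open" at power-up can hold this pin in the state
// that sends the bootloader into DFU mode, so reject it for endstop use.
constexpr bool safe_for_endstop(Pin p) {
  return !same_pin(p, kBootloaderIspPin);
}

// Compile-time checks for pins fixed in the configuration.
static_assert(!safe_for_endstop(Pin{2, 12}), "P2_12 doubles as the ISP pin");
static_assert(safe_for_endstop(Pin{2, 11}), "P2_11 was reported to work");

// Runtime variant for pins chosen at run time in a debug build.
inline void require_safe_endstop_pin(Pin p) {
  assert(safe_for_endstop(p) && "pin is sampled by the bootloader at reset");
}
```

A check like this would not explain the P2_13 behaviour, which remains open in the thread.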

forkoz commented 6 years ago

So stepper switch opened was tripping DFU mode. Should I try to turn the watchdog off and see if the firmware is really hanging?

forkoz commented 6 years ago

Without watchdog. Printer disconnects drive, drive comes back, serial disconnects. Re-plugging usb re connects the printer. Never crashes though. I can also disable/enable the com port in device manager to reconnect.

With watchdog reset manual, same behavior as having watchdog enabled.

forkoz commented 6 years ago

I got rid of SD card crashing by getting rid of all: MSC_Aquire_Lock(); MSC_Release_Lock();

SD still saves fine but no longer trips the watchdog. I still had issues on really long prints like 4-5hrs where the USB would disconnect, printer would stop with heaters on and just burn in one spot until reconnected.

p3p commented 6 years ago

@forkoz just a heads up but without those locks there is a chance for filesystem corruption, that's what they 'try' to protect against as a filesystem should never be mounted by 2 operating systems.

forkoz commented 6 years ago

I know, but I'd rather deal with this than the crashing. I notice that most functions for this and USB are in CMSIS/LPC1768, and I think the read/write will fail if it's locked by the host. The only other thing messing with USB could be the hardware serial implementation always setting things up in the begin function. I'll see if I get any more disconnects after this.

So the printer stayed connected to the PC overnight and didn't crash or disconnect.

gloomyandy commented 6 years ago

The above PR may be of interest as it removes the need to access the SD card for parameter storage.

forkoz commented 6 years ago

Yep, printer has been stable now with the locks removed. I will definitely try the flash memory method too.

It seemed to crash before even sitting so the locks may still need to stay removed.

forkoz commented 6 years ago

So it's super stable with those locks gone, and working with the SD-based eeprom very well.

I also did:

  #if NUM_SERIAL > 0
    // Only start a port that is actually configured (-1 means unused).
    if (SERIAL_PORT != -1) MYSERIAL0.begin(BAUDRATE);
    #if NUM_SERIAL > 1
      if (SERIAL_PORT_2 != -1) MYSERIAL1.begin(BAUDRATE);
    #endif
  #endif

in main.cpp but not sure it did anything.

p3p commented 6 years ago

Unfortunately I can't knowingly remove safeguards against file-system corruption, so the easiest solution will be to move to the flash based eeprom implementation. I still don't know why the locks cause so many problems with your setup and not others.

Begin on the usb serial port is just a compatibility stub it does nothing, USB CDC devices don't have a baudrate, they run at native USB speeds when not using an external chip like most Arduino boards.

forkoz commented 6 years ago

That's why I didn't make a PR. I didn't see anyone else complaining about it so I figured it wrong to foist removing the safeguards on everyone. Dunno what makes my board and setup unique here.

I looked at the begin function and it does things outside of the port defines. Whether they have any impact I don't know. I did it "just in case". Printer has been on for 3+ days and printing over and over without a single crash or disconnect.

The flash based eeprom is a good idea, just don't delete the option to use the SD card because I rather like it now that it works. I think we can chalk this up to the locks tripping the watchdog and disconnecting the USB. What I don't know is how they were affecting the prints where I never saved the eeprom... maybe the initial load caused the instability? Would they also fire when the eeprom was read on first boot?

gloomyandy commented 6 years ago

I wonder if an alternative might be to disable access to the SD card from USB unless explicitly enabled via say a menu command (or even G Code), or maybe restarting the system with a controller button pressed (or something!). After all at the moment the only real use for USB access is to copy a new version of the firmware to it, which is a pretty rare event for most users. At least that's all I use it for.
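The opt-in idea above could look roughly like the following sketch. Everything here is hypothetical (`UsbDriveGate` and its methods are illustrative names, and nothing like it is confirmed to exist in Marlin): the card stays hidden from the PC unless a button was held at boot or a menu/G-code handler exposes it, and firmware file access is only allowed while it is hidden.

```cpp
#include <cassert>

// Hypothetical gate deciding whether the PC may see the onboard SD card.
class UsbDriveGate {
public:
  // button_held_at_boot models "restarting with a controller button pressed".
  explicit UsbDriveGate(bool button_held_at_boot)
      : exposed_(button_held_at_boot) {}

  // A menu item or G-code handler would call these.
  void expose_to_host() { exposed_ = true; }
  void hide_from_host() { exposed_ = false; }

  bool host_can_see_card() const { return exposed_; }

  // Firmware file access is only allowed while the host cannot see the card.
  bool firmware_may_access() const { return !exposed_; }

  // Guard for firmware-side file operations (EEPROM file, mesh storage).
  void begin_firmware_io() const {
    assert(!exposed_ && "hide the card from the host before touching it");
  }

private:
  bool exposed_;
};
```

With the default of "hidden", a normal print never shares the card with the host, and copying new firmware becomes an explicit, temporary mode.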

p3p commented 6 years ago

I looked at the begin function and it does things outside of the port defines. Whether they have any impact I don't know. I did it "just in case". Printer has been on for 3+ days and printing over and over without a single crash or disconnect.

The definition of begin() for the usb serial is:

void begin(int32_t baud) { }

doesn't do much ^^

The flash based eeprom is a good idea, just don't delete the option to use the SD card because I rather like it now that it works.

I'm not sure the sd card implementation adds anything other than complications, pretty much every issue I get is about it (or flaky USB), but it may be left as an option.

What I don't know is how they were affecting the prints where I never saved the eeprom... maybe the initial load caused the instability

I'm not sure. Unless you actually call the gcode command to save to EEPROM, the lock is never used during operation. Windows may be accessing the card at any time depending on how it feels; this is expensive and could potentially cause instability, though it has a very low interrupt priority and shouldn't interfere with step generation.

@gloomyandy my intention is to make the drive available to Marlin for gcode if mounted, and only available to the host if it is not mounted (and not printing). It is a waste atm as Marlin cannot access the system card. You shouldn't really print with the card mounted in the host, as accessing the card is time consuming (although low priority).

forkoz commented 6 years ago

Are you sure the lock never happened on read? bool access_start() called the locks.

I thought begin would call void HardwareSerial::begin(uint32_t baudrate), I never even found that one in serial.h till now.

forkoz commented 6 years ago

Something is still occasionally tripping the watchdog. I get disconnected (still heating) printer or alarm on in the morning. I'm going to try the flash eeprom and see if it makes any difference. No error messages on the screen or in the log so I have no idea how to troubleshoot it.

The flash eeprom knocks about 80kb off the binary, which is nice.

gloomyandy commented 6 years ago

Do you have a PC connected to the printer? Is there any chance that the PC disconnected? I've seen this a few times now the PC disconnects (or even just the application closes the "serial" connection) and Marlin locks up with a watchdog timeout.

I've been doing some work on the current code to allow the use of a standard cable with both SD cards active (so sharing hardware SPI), the built in card seen by the PC over USB and the external card by Marlin, so pretty much as now but without the need for a special cable. I finally seem to have this working pretty well, though I'm still testing it. I may try to extend it so that both Marlin and the PC share the same sd card (with locks similar to those used by the DUE setup), but I'm not sure. In the process of getting this working I've seen a few issues which I think I've fixed which will hopefully make things more stable. I need to clean things up a bit and will then produce a PR for the basic changes (before trying to do the shared sd card thing).

p3p commented 6 years ago

Do you have a PC connected to the printer? Is there any chance that the PC disconnected? I've seen this a few times now the PC disconnects (or even just the application closes the "serial" connection) and Marlin locks up with a watchdog timeout.

I hadn't noticed this until recently, but there does seem to be a problem with the disconnection detection. The transmission loops all depend on a boolean value (set in an interrupt) to break them if a disconnect happens. If it isn't set, a watchdog reset will happen because the buffer can't empty.
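The failure mode described above can be illustrated with a small sketch. It assumes a transmit FIFO that only drains while the host is reading and a `connected` flag that an interrupt is supposed to clear; `TxFifo`, `write_blocking`, and the spin limit are hypothetical names and values, not the actual USB stack. Bounding the wait means a missed disconnect drops a byte instead of spinning until the watchdog fires.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical transmit FIFO: it drains only while the host is reading.
struct TxFifo {
  explicit TxFifo(std::size_t cap) : capacity(cap) {}

  std::size_t capacity;
  std::size_t used = 0;
  bool host_reading = true;

  bool push(std::uint8_t) {
    if (used >= capacity) return false;
    ++used;
    return true;
  }
  // Stand-in for the USB interrupt emptying one byte from the FIFO.
  void service() {
    if (host_reading && used > 0) --used;
  }
};

enum class TxResult { Sent, Disconnected, TimedOut };

// Wait for room in the FIFO, but give up if the connection flag drops or
// the wait exceeds max_spins, so a missed disconnect cannot hang the loop.
inline TxResult write_blocking(TxFifo& fifo, const std::atomic<bool>& connected,
                               std::uint8_t byte, std::uint32_t max_spins) {
  assert(fifo.capacity > 0);
  for (std::uint32_t spins = 0; spins <= max_spins; ++spins) {
    if (!connected.load()) return TxResult::Disconnected;
    if (fifo.push(byte)) return TxResult::Sent;
    fifo.service();  // in firmware this would happen in an interrupt
  }
  return TxResult::TimedOut;
}
```

In real firmware the bound would more likely be a time limit shorter than the watchdog period; the spin count here just keeps the sketch self-contained.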

Look forward to seeing your PR,

forkoz commented 6 years ago

Yes, I always print from PC. I had an interesting experience today. PC was disconnecting or blue screening.

I had 4 failed prints total between last night and today. On a 6 hour print about...

  1. PC disconnect "firmware unresponsive" - no watchdog
  2. PC bluescreen - no watchdog reset
  3. PC disconnect "firmware unresponsive" + watchdog
  4. PC bluescreen, bluescreen on every reboot until the printer was unplugged and reset

I also see some errors in the event log:

The driver detected a controller error on \Device\Harddisk10\DR18.
An error was detected on device \Device\Harddisk10\DR16 during a paging operation.

So right now I've disabled USB MSC and will check if it happens again. No real pattern or anything since it ran fine for a long time.

thinkyhead commented 6 years ago

You may want to try lowering the BAUDRATE setting in Marlin to 115K if not done already. And you could try messing around with your PC's serial port settings, which are buried a few tabs down in the device manager, to see if various buffer sizes help. And of course, try some better-quality shielded USB cables.

forkoz commented 6 years ago

Fixed in the latest merges.
