aegean-odyssey / mpmd_marlin_1.1.x

a fork of Marlin firmware (bugfix-1.1.x) for the Monoprice MP Mini Delta 3d printer
GNU General Public License v3.0

Hesitation/Reboot with USB plugged in but no software connection open #48

Closed mulcmu closed 3 years ago

mulcmu commented 3 years ago

I have encountered hesitation in the printer motion when printing from SD card whenever the USB cable is physically attached to a laptop and no software on the laptop has the port open. Normal operation resumes when the cable is unplugged or the serial connection is opened on the laptop. While recording video to post, a reboot was encountered, which may be similar to #47. This behavior was repeatable.

Troubleshooting consisted of trying different USB cables, different ports on the laptop, and different computers.

There is an easy workaround for this issue, but it is not the expected behavior.

https://user-images.githubusercontent.com/10102873/103359377-d0da3800-4a85-11eb-95b5-3d663cdbb34a.mp4

aegean-odyssey commented 3 years ago

Oh my. Before disappointment sets in on me completely, let me thank you for your efforts and assessment. Alas, I think we've a serious bug here.

The timer, USB, serial (LCD display), and endstop interrupts have priority over the stepper and temperature interrupts. When active, the USB code transmits on a 1 ms tick and flushes up to 64 bytes at a time. On the buffer-filling side, if the machine fills the 128-byte transmit buffer, it backs off for 10 ms when trying to add the 129th byte. So, it really shouldn't ever get "stuck".
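To make the buffer arithmetic above concrete, here is a minimal host-side sketch of a 128-byte transmit ring that is drained at most 64 bytes per tick. It is not the firmware's actual code; the names `tx_ring_t`, `tx_ring_put`, and `tx_ring_flush` are hypothetical, and only the sizes come from the description above.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TX_BUF_SIZE  128u  /* transmit buffer size quoted above */
#define USB_PKT_SIZE  64u  /* most bytes flushed on one 1 ms tick */

/* Hypothetical ring buffer; the real firmware's layout may differ. */
typedef struct {
    uint8_t data[TX_BUF_SIZE];
    size_t  head;   /* next write position */
    size_t  tail;   /* next read position */
    size_t  count;  /* bytes currently queued */
} tx_ring_t;

/* Producer side: returns false when the buffer is full, which is
 * the point where the caller would back off for 10 ms. */
static bool tx_ring_put(tx_ring_t *r, uint8_t b)
{
    if (r->count == TX_BUF_SIZE)
        return false;
    r->data[r->head] = b;
    r->head = (r->head + 1u) % TX_BUF_SIZE;
    r->count++;
    return true;
}

/* Consumer side, standing in for the 1 ms tick: copies at most one
 * packet's worth of bytes into out and returns how many were copied. */
static size_t tx_ring_flush(tx_ring_t *r, uint8_t *out)
{
    size_t n = 0;
    while (n < USB_PKT_SIZE && r->count > 0) {
        out[n++] = r->data[r->tail];
        r->tail = (r->tail + 1u) % TX_BUF_SIZE;
        r->count--;
    }
    return n;
}
```

With these numbers a full buffer drains in two ticks, so a 10 ms back-off should always find room as long as the flush side is actually making progress. That is the assumption that appears to fail when the port is closed.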

The fault seems to be a watchdog timeout. With no red LED, it means that the timeout occurred while the CPU was happily executing code -- something of the infinite-loop variety.

The watchdog timeout is set at about 4 seconds which is plenty of time for the machine to do what it needs to do. So, I'm thinking that the problem is some kind of excessive interrupt activity.

I tracked down a similar crasher to an overrun error flag in the serial port for the LCD display. All indications pointed to a problem with the USB code, but it was the serial port interrupt that was in a tight loop.

This time I'm thinking that there is a problem with how the USB code connects and disconnects -- perhaps the code does not detect a dropped connection (it's supposed to). I'm not up on my USB protocols, but there seems to be a bit of negotiating and handshaking going on whenever USB establishes a connection -- I don't know what's supposed to happen when a connection suddenly goes missing.

Anyway, I'll see what I can find.

mulcmu commented 3 years ago

A few other observations that might be relevant:

aegean-odyssey commented 3 years ago

Thanks. Does seem like it's related to a full transmit (outgoing from printer) buffer. As I recall, Marlin4MPMD uses larger buffers, and a (what seemed at the time, heavy-handed) throttling mechanism. And there were comments concerning USB frustration there, too. Long-standing issue #5 also may be related. I'm pretty sure it's getting stuck or bogged down in a loop somehow. Now to find it.

Also, the USB interrupt stays in a tight loop until all pending (USB) interrupts are serviced -- very much not the way I would do things, but my "don't loop in the interrupt" rewrite could never establish a connection. So, perhaps the CPU can't respond fast enough otherwise.

Twenty-five (25) M503s -- generates a ton of output with very little on the receiver, and little in the way of a stepper interrupt. Good test. Was a USB cable attached for this test?

mulcmu commented 3 years ago

Above testing was performed in 3 different states with the M503 output flood: USB unplugged from printer, USB connected to printer and linux laptop port closed, USB connected to printer and linux laptop port opened. Normal behavior with the USB unplugged or connected with port opened. Reboot was consistently encountered when cable connected with port closed.

I did some more testing with a Windows 10 laptop that seems to support your suggestion that something is awry in the negotiating and handshaking going on whenever USB establishes a connection. This was the first time this laptop was connected to the printer, so the STM32 Virtual COM port drivers were not installed.

[Screenshot: win 10, no drivers loaded]

The reboot was not encountered in this condition with the M503 output flood (USB cable connected to the Windows 10 laptop and printer, port not opened, and the proper Windows drivers not installed). After installing the STM32 drivers from here, the reboot behavior occurred on this laptop when the cable was connected but the port was not opened in software.

aegean-odyssey commented 3 years ago

Thanks. Does seem like it's related to a full transmit (outgoing from printer) buffer, but I'm not sure. As I recall, Marlin4MPMD uses larger buffers, and a (what seemed at the time, heavy-handed) throttling mechanism. I'll look at Marlin4MPMD for some insight. STM's USB driver and HAL in general are a bit of a mess (mostly because it's trying to be "all things to all people"), so it's where most of my suspicions lie.

In your Windows test, the string, "STM32 Virtual ComPort in FS Mode" (Other Devices), is sent by the firmware, I believe, so at least there's some initial handshaking on the USB port. I think things are pointing to the low-level USB driver getting stuck, constantly interrupting until the watchdog timer reboots the machine. I suspect what sets up the situation is queuing up a USB packet to go out when the USB port is not really open, but I've not pinpointed the scenario in the code yet.

mulcmu commented 3 years ago

Looks like this stackoverflow discussion might be a good lead.

aegean-odyssey commented 3 years ago

I think you're right.

The firmware already implements some of the strategies discussed and uses a later version of the STM USB library with supposed fixes, but I'm a little leery about the "correctness" of STM's TxState flag. I added it some time ago and it seemed to work, but since it resides in the transmit (SysTick) interrupt, which can interrupt the USB interrupt, TxState may not be valid when tested. I may need to follow the locking mechanism the HAL uses in its code. Or check the ep0_state instead.
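One way to picture the race described above: if the tick handler can preempt the USB handler, it should only trust the state flag when the USB handler is not mid-update. The sketch below is a host-side illustration using a C11 `atomic_flag` as a try-lock, loosely in the spirit of a lock-or-skip scheme. It is not STM's HAL code; `usb_link_t`, `tx_state`, and the function names are hypothetical stand-ins.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical link state; tx_state stands in for the library's
 * TxState flag (0 = idle, nonzero = transfer in flight). */
typedef struct {
    atomic_flag busy;   /* set while the USB handler is updating state */
    int         tx_state;
} usb_link_t;

/* Try to take the lock without waiting; true on success. */
static bool usb_link_try_lock(usb_link_t *l)
{
    return !atomic_flag_test_and_set(&l->busy);
}

static void usb_link_unlock(usb_link_t *l)
{
    atomic_flag_clear(&l->busy);
}

/* Tick side: start a transfer only if the lock is free AND the link
 * is idle. If the USB handler holds the lock, skip this tick rather
 * than act on a half-updated flag. */
static bool tick_may_transmit(usb_link_t *l)
{
    if (!usb_link_try_lock(l))
        return false;
    bool idle = (l->tx_state == 0);
    if (idle)
        l->tx_state = 1;   /* claim the endpoint for this transfer */
    usb_link_unlock(l);
    return idle;
}
```

The tick never spins waiting for the lock. Since it runs at the higher priority here, spinning would itself be an infinite loop, so it gives up and retries on the next tick instead.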

I'm thinking, too, that I should add code at a higher level that closes and re-initializes the connection when it detects a problem. The place for this would be the send-byte path: if the TX buffer is full, the firmware waits for 10 ms, then tries again. If the buffer is still full, the byte is dropped. I could try to re-initialize the connection at this point. The only thing is, it may be too late: the machine is already "stuck".
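The retry-then-drop idea can be sketched as a small host-side simulation. Everything here is an assumption for illustration: `sim_port_t` and its counters replace the real buffer, delay, and USB re-initialization calls, none of which are shown in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

#define TX_RETRY_DELAY_MS 10u  /* back-off quoted in the comment above */

/* Simulated port: counters stand in for real hardware actions. */
typedef struct {
    unsigned free_slots;    /* room left in the simulated TX buffer */
    unsigned waited_ms;     /* total time spent backing off */
    unsigned dropped;       /* bytes discarded after the retry failed */
    unsigned reinit_count;  /* times the link was re-initialized */
} sim_port_t;

/* Queue one byte if there is room; false when the buffer is full. */
static bool sim_try_put(sim_port_t *p, uint8_t b)
{
    (void)b;  /* the simulation only tracks space, not contents */
    if (p->free_slots == 0)
        return false;
    p->free_slots--;
    return true;
}

/* Send path: try, back off once, try again, then drop the byte and
 * request a re-initialization of the connection. */
static bool sim_send_byte(sim_port_t *p, uint8_t b)
{
    if (sim_try_put(p, b))
        return true;
    p->waited_ms += TX_RETRY_DELAY_MS;  /* stands in for a real delay */
    if (sim_try_put(p, b))
        return true;
    p->dropped++;
    p->reinit_count++;  /* stands in for closing and re-opening USB */
    return false;
}
```

Because the wait is bounded to a single retry, the send path cannot loop forever on its own. Whether re-initializing at that point is early enough is exactly the open question raised above.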

aegean-odyssey commented 3 years ago

@mulcmu , I was able to reproduce the problem here using your "many M503s" test.

The sequence:

Quite reproducible.

With the latest changes, I can repeatedly run the "print job" whether connected to or disconnected from the terminal program.

I wish I could identify a cause of the problem more specifically. It is particularly nagging, since the Marlin4MPMD firmware seems to avoid the issue. So as much as I'd like to blame STM's HAL/USB libraries, it's certainly not the entire story. I've been going over the Marlin4MPMD code and the USB implementations are very, very similar.

A few of the differences, Marlin4MPMD:

I think the main issue is that mpmd_marlin_1.1.x would pound away with output to a closed USB connection -- expecting USB's error handling protocols to accommodate. And from a little reading over the past few days, that was not a good design choice on my part.
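One common way to avoid writing to a closed port, offered here only as an illustration and not as the change that went into the release, is to track the host's CDC `SET_CONTROL_LINE_STATE` request and discard output while DTR is clear. The request code (0x22) and the DTR bit come from the USB CDC ACM class; `cdc_gate_t` and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define CDC_SET_CONTROL_LINE_STATE 0x22u  /* CDC ACM class request code */
#define CDC_LINE_STATE_DTR         0x01u  /* bit 0 of wValue: DTR */

/* Hypothetical gate between the firmware's output and the USB queue. */
typedef struct {
    bool     host_open;  /* true once the host has asserted DTR */
    unsigned discarded;  /* bytes thrown away while the port was closed */
} cdc_gate_t;

/* Call from the class-request handler for each control request. */
static void cdc_gate_on_request(cdc_gate_t *g, uint8_t request,
                                uint16_t wValue)
{
    if (request == CDC_SET_CONTROL_LINE_STATE)
        g->host_open = (wValue & CDC_LINE_STATE_DTR) != 0;
}

/* Call before queuing a byte: true means queue it, false means the
 * byte was discarded because no host program has the port open. */
static bool cdc_gate_should_queue(cdc_gate_t *g)
{
    if (g->host_open)
        return true;
    g->discarded++;
    return false;
}
```

A caveat with this approach: not every terminal program asserts DTR when it opens the port, so a real implementation would likely need a fallback.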

Hopefully, the problem's fixed.

mulcmu commented 3 years ago

The changes in the 119r15 release have resolved the observed hesitation and reboots. No other USB-related side effects were observed after a week of usage.

aegean-odyssey commented 3 years ago

Good to hear. Thanks for your help -- instrumental in resolving this one.