area515 / Photonic3D

Control software for resin 3D printers
http://photonic3d.com
GNU General Public License v3.0
131 stars 115 forks source link

Find and implement a serial proxy for advanced serial debugging #288

Open kloknibor opened 7 years ago

kloknibor commented 7 years ago

Hi,

I had an long print on a printer. It should have runned for 5,5 hours but in the morning the print wasn't finished and was stalled.

After some investigation we found out that it wasn't a memory issue and java was still alive, server was still running and could use that but no movement at all.

As you can see in the logs the printer just stopped around 5 in the morning without any further notice...

LogBundle (4).zip

WesGilster commented 7 years ago

Test Post. Is github working?

kloknibor commented 7 years ago

it is now :)! Yesterday it had a small outage ;)!

WesGilster commented 7 years ago

Sorry, I thought I posted all of this yesterday and it seems to be gone.

Yep, memory looks ok.

Heap
 def new generation   total 157312K, used 42416K [0x53e00000, 0x5e8a0000, 0x5e8a0000)
  eden space 139904K,  30% used [0x53e00000, 0x56731ac8, 0x5c6a0000)
  from space 17408K,   1% used [0x5c6a0000, 0x5c6da700, 0x5d7a0000)
  to   space 17408K,   0% used [0x5d7a0000, 0x5d7a0000, 0x5e8a0000)
 tenured generation   total 349568K, used 197749K [0x5e8a0000, 0x73e00000, 0x73e00000)
   the space 349568K,  56% used [0x5e8a0000, 0x6a9bd6f0, 0x6a9bd800, 0x73e00000)
 Metaspace       used 26827K, capacity 27916K, committed 27952K, reserved 27952K

Below is what I got from the attached logs. Printing seemed to be working quite nicely until 2016-11-15 13:07:29. At that point communications with the printer was interrupted in some way. Photonic 3D has a built in 2 minute timeout that kicks in and the printer failed to respond until we gave up for a response to the gcode until 2016-11-15 13:09:29. At this point it ignores the unresponsiveness and continues on. At this point, every gcode sent to the printer is not acknowledged with any sort of response. This 2 minute timeout continues on until you discovered the problem at 2016-11-16 05:48:18

2016-11-15 13:06:55,281 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] Write 0: M17
2016-11-15 13:06:58,896 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] lineRead: ok M17
2016-11-15 13:06:58,897 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] Sleep:1500
2016-11-15 13:07:00,398 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] Sleep complete
2016-11-15 13:07:00,399 INFO o.a.r.j.AbstractPrintFileProcessor [PrintJobProcessorThread-2] { "layer": 118, "exposureTime": 25000, "liftDistance": 10.275226163913596, "liftSpeed": 22.996984481152055 , "layerAreaMM2": 3850.150775942397 }
2016-11-15 13:07:00,423 INFO o.a.r.j.AbstractPrintFileProcessor [PrintJobProcessorThread-2] ExposureStart:1479215220423
2016-11-15 13:07:28,990 INFO o.a.r.j.AbstractPrintFileProcessor [PrintJobProcessorThread-2] ExposureTime:28566
2016-11-15 13:07:29,049 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] Write 0: G1 Z10.285 F22.861
2016-11-15 13:09:29,104 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] lineRead: null
2016-11-15 13:09:29,105 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] Write 0: G1 Z-10.185 F180

This goes on for hours

2016-11-16 05:36:49,855 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] lineRead: null
2016-11-16 05:36:49,856 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] Sleep:1500
2016-11-16 05:36:51,356 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] Sleep complete
2016-11-16 05:36:51,357 INFO o.a.r.j.AbstractPrintFileProcessor [PrintJobProcessorThread-2] { "layer": 270, "exposureTime": 25000, "liftDistance": 9.335881312845817, "liftSpeed": 35.5215824953891 , "layerAreaMM2": 3223.9208752305453 }
2016-11-16 05:36:51,405 INFO o.a.r.j.AbstractPrintFileProcessor [PrintJobProcessorThread-2] ExposureStart:1479274611405
2016-11-16 05:37:19,964 INFO o.a.r.j.AbstractPrintFileProcessor [PrintJobProcessorThread-2] ExposureTime:28559
2016-11-16 05:37:20,035 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] Write 0: G1 Z9.301 F35.982
2016-11-16 05:39:20,085 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] lineRead: null
2016-11-16 05:39:20,087 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] Write 0: G1 Z-9.201 F180
2016-11-16 05:41:20,093 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] lineRead: null
2016-11-16 05:41:20,094 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] Write 0: M17
2016-11-16 05:43:20,098 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] lineRead: null
2016-11-16 05:43:20,099 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] Sleep:1500
2016-11-16 05:43:21,600 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] Sleep complete
2016-11-16 05:43:21,601 INFO o.a.r.j.AbstractPrintFileProcessor [PrintJobProcessorThread-2] { "layer": 271, "exposureTime": 25000, "liftDistance": 9.30137669421633, "liftSpeed": 35.98164407711559 , "layerAreaMM2": 3200.9177961442206 }
2016-11-16 05:43:21,669 INFO o.a.r.j.AbstractPrintFileProcessor [PrintJobProcessorThread-2] ExposureStart:1479275001669
2016-11-16 05:43:50,237 INFO o.a.r.j.AbstractPrintFileProcessor [PrintJobProcessorThread-2] ExposureTime:28567
2016-11-16 05:43:50,310 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] Write 0: G1 Z9.263 F36.493
2016-11-16 05:45:50,319 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] lineRead: null
2016-11-16 05:45:50,320 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] Write 0: G1 Z-9.163 F180
2016-11-16 05:47:50,367 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] lineRead: null
2016-11-16 05:47:50,368 INFO o.a.r.g.GCodeControl [PrintJobProcessorThread-2] Write 0: M17

My guess is that most printing software handles this problem in a much less strict way. First off, I've built my theory from working on a home automation system where I was in control the serial communications on both sides. This ends up being a pretty reliable situation and I've run months at a time without interruption.

So we've got a few options. Feel free to pick as many as you want from below:

  1. Ignore it - I don't suggest we do this as I don't believe in "flukes".
  2. Make the timeout value configurable so you don't have to wait for 2 minutes.
  3. On a timeout we dump all serial line configuration information.
  4. On a timeout we perform an automatic diagnostic dump.
  5. On a timeout we restart our serial communications and become a little more pessimistic about the printers ability to recover.
  6. I'll see if I can find a serial proxy application that will attach itself between the printer firmware and Photonic3D. Then we test with the middle application to determine if the problem is in the printer firmware or JSSC.

Most of my serial programming experience has been with me developing both the embedded side(c++ Arduino) and me developing the "host" side(usually Java). In those cases I can generally achieve stability without having to down the connection, but we don't really have a ton of visibility into the firmware for this printer. All of this should be pretty easy to do, or if you'd like to something additional; feel free to propose...

kloknibor commented 7 years ago

Hi Wes,

I wasn't here at all and so was nobody at 5:47 I'm more of an evening person :P So I have no idea why the logs stopped, the times should be correct since they are connected to the internet :/

I think I would prefer 2, 5 and 6. After we restarted our serial communication we can continue printing from the point it became unresponsive correct?

The thing that I like 2 and 5 the most is that with testing here we need to restart the printer sometimes needs to be unplugged for a second to change a part and after that I have to restart the pi which takes quite long, So a bit unrelent reason + I think our firmware should definitely pickup a restart of the serial in hopes to continue printing without failure ;)!

If it would help, I have the firmware available for you, uncompiled ofcourse ;)! I think I already sended it a while ago (nearly identical to the LC10 I only changed some small variables) but I can resend it if that helps :)!

Kind regards

WesGilster commented 7 years ago

I've implemented these properties on the MachineConfiguration of the printer setup. I still need to test these features on my Raspberry Pi.

I haven't implemented 6 yet because I haven't found a serial proxy yet. I'll have to look into this later.