Closed ghost closed 8 years ago
@cojarbi do you know what OS Chirag was using?
So far you and @athulsnair have the following in common:
How to debug this:
The following instructions will help you turn on debugging. But don't just blindly post logs. Please try to find the issue in your own setup first (smoothie doesn't have this, and neither does Lasaurgrbl, so we need to determine what trips you up...)
Test these one by one
This prints everything the controller says to the terminal. There may be some clues there (pasting me a log of a full job run would be helpful; we may be parsing resend requests incorrectly).
This disables checking the position feedback every second. See if the extra commands, sent while there are commands in the queue, trip it up (maybe the controller doesn't understand the "?" we send each second to get data back).
@cojarbi: I said OS (operating system - mac or something else) not firmware... turnkey is firmware
Reading over the docs (@grbl, @arthurwolf, please advise here too): on detecting an error, since each gcode line follows on from the one before it, should we resend the last command or abort the job? What is the correct course of action when grbl says "error:" mid job?
I suspect the lines @athulsnair was showing here: (top image, same file sent with another host, and below sent with Laserweb = indicating error in laserweb/serialport)
Was:
@chamnit - your advice too please (: ? We are battling with Laserweb (a host for, amongst others, Grbl) handling error: - what to do?
(let's ignore why the command gets mangled in the first place - I suspect we need to get @voodootikigod involved for that one (; )
@openhardwarecoza : The correct protocol would be to stop the job upon an error, but most GUIs just ignore them and keep going. The latter is dangerous, as this can make the CNC move in the most unexpected ways and likely break something. Also, if you are having communication errors, there will always be a chance that a mangled command becomes accepted by Grbl, because it's valid but not correct. For example, 'G1 X10.03' could easily become 'G1 X1003' by missing the period.
It looks like you are getting some data corruption somewhere between the GUI and Grbl, and I would concentrate your efforts on finding out where this is coming from. Historically, the most common solutions were making sure that the character counting protocol buffer is 127 bytes, not 128 (minus one due to the ring buffer; it has always been this way but will be fixed in v1.0), and removing electrical noise somewhere near the USB cable that is corrupting or disconnecting Grbl. I've also seen instances of Chinese USB-serial chips, unlike the FTDI or Atmega16U, having corruption issues when a lot of data is sent back and forth. The buffer in them isn't big enough.
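The character-counting protocol mentioned above can be sketched roughly like this. It is only an illustration, not Grbl's or LaserWeb's code: `CharCountingStreamer`, `write` and `on_response` are hypothetical names, and the one figure taken from the comment is the 127-byte usable buffer. It assumes plain ASCII gcode, so characters and bytes are the same count.

```python
from collections import deque

RX_BUFFER_SIZE = 127  # usable bytes per the comment above (128 minus one for the ring buffer)


class CharCountingStreamer:
    """Tracks how many bytes are still sitting in the controller's RX buffer."""

    def __init__(self, lines, write):
        self.pending = deque(line.strip() + "\n" for line in lines)
        self.in_flight = deque()  # lengths of lines sent but not yet acknowledged
        self.write = write        # hypothetical callable that pushes a string to the port

    def pump(self):
        """Send as many queued lines as fit in the remaining buffer space."""
        while self.pending:
            nxt = self.pending[0]
            if sum(self.in_flight) + len(nxt) > RX_BUFFER_SIZE:
                break
            self.write(self.pending.popleft())
            self.in_flight.append(len(nxt))

    def on_response(self, response):
        """Call for every 'ok' or 'error: ...' reply; each one frees one sent line."""
        if self.in_flight:
            self.in_flight.popleft()
        if response.startswith("error"):
            self.pending.clear()  # stop the job instead of streaming on
            return False
        self.pump()
        return True
```

The host never lets the sum of unacknowledged line lengths exceed the buffer size, so the controller cannot be overrun even though several lines are in flight at once.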
You definitely want the machine to stop in case of an error. Tell the user something is wrong, and he should fix the problem, not keep trying to use a broken machine ...
Thanks Arthur! That's exactly what I am NOT doing... So let's fix this by stopping the machine right away (a dead stop is an easier thing to fix than random moves!)
Appreciate the input!
@chamnit agreed! Cheap arduino clones have been known to cause hell (; - it's one of the first questions I ask when people request support here (:
We are using the slower ok/error ping pong buffer (simply because it works with everything, and I try to support everything) without needing a tonne of extra if firmware ==? code (:
Like I said above, we'll debug the error (the two tests above, just to make sure it's not in Node SerialPort), but I needed your clarification too on whether to stop or continue! (In retrospect I don't know why I wondered about it. Maybe because GrblWeb continues... But it makes a lot more sense to stop...)
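A minimal sketch of the ok/error ping-pong approach with the "stop on error" behaviour discussed above. `stream_send_response`, `send_and_wait` and `fake_controller` are hypothetical names used only for illustration; the real sender in server.js is event-driven and will look different.

```python
def stream_send_response(lines, send_and_wait):
    """Send one line at a time and stop at the first 'error' reply.

    send_and_wait is a hypothetical callable that writes a line and returns
    the controller's reply ('ok' or 'error: ...').
    Returns (lines_acknowledged, error_message_or_None).
    """
    done = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # nothing to send for blank lines
        reply = send_and_wait(line).strip()
        if reply.lower().startswith("error"):
            return done, reply  # abort: nothing further is sent
        done += 1
    return done, None


def fake_controller(line):
    # Stand-in for a real port: rejects any block holding two G0 words.
    return "error: Invalid gcode ID:24" if line.upper().count("G0") > 1 else "ok"


job = ["G21", "G90", "G1 F2000", "G0 F6000G0 Y11.4", "G0 X0.0 Y11.4 S0"]
done, error = stream_send_response(job, fake_controller)
# done == 3, error == "error: Invalid gcode ID:24"; the fifth line is never sent
```

Because only one line is ever in flight, the error can be attributed to exactly one command, which makes the abort message easy to act on.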
@athulsnair - git pull to update, then test with this:
If it is still failing, the job will now abort instead of making funny movements. If it does, refer to https://github.com/openhardwarecoza/LaserWeb/issues/94#issuecomment-176763191 for instructions on helping me debug it.
Escalated the Corrupt data to the upstream node SerialPort project
Just tried the latest commit with 0.9g and it will not run even the first line. Mind you, this is running on an RPi connected to an (official) UNO / Gshield.
This is the gcode section where it stopped:
9 G21
10 G90
11 G1 F2000
12 G0 F6000G0 Y11.4
13 G0 X0.0 Y11.4 S0
14 M3
15 G1 X48.0 Y11.4 S201 F2000
16 M5
(This is 3 more lines than where it stopped. Why do we need an S0 for grbl?)
This is from the console window:
SEND: G21
SEND: G90
SEND: G1 F2000
SEND: G0 F6000G0 Y11.4
error: Invalid gcode ID:24
Aborted Job! Safety First
This is from the terminal:
ERROR: Error from machine: error: Invalid gcode ID:24
Aborted Job - safety first!
Will run debugging next as suggested above
I should have checked this earlier. The Grbl error ID 24 indicates that there were two commands sent in the line, which both require the XYZ axis words. So this is undefined. That last line has two G0 commands. This is an invalid command.
I'm also unsure why there is an F6000 with a G0, because this is also undefined. G0 always runs at the maximum speed of the machine. It will always ignore the feed rate. F should only be used with G1/2/3/38.x and any other motion feed command.
Serial rcv debug shows on the same spot:
Port 0 got newline from serial: error: Invalid gcode ID:24
ERROR: Error from machine: error: Invalid gcode ID:24
Aborted Job - safety first!
@cojarbi : Let me re-iterate. The problem is the g-code. Not the sender. Line 12 of your g-code program is an invalid g-code block, because it has two G0 commands in it. Remove the second G0 and it should work.
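A pre-flight check along these lines could catch such a block before it ever reaches the machine. This is a simplified sketch, not Grbl's parser: it only counts the G0 to G3 motion words in a block, whereas Grbl's real check also covers other commands that need axis words. The function names are made up for illustration.

```python
import re

# G0-G3, with optional leading zeros (G00, G01 ...), but not G10, G21, G38.2 etc.
MOTION_WORD = re.compile(r"G0*([0-3])(?![\d.])", re.IGNORECASE)
COMMENT = re.compile(r"\(.*?\)|;.*")


def motion_words(block):
    """Return the motion commands (G0-G3) found in one g-code block."""
    code = COMMENT.sub("", block)
    return ["G" + m.group(1) for m in MOTION_WORD.finditer(code)]


def find_conflicts(program):
    """Yield (line_number, block) for blocks holding more than one motion command."""
    for number, block in enumerate(program.splitlines(), start=1):
        if len(motion_words(block)) > 1:
            yield number, block


program = "\n".join([
    "G21", "G90", "G1 F2000", "G0 F6000G0 Y11.4",
    "G0 X0.0 Y11.4 S0", "M3", "G1 X48.0 Y11.4 S201 F2000", "M5",
])
conflicts = list(find_conflicts(program))
# conflicts == [(4, "G0 F6000G0 Y11.4")]
```

Line numbers here count from the start of the string passed in, so they will not match the numbering of the original file unless the whole file is checked.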
Sneaky little G0. What r u doing there
Good eye! @nathanielstenzel and @mostley (; See ^^^
I removed the G0 Fxxx from the start gcode.
Have another go?
Misclicked close button sorry!
@cojarbi , i like seeing this:
"Serial rcv debug shows on the same spot: Port 0 got newline from serial: error: Invalid gcode ID:24 ERROR: Error from machine: error: Invalid gcode ID:24 Aborted Job - safety first!" Means the abort code I pushed a few mins ago seems to be doing what we wanted it to! Safety first! No more runaway machines
Yes, good thing. I almost ruined my leather couch on one of those runaways. Just kidding
At least it wasn't an eye :8ball: @funinthefalls how's our disclaimer by the way (;
Will write one up today and post it so that no one can miss it. Sorry I haven't been much help this week, been in and out of the hospital for the last 5 days.
Disclaimer posted in Readme.md and Wiki home page.
Aww man! Sorry to hear! Hope you are doing better!
And an extra big thank you for posting one (:
Let me know if there's anything I can help with!
I still haven't made up the schematic for the safety system, if you have a few minutes to bang one out I would appreciate it.
New errors found
Console:
SEND: M3
SEND: G1 X43.3 Y8.8 S40 F4080
SEND: M5
SEND: G0 X9.2 Y8.8 S0
SEND: M3
SEND: G1 X9.1 Y8.8 S40 F4080
error: Bad number format
Aborted Job! Safety First!
Terminal:
Port 0 got newline from serial: error: Bad number format
ERROR: Error from machine: error: Bad number format
Aborted Job - safety first!
In a previous attempt it issued error ID 25. I did not log it since i thought it would replicate
I suspect Grbl likes the S param next to M3, not inline with the G1?
Let me add a fix and then you can test that too
@openhardwarecoza : The S word can be sent at any time, like the F word. I'm not sure what the problem above is about. A 'bad number format' error indicates that the value Grbl received is not something it can parse. Like a floating point value with two periods in it.
FWIW, Grbl does have an echo feature, where it'll report back the gcode (pre-parsed, with no spaces and capitalized) it sends to the g-code parser. That has to be enabled in config.h and re-flashed. It was added as a way to help debug situations like this. Perhaps you could modify your sender to accept this feedback and check it against what you sent.
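Checking the echo against what was sent could look something like the sketch below. It assumes the `[echo: ...]` report format that shows up in the logs in this thread and the "no spaces, capitalized" description above; `normalize` and `echo_matches` are hypothetical helpers, and any comment stripping Grbl may also do is not modelled.

```python
import re

ECHO = re.compile(r"^\[echo:\s*(.*)\]$")


def normalize(line):
    """Mimic the echo's pre-parsed form: whitespace removed, upper-cased."""
    return "".join(line.split()).upper()


def echo_matches(sent, echo_report):
    """True if an '[echo: ...]' report matches the line that was sent."""
    m = ECHO.match(echo_report.strip())
    return bool(m) and m.group(1) == normalize(sent)
```

A sender could keep a queue of lines written so far and compare each incoming echo with the oldest unconfirmed line, aborting as soon as one differs instead of waiting for Grbl to reject it.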
Ok good! Because removing S from the G1 line would break the 3D viewer
@cojarbi can you enable the debug echo in grbl and reflash? Or should I build you a hex?
Upload the full gcode from that previous job too please?
Okay, I dug out a Uno (didn't know I still had one) and flashed the stock grbl hex.
Random lady picture rastered:
unpausing queue
Job Started
Last written command was G21
Last written command was G90
Last written command was G1 F2000
Last written command was G0 Y245.0
Last written command was G0 X0.0 Y245.0
Last written command was G0 X178.2 Y245.0
Last written command was M3 S25
Last written command was G1 X180.8 Y245.0 F4440 S25
Last written command was M5
Last written command was G0 X197.8 Y245.0
Last written command was M3 S25
Last written command was G1 X199.1 Y245.0 F4440 S25
Last written command was M5
Last written command was G0 X204.4 Y245.0
Last written command was M3 S25
Last written command was G1 X205.7 Y245.0 F4440 S25
Last written command was M5
Last written command was G0 X207.0 Y245.0
Last written command was M3 S25
Last written command was G1 X208.3 Y245.0 F4440 S25
Last written command was M5
Last written command was G0 X239.7 Y245.0
Last written command was M3 S25
Last written command was G1 X241.0 Y245.0 F4440 S25
Last written command was M5
Last written command was G0 X262.0 Y245.0
Last written command was M3 S25
Last written command was G1 X263.3 Y245.0 F4440 S25
Last written command was M5
Last written command was G0 X352.4 Y245.0
Last written command was G0 Y243.7
Last written command was G0 X352.4 Y243.7
Last written command was G0 X271.2 Y243.7
Last written command was M3 S25
Last written command was G1 X269.9 Y243.7 F4440 S25
Last written command was M5
Last written command was G0 X267.2 Y243.7
Last written command was M3 S77
Last written command was G1 X265.9 Y243.7 F3720 S77
Last written command was M5
Last written command was M3 S153
Last written command was G1 X264.6 Y243.7 F2640 S153
Last written command was M5
Last written command was M3 S179
Last written command was G1 X263.3 Y243.7 F2280 S179
Last written command was M5
Last written command was M3 S204
Last written command was G1 X262.0 Y243.7 F1920 S204
Last written command was M5
Last written command was M3 S179
Last written command was G1 X255.5 Y243.7 F2280 S179
Last written command was M5
Last written command was M3 S204
Last written command was G1 X231.9 Y243.7 F1920 S204
Last written command was M5
Last written command was M3 S179
Last written command was G1 X230.6 Y243.7 F2280 S179
Last written command was M5
Last written command was M3 S204
Last written command was G1 X221.4 Y243.7 F1920 S204
Last written command was M5
Last written command was M3 S230
Last written command was G1 X220.1 Y243.7 F1560 S230
Last written command was M5
Last written command was M3 S204
Last written command was G1 X217.5 Y243.7 F1920 S204
Last written command was M5
Last written command was M3 S230
Last written command was G1 X214.8 Y243.7 F1560 S230
Last written command was M5
Last written command was M3 S204
Last written command was G1 X204.4 Y243.7 F1920 S204
Last written command was M5
ERROR: Error from machine: error: Modal group violation
Aborted Job - safety first!
ERROR: Error from machine: error: Bad number format
Aborted Job - safety first!
I don't see any modal violations, nor do I see a number format error.
So maybe, time to look over https://github.com/voodootikigod/node-serialport/issues/663
Right, I can reliably reproduce the issue with echo on:
WARN: Ignored: Port /dev/ttyACM0 said: [echo: G1X149.0Y183.3F4440S25]
WARN: Ignored: Port /dev/ttyACM0 said: [echo: M5]
WARN: Ignored: Port /dev/ttyACM0 said: [echo: G0X152.9Y183.3]
WARN: Ignored: Port /dev/ttyACM0 said: [echo: M3S25]
WARN: Ignored: Port /dev/ttyACM0 said: [echo: G1X153.9Y183.3F4440S25]
WARN: Ignored: Port /dev/ttyACM0 said: [echo: M5]
WARN: Ignored: Port /dev/ttyACM0 said: [echo: G043G189S36303G0Y182.3]
ERROR: Error from machine: error: Unsupported command
Aborted Job - safety first!
WARN: Ignored: Port /dev/ttyACM0 said: [echo: G0X263.6Y182.3M5]
WARN: Ignored: Port /dev/ttyACM0 said: [echo: M3S204]
WARN: Ignored: Port /dev/ttyACM0 said: [echo: G1X196.0Y182.3F1920S204]
WARN: Ignored: Port /dev/ttyACM0 said: [echo: M5]
^Cpeter@peter-H81MXV:~/GitHub/LaserWeb$ ^C
(So after the error, whatever is still in Node SerialPort's buffer still gets sent, but my laserweb queue stops immediately. That's fine, it's only a 64 byte buffer...)
The 'problem' is that I can reliably see that Grbl doesn't receive what was sent. Trying to look for a better way to handle serial now.
No point posting any debug output from old code, please. As I said on hangouts, sit tight, I will let you know when to test again. Unskilled testing, sadly, doesn't help us move along. It's a distraction (;
Disable the feedback check: put // in front of https://github.com/openhardwarecoza/LaserWeb/blob/master/server.js#L241, https://github.com/openhardwarecoza/LaserWeb/blob/master/server.js#L242 and https://github.com/openhardwarecoza/LaserWeb/blob/master/server.js#L243. Save and restart server.js.
@openhardwarecoza , is there a reason that you are adding a newline after the "?"? GRBL does not need it, right?
Have a test without? I can't recall why I added it...
I think writes on Node SerialPort don't add an automatic CR to signal the end of the line? (I.e. it's the same as pressing enter when manually typing a command in minicom.)
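For reference, Grbl treats `?`, `~`, `!` and Ctrl-X as realtime commands that are picked out of the serial stream as single characters, so they need no line ending, while ordinary gcode lines do. A small sketch of framing the two kinds of writes (`frame` is a hypothetical helper, not something in server.js):

```python
# Grbl realtime commands: status report, cycle start, feed hold, soft reset (Ctrl-X).
REALTIME_COMMANDS = {"?", "~", "!", "\x18"}


def frame(command):
    """Return the bytes to write for one command."""
    if command in REALTIME_COMMANDS:
        return command.encode("ascii")  # single byte, no terminator
    return (command.strip() + "\n").encode("ascii")
```

As far as I can tell, Grbl answers an empty line with its own `ok`, so a newline sent after `?` may produce an extra `ok` that an ok-counting sender did not expect. That is worth verifying on the actual firmware before relying on it.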
Posting this for my own reference (will play tomorrow) but looks like we may be able to get callback from the write: https://github.com/voodootikigod/node-serialport/issues/529
Haven't had any luck tracking this down. Lasaurgrbl users run a treat, and smoothieware users run without issue over serial. It's just Grbl being temperamental @chamnit
I can report the same problem using grbl 0.9j with a baudrate of 115200. You can uncomment #define REPORT_ECHO_LINE_RECEIVED in grbl/config.h to make grbl echo each line it receives. Using this I could observe that various errors occurred due to missing chars. Here are some example lines echoed by grbl:
MG1X28.350023Y102.000080F1400S255
G1X28.350023Y105.000082F140X2M5
G1X22.350018Y102.0052.SX10M3G.0SX3400055
G1X22.350018Y0520S00GY4M2
I saved the gcode generated by LaserWeb and uploaded it to grbl using GrblWeb which has been successful every time I tested it. GrblWeb uses version 1.4.x of serialport. Maybe you can trace the error inspecting their implementation.
Edit: using version 1.4.x didn't change anything. Edit2: Sorry for not reading the previous posts well enough before posting
As for the (now closed and duplicate) issue #113: I was experiencing the same issue with a ramps 1.4, TurnkeyTyranny patched Marlin firmware and LaserWeb running on a raspberry PI.
After some tests, I found that using pronterface as the host application to send gcode to the ramps/arduino board mitigated the issue somewhat, but the laser movement was not continuous (it seemed to pause) exactly where it was doing bad things with LaserWeb as the host application.
Looking at how pronterface works, it turns out that when this happened, pronterface simply re-sent the failing gcode line, which effectively (but only partially) mitigated the issue.
I then investigated more deeply and came to the same reasons @openhardwarecoza explained to me in the other issue: it seems the arduino serial buffer wasn't large enough to keep free space for incoming commands, and that is when the issue happens.
Then I found a great workaround that is 100% working in my setup: recompile the marlin firmware in the arduino sdk using a modified version of the MEGA 2560 board definition in which the serial ring buffer is enlarged from 64 to 256.
To do that, I used the very same procedure explained here: http://www.hobbytronics.co.uk/arduino-serial-buffer-size just modifying the Mega 2560 definition instead of the Arduino UNO one.
Hope this can help someone else.
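For context on the resend behaviour described above: RepRap-style firmware such as Marlin accepts an optional line number and an XOR checksum on each line, which is what lets a host notice a damaged line and send it again. A rough sketch of that framing follows. The helper names are made up, and this is not pronterface's code.

```python
def checksum(payload):
    """XOR of all bytes in the payload (the text before the '*')."""
    cs = 0
    for byte in payload.encode("ascii"):
        cs ^= byte
    return cs


def frame_line(number, command):
    """Prefix a line number and append '*checksum' to a gcode command."""
    payload = "N{} {}".format(number, command.strip())
    return "{}*{}".format(payload, checksum(payload))


def verify(framed):
    """Check a framed line the way a receiver would before accepting it."""
    payload, sep, cs = framed.rpartition("*")
    return sep == "*" and cs.isdigit() and checksum(payload) == int(cs)
```

Plain Grbl streaming has no equivalent of this, which is one reason a dropped character there can turn into a valid but wrong command rather than a resend request.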
Thanks, that's awesome news! Keep us posted on testing if anything crops up later.
I tried to adapt your workaround to grbl by changing the size of the serial buffer in grbl/config.h. Unfortunately this did not change anything.
There is another thing that leaves me puzzled.
I tried drawing just a single-pixel line in gimp and saving it; then I rasterized it in laserweb, translated it a little and saved the generated gcode.
I was expecting very little gcode: one G0 move to where the laser has to start cutting, one single G1 for the cut, possibly another single G0 to move back home, and perhaps 3 or 4 more lines before and after the real job, so a total of at most 10 lines of gcode.
Instead, laserweb produced a big gcode file of 414 lines, with just 2 G1 lines and 401 G0 lines.
Using G0 this way seems wasteful to me: it moves, without firing the laser, to the point where it needs to start firing, in a straight line, but it does so in little 0.3mm steps. Why is that? This will of course flood the controller's serial buffer with unneeded commands.
Am I missing something?
@bulburDE I don't know grbl apart from the pieces of it in Marlin. But if you are building the grbl firmware with the arduino sdk (which is how I understand it from a quick read of the grbl github wiki), it uses the arduino UNO as its base. So the ring buffer you are changing in grbl/config.h is probably not the hardware-level one, but a separate ring buffer in the grbl layer on top of the arduino core. In that case you are not really changing how the arduino (derived?) board sets up the low-level serial buffer, and you need to do what I did, but on the arduino UNO definition, exactly as in the link I posted.
@nextime : That's incorrect. Grbl does not have any Arduino code in it. It works around how the Arduino IDE works so you can compile and upload Grbl without the additional bloat.
So, when you change the serial RX buffer size, it does so within the serial RX ISR itself.
@chamnit ok, sorry for that then. I don't really know grbl (never used it), so mine was only a guess.
Rasterizing will convert every pixel it sees, so it's not a single movement. For that you should try DXF or SVG, which should produce much shorter gcode.
@cojarbi yes, I understand that (the conversion of every pixel), but it shouldn't be too hard to remove all G0s except the last one when there is more than one G0 in a row, should it?
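The idea in that question can be sketched as below. This is purely illustrative and not how LaserWeb generates its gcode: `collapse_rapids` is a made-up name, and the sketch assumes absolute positioning (G90), no comments on G0 lines, and that a straight rapid to the final point is acceptable. Rather than simply dropping earlier lines, it merges the words of a run so that a move on one axis is not lost.

```python
import re

RAPID = re.compile(r"^\s*G0+(?![\d.])", re.IGNORECASE)
WORD = re.compile(r"([A-Z])\s*(-?(?:\d+\.?\d*|\.\d+))", re.IGNORECASE)


def collapse_rapids(lines):
    """Merge each run of consecutive G0 lines into a single G0 line."""
    out = []
    run = {}  # latest value per word letter within the current run of G0 lines

    def flush():
        if run:
            out.append("G0 " + " ".join(k + v for k, v in run.items()))
            run.clear()

    for line in lines:
        if RAPID.match(line):
            for letter, value in WORD.findall(RAPID.sub("", line)):
                letter = letter.upper()
                run.pop(letter, None)  # re-insert so the newest value wins
                run[letter] = value
        else:
            flush()
            out.append(line)
    flush()
    return out
```

For example, the three leading rapids from the job log earlier in this thread (`G0 Y245.0`, `G0 X0.0 Y245.0`, `G0 X178.2 Y245.0`) would collapse into a single `G0 X178.2 Y245.0`.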
Also, you need to account for the pixel size conversion, meaning a single line you draw could be translated to multiple lines.
OK, looking at the code, G0 movements are already suppressed, but a G0 Y movement is added for every line. So it's not one per "white" pixel but one per line, which is not much of a flood after all.
Sorry for all the discussion about it; I'm just reading the code and trying to understand everything it does while looking for possible optimization paths.
Keep on going.
@athulsnair reported that his grbl 0.9j does the unknown gcode random moves too (first replication in the wild after Chirag reported it to @nathanielstenzel, and also after we did some initial fixes here: https://github.com/openhardwarecoza/LaserWeb/commit/8dd53f3a2d35370f24076fa9d31332dcbdfc624f)
Debug instructions to follow: