MarlinFirmware / Marlin

Marlin is an optimized firmware for RepRap 3D printers based on the Arduino platform. Many commercial 3D printers come with Marlin installed. Check with your vendor if you need source code for your specific machine.
https://marlinfw.org
GNU General Public License v3.0
16.28k stars 19.24k forks source link

busy protocol fails during heating #7131

Closed repetier closed 7 years ago

repetier commented 7 years ago

I'm currently searching while new Marlin can cause a timeout in my server and there I registered, that temperature commands that wait for temperature to be reached did not send any busy messages, also they take longer then the set 2 seconds. Testing G4 S10 showed busy, so it was explicitly enabled.

Since it must send busy for all long operations I count this as an error to the protocol.

I used the new autoreport temperatures on and off. It appears with M190 and M109 so both have the same problem.

repetier commented 7 years ago

Ok, after chatting for a week with the user and checking code this bug is in deed responsible for some serious problems with hosts relying on busy protocol. For you to understand the importance here is how it works:

  1. We do not assume it is implemented, so we wait for slow commands like M190/M109 not creating a timeout. This can lead to permanent blocks if [one] of the commands around [it] fails it will hang for ever, one of the big advantages of busy protocol.
  2. We see a busy: response - now we assume it is present and we get a busy for all slow commands - we disable slow command detection. Short timeouts are possible and normally set by user.
  3. M190 comes and takes more then 3 seconds. Since we have seen busy is enabled we do not wait forever, just 3 seconds or reset on busy.
  4. No busy, no ok - timeout sending next command. 3-4 are repeated sever times so printer has missed many commands getting server in trouble when it detects the missing lines sind from all these timeouts.

So PLEASE fix this bug quickly.

thinkyhead commented 7 years ago

we disable slow command detection.

That may be based on an incorrect assumption.

As I poke around, it looks like we did apply the NOT_BUSY in M109/M190 intentionally. M109/M190 report the temperature while waiting, and this can be relied upon by hosts to indicate that the machine is alive. It won't block an "ok" response from a previous command.

Why can the temperature reports no longer act as the "keepalive" signal during M109/M190 — even with the keepalive capability active? Can Repetier not detect that the machine is in a heating loop?

That was the original reason that we implemented a host keepalive — to tell the host that the machine was still alive and well, and not just be silent. It was the silence that was causing host timeout, and host keepalive was only designed to fix that issue.

repetier commented 7 years ago

Temperature reports never were meant as keep alive signal. Well host would see firmware is still connected, but that does not say it is busy so the host should not timeout when no ok is received. After all you also now have a autoreport temperature. That would mean not mean G4 S20 will not block for 20 seconds. In fact G4 will still send correctly busy. Timeout is the fact that host has send a command and can not send new commands as one of the buffered commands takes longer. In that case busy says "you did not miss an ok, it is really taking so long". Temperature report definitely does not contain this information as it comes when sending is possible and when it is not possible and even was one of the reasons you introduced busy as I understood it.

repetier commented 7 years ago

Maybe another way to see to make it 100% clear.

The main problem of hosts is when do I send the next command. In the 100% perfect world we send a command and wait for "ok" and send next ok. In reality the "ok" sometimes comes "o" or "k" or something else and host would never see it and never send next command. So there need to be some tricks that hosts can fix this block. So what they have is a timer started at last command and if that is exceeded they run timeout action, essentially assuming they missed an "ok" so they send the next command. To prevent this, one of the following must happen:

  1. "ok" received.
  2. "busy" resets timer so a command can block indefinitely.
  3. "wait" received telling firmware has no open commands..

All other responses would tell the firmware is still alive so saying temperature report is a keep alive signal is correct. But these responses do not mean the "ok" will be delayed, no need to timeout. Best example would be autoreport temperatures. That would prevent any timeout then for all commands so a missed "ok" would always block forever. So it is not usable as a timeout reset marker.

Roxy-3D commented 7 years ago

Perhaps it is time to put a real protocol in place? I am extremely uncomfortable with the weak 8-bit checksum on the gcode lines. That is not nearly strong enough.

And it would seem to me the host should be able to ask exactly what is going on. Specifically... what command the printer is currently processing and why the command is not done yet. It would be nice for the printer to respond with an estimation to what percent of the command is complete and how much longer it will be until another command starts being processed.

lrpirlet commented 7 years ago

@repetier @Roxy-3D @thinkyhead Please do define a protocol...

BTW, please make full use of the ascii code and also introduce the possibility to replace the "ok" string with a unique character as X-ON and that too long busy string with X-OFF...

ref: https://en.wikipedia.org/wiki/ASCII

lrpirlet commented 7 years ago

or the opposite, (I always get confused wtih xon xoff effect)

repetier commented 7 years ago

First I'd like the busy protocol to be fixed, so the existing firmware can be used on hosts detecting busy as this is broken. That should be priority and no endless discussion before that.

Regarding new protocol, there is not much need if you would enable all the additions like ok with line numbers, wait and busy. These together give already a good package and the capabilities protocol give the hosts chances to detect features, at least until someone changes the behaviour.

ok or any other character makes no real difference. If you have an error both are equally bad or good. What might be an improvement is 2 or 3 normally unused chars as replacement since normally only one char gets wrong, so ACK + BEL instead of ok. Line numbers still allow a detection of missed lines on the fly which is why we always send every line with them.

Checksum is in deed bad and was the reason I invented the repetier binary protocol for sending data. There I use fletcher checksum as used in TCP/IP which is very fast in computation and much harder to get false correct sum. We also made communication binary to reduce send data length to around 50% allowing to queue up more commands which increase speed. So for the sending direction this is a ideal solution that also allows text to contain any character, not like with the ascii where e.g. M117 can not contain * or for a long time : without breaking parsing.

One big problem that remains in both protocols is the answer. I have seen cases where responses got completely mixed with other content or with extra chars so a temperature of 123°C became 1230°C. Gets ugly spikes in temperature graph and others can also confuse hosts if they report back changed values. Since answer comes from a much weaker system we lived for now with it. Here a checksum might be a solution so hosts know if they should trust the answer or not. Something like resend is out of order I think. That would require firmware to store lines until host ack them and makes communication to slow.

lrpirlet commented 7 years ago

To use X-ON;X-OFF instead of "ok" is nice just because it is a ONE character... to tell the host I am am alive here is the state of my buffer...

So imagine how many less interrupts generated to send just X-ON telling the host "I have room in the buffer"... And I see no reason to stop sending that character every 2 seconds until the buffer is full again.

So imagine how many less interrupts to send X-OFF telling the host I am alive, but my buffer is full, instead of "I am busy doing something that takes longer than host defined timeout"... and I see absolutely no reason to stop sending that character every 2 seconds until the buffer is empty again.

So yes I propose having some sort of a non readable output if the host can handle it. I guess that repetier transfer mode is doing something similar(???)... Of course this is a deviation from the GCODE standard as well as sending anything else than 'ok' when command was accepted...

That said, since I have no time nor high skills to develop anything like a protocol or even to repair the reported problem, I'll stop polluting this thread...

repetier commented 7 years ago

Just found another place where you break the busy protocol (M303):

KEEPALIVE_STATE(NOT_BUSY); // don't send "busy: processing" messages during autotune output

thermalManager.PID_autotune(temp, e, c, u);

KEEPALIVE_STATE(IN_HANDLER);

Just the fact you you allow disabling it makes it completely unusable. Either you support it and it returns busy when it takes longer or you do not report busy at all. There is no if we feel like reporting we do and sometimes don't. Hosts can not know when it pleases your version and when you add a new command that also does not like to send busy. Please let me know if there are any plans to fix the handling or if I just should remove supporting it for Marlin.

I understand that writing "busy:" between temperature outputs is not so nice while reading code, but hosts should hide this anyway (we do if ACK is not activated) and it is to solve a existing problem of hosts needing long timeouts and a list of slow commands (which is never complete if you add new commands) to keep communication going if errors occur. And these solutions are not perfect and still can cause a dead lock in communication with bad error timing. That was the nice part that busy solved.

thinkyhead commented 7 years ago

Since M303 is not a host-initiated command to be used during printing, but only ever used by itself as part of tuning a machine, I'm hesitant to add a "busy" signal for that one. The output is in fact intended to be human-readable.

thinkyhead commented 7 years ago

7253 applies the requested workaround so hosts that strictly require "busy:..." to reset their "wait for ok" timer will be satisfied.

repetier commented 7 years ago

Thanks for making it better.

You wrote to revert back to previous behaviour for hosts, but temperatures were never a signal that a command takes time. As already described we automatically activate autoreport temperature, so a returned temperature has no meaning.

What we have is a list of slow commands but this gets disabled in favour of busy for better prediction. What I will add for next release is a marlin busy exception list, so these commands then get also no timeout. Not what I like but if people have a choice some will choose the wrong and then complain for timeouts. Can you say which commands beside M109/M190/M303 may be affected as well?

foosel commented 7 years ago

Can you say which commands beside M109/M190/M303 may be affected as well?

While OctoPrint doesn't have that particular problem with the current interpretation of the busy: protocol (simply because I not specifically implemented that protocol but only use the messages as an indicator of "printer still alive, keep calm and carry on"), I'd still also be interested in an exhaustive list of that as well, for reasons of general long running command tracking.

jbrazio commented 7 years ago

Actually I do believe the rabbit hole goes deeper.. by this I mean that the protocol we use to communicate with 3D printers is aged, and it didn't age well. GCODE is like XML, it was created as a standard way for two machines to communicate but to be readable by humans, which is nonsense. I can get it for CNC, were you have gray beards still cutting metal from the control box without using a computer, but for 3D printers.. who can build a 3D model from his head using only GCODE ?

But now we're doing it worse.. not only GCODE is verbose and inefficient as we're adding bunch of control ops completely out of the GCODE spec.. wtf is a "busy:" ?!

We need to gather around and build a standard, it's not far away, @repetier already did a huge step into that direction.

foosel commented 7 years ago

the protocol we use to communicate with 3D printers is aged

It isn't a protocol to begin with, it's a fileformat abused as a protocol with some stuff zip-tied on top of it. It's not well defined, it's a hack job.

We need to gather around and build a standard, it's not far away, @repetier already did a huge step into that direction.

The last time I tried that for the very same reason, I was attacked for it. I'll leave someone else to step forward this time.

jbrazio commented 7 years ago

I don't think this will be a "one person" job, we have here on this thread some of the most relevant people of the open source 3D printing scene, it's a question of really checking who is on-board and build a work group for it.

bobc commented 7 years ago

Creating a new protocol is the easy part, several have been created already. Creating a really good protocol is somewhat harder. The best thing there is not to re-invent the wheel. All the problems faced with 3D printer communication have been solved many times over in other protocols. The really hard part is getting people to follow a standard, some developers have a policy of deliberately ignoring standards.

It's been frustrating for me to watch, I implement comms protocols for a living, and don't have the luxury of ignoring standards when I want to :)

On 7 July 2017 at 10:15, João Brázio notifications@github.com wrote:

I don't think this will be a "one person" job, we have here on this thread some of the most relevant people of the open source 3D printing scene, it's a question of really checking who is on-board and build a work group for it.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/MarlinFirmware/Marlin/issues/7131#issuecomment-313631114, or mute the thread https://github.com/notifications/unsubscribe-auth/AA7VR1G9PWEl0lg8k-pOdx8o8Ayf3lvBks5sLfdPgaJpZM4OESa9 .

jbrazio commented 7 years ago

I do not think the current status is adequate, just like @foosel said it is a hack job badly done with zip ties and cheap glue.

We have issues with data transmission, we need to switch to a binary protocol, saves processing time and serial bandwidth; We have issues with data corruption, we need a proper CRC and sequencer, i.e. be able to detect an error and correct for it by requesting the packet back; and the most prominent one, we need transmission flow control (this is the main topic on this thread), a proper adaptive transmission rate between the host and the device.

We do not need to reinvent the wheel, let's take the best in class for those requirements (TCP/IP) and mix it with @repetier binary protocol as the starting point.

If you have been doing machine communication for living you must roll your eyes at the current situation.

If we get a new protocol as an option within Marlin, Octoprint and Repetier we have enough share to promote a real beneficial change to the market without breaking the as-is.

davidelang commented 7 years ago

a binary protocol doesn't save much time and it adds significant complications to the sender and to debugging (there are far too many times when people need to debug communications with a serial console, don't break that)

CNC machines (which these are) communicate with simple ASCII text messages that you can type in a terminal if you need to

having a config option to enable XON/XOFF for rate limiting is a good option to have.

but changing the protocol so that all the software that is out that that can control these 3D printers has to be changed is a horrible idea.

I agree that the gcode interface to types of machines is less than ideal, but it is pretty standardized (the parts that are badly standardized are the list of what gcodes are supported, not the communications protocols)

jbrazio commented 7 years ago

How can someone say out loud we have a standard ? Have you had a look at the Reprap GCode bible, did you noticed they have for the same G code multiple meaning for each parameter based on the firmware ? And it's not only a question parameters, you can even find G code that have different behavior for each firmware. This is not a standard.

Don't confuse debug with machine to machine communication, they both share the same physical layer and that's it.

You insist on hw flow control, we could go that way but it's not enough to solve the bigger picture.

bobc commented 7 years ago

It's quite easy to retain a "legacy" text protocol while still having a binary mode available, which I believe is what Repetier does. Therefore, this need not be an objection.

XON/XOFF is an "all or nothing" approach which has limited usefulness. For example, we would still want to allow the host to query status etc, even if the firmware is unable to accept more motion commands.

davidelang commented 7 years ago

excuse the soapbox here for a minute (and stating the issue as I understand it)

the configuration settings are unique to each machine/firmware type, but the movement commands (the things that people need to have standardized to be able to print things) are pretty stable.

for the vast majority of CNC machines, the machine to machine communication is newline terminated ASCII text with a newline terminated 'Ok' as a response. you can connect $serial_program_of_choice and type/cut-n-paste lines of gcode and have the machine move.

That is simple and works well.

Where things get interesting is when you start sending multiple lines before the machine has finished processing the earlier ones. The sender needs to avoid overflowing the buffer on the receiver.

In an ideal world, we would have hardware flow control with RTS/CTS and the problem would be solved. But we don't have that. We also don't have a real serial interface, instead we have a USB serial simulation, which doesn't send the bytes one at a type, but instead bundles them into a packet and sends a bunch of them at once.

Xon/Xoff may be able to work if the USB serial simulator can support it, otherwise 'interesting things' can happen as the data is buffered in both directions before it's sent if you are too close to the edge of the buffer before sending the Xoff.

Keeping track of the buffer contents on the sending side and only sending a line if there is room in the buffer (based on keeping track of what you've sent and what's seen an ok message), works, but is fragile (and complicated by there being two buffers in most software, the library buffer, and then the software buffer that the data gets copied from)

This issue is because the sender is looking for the Ok, and it doesn't happen before it has a timeout.

@bobc, if the serial buffer is full, you can't send query messages or new lines of gcode, the processor can't do anything.

the time spend processing the ascii text is so minor compared to everything else that is happening that switching to a binary protocol will not make a noticable difference in CPU utilization. It's possible that it would make a noticable difference in message length (and therefor buffer utilization), but you may end up using more space to send a fixed-precision binary number than you would a variable-length text number. There's a lot of space saved in practice by being able to leave out duplicate information, and binary data may not save nearly as much as you hope.

you need to be able to specify positions down to 0.001 or ideally 0.0001 (if you are in inch mode) and up to several thousand or tens of thousands (if you are in mm mode)

you need to be able to leave out any/all of the parameters, so you will need a bunch of flags to indicate what parameters you have in a message

design your protocol and work out exactly how much it will save, do some tests of the CPU and buffer space needed.

Roxy-3D commented 7 years ago

The last time I tried that for the very same reason, I was attacked for it. I'll leave someone else to step forward this time.

Joao and I certainly did not attack the idea. And some times, the people that can see the path forward are a little too early to get the required support. If for no other reason than the reason for change isn't clearly visible by enough people. The time still may not be 'right'. But people can now clearly see the deficiencies in the current communication standard.

The really hard part is getting people to follow a standard, some developers have a policy of deliberately ignoring standards. It's been frustrating for me to watch, I implement comms protocols for a living, and don't have the luxury of ignoring standards when I want to :)

Obviously a shot at me. And you know what??? I think the GCode standard SUCKS. It is too limiting for people that want to implement complex algorithms. And it sucks for users when they have to remember some magic G number for a specific command they need and some special 1 letter parameter to do the flavor of the command they want.

I know my answer will irritate the GCode Nazis. But that is how standards evolve. If the standard is too lame to meet its intended purpose, we NEED people to violate the standard to force it to evolve.

you noticed they have for the same G code multiple meaning for each parameter based on the firmware ? And it's not only a question parameters, you can even find G code that have different behavior for each firmware. This is not a standard.

Agreed. GCode sucks.

Don't confuse debug with machine to machine communication, they both share the same physical layer and that's it.

Agreed.

It's quite easy to retain a "legacy" text protocol while still having a binary mode available, which I believe is what Repetier does. Therefore, this need not be an objection.

Agreed.

jbrazio commented 7 years ago

I would not be so keen to promote a radical departure from GCode if we're somehow still sticking to its basic syntax principle, but more and more we need more controls; these controls are being implemented with no consensus (i.e. clear behavior) between host and firmware devs and outside of the bounds what we should expect from GCode.

I know you see the flow control as a fix for a specific issue and I do believe you fully understand what causes the issue and that flow control is a perfect solution for it but It's our job look a bit further ahead and weight all the options.

In fact this is not really new.

..but software flow control using in-band signaling is one more nail into gcode's coffin.

repetier commented 7 years ago

Please just forget control with serial extra lines. Arduinos have exactly 2 lines for serial Tx and Rx and nothing more. They even use DTR and RTS for resetting the board on some connections. Others have no serial connection at all, e.g. native port on due or use TCP/IP. So any serial specific signal is not suited for communication.

Regarding precision is binary the best solution as it submits all bits in a float while having smallest size in average. In ascii you can send 883.34533445 but back conversion will [lose] the last numbers as they do not fit in float anyway. In repetier-firmware we have a bitfield for used parameter followed by values for set bits and fletcher checksum. First byte has bit 8 set to indicate that this command is binary. So firmware can accept ascii and binary for same connection. But the protocol will not work with Marlin as it has the restriction of position-independent parameter and having parameters only once. Marlin has some commands where parameters are used several times and meaning depends on position, so these could not be converted at least to that protocol, meaning you need something with more overhead:-(

bobc commented 7 years ago

A while ago I wrote a spec for a Packed Binary GCode which was designed to be flexible but still compact. Any GCode line should be able to be encoded and decoded to exactly the original. It trades simplicity and generic use for slightly lower compression. I implemented the protocol in some experimental ARM firmware. I also wrote a note on how BUSY could be handled, Busy Protocol Both of those were written some time ago, they will need updating to reflect current implementations. e.g to extend the number of command codes to allow lower case. @ambrop7 also implemented a different idea, and apparently Ultimaker also did something which I haven't looked at. Packed gcode working with APrinter firmware

davidelang commented 7 years ago

On Sat, 8 Jul 2017, repetier wrote:

Regarding precision is binary the best solution as it submits all bits in a float while having smallest size in average. In ascii you can send 883.34533445 but back conversion will [lose] the last numbers as they do not fit in float anyway.

floats only make sense if you never care about large values and high precision at the same time. With CNC machines you don't want to [lose] precision just because you are a long way away from the origin.

There's also the problem that you can't represent 0.1 in binary, you have to approximate it.

In repetier-firmware we have a bitfield for used parameter followed by values for set bits and fletcher checksum. First byte has bit 8 set to indicate that this command is binary. So firmware can accept ascii and binary for same connection. But the protocol will not work with Marlin as it has the restriction of position independent parameter and having parameters only once. Marlin has some commands where parameters are used several times and meaning depends on position, so these could not be converted at least to that protocol, meaning you need something with more overhead:-(

Yep, these are the sorts of problems you have to solve to switch to a binary protocol, and once you have done so, is the result really any better than an ascii protocol?

thinkyhead commented 7 years ago

The last time I tried that for the very same reason, I was attacked for it.

So was I! In fact, I got "carpet bombed" first, but purely due to misunderstanding. I wasn't criticized for trying to mess with any protocols or the way G-code worked, but merely for attempting to include a G-code section in the Marlin docs. Someone thought I was trying to define rather than just document. Fun times.

By the way, I finally did "finish" that documentation, almost single-handedly. Whew! Some implementation notes may still be added, and some commands (G33, G26) will get updates once their implementations are more settled.

Anyway, at that time the point was to clean up the code and start making proper documentation, because Marlin was a mess with very poor docs. That was a two-year job, as it turned out! Now that we're about to start a 2.0.0 branch with a new layout and 32-bit support, this would be a good time to revisit alternative protocols!

is the result really any better than an ascii protocol?

Not necessarily. The number "10" is smaller in ASCII than as a binary-represented float. Check out #7047 which proposes adding a "chunk" option, bypassing the G-code parser. An interesting approach that augments G-code without replacing it.

thinkyhead commented 7 years ago

@bobc I have looked at your binary protocol before, but I wasn't ready to make an implementation. I'd be very interested to see your ARM version! Also, what do you think of ReplicatorG protocol?

As for the BUSY protocol, that is a different beast from the Keepalive Protocol, and perhaps that is part of the confusion. The point of "keepalive" —as it was discussed before it was built— was to simply not be silent during long operations. @foosel took it in the correct spirit for OctoPrint, but it sounds like what @repetier really wanted was your BUSY protocol.

thinkyhead commented 7 years ago

Marlin has some commands where parameters are used several times and meaning depends on position

Not any more, if it ever did. With the updated G-code parser, Marlin only sees the last value given for a parameter (or, without FASTER_GCODE_PARSER, the first).

Marlin will parse a command like G777 B5 C9 C8 C2 as G777 B5 C2.

thinkyhead commented 7 years ago

The sender needs to avoid overflowing the buffer on the receiver.

@davidelang What do you think of the ADVANCED_OK option? Does that help mitigate the issue well enough? It might give the host an opportunity to wait to send commands until the buffer has space, with advance notice, but then again maybe not.

thinkyhead commented 7 years ago

In ascii you can send 883.34533445 but back conversion will [lose] the last numbers

@repetier This is true. From the slicer side, I wish they would use fewer significant digits, because beyond 0.01mm they're just wasting space.

thinkyhead commented 7 years ago

@repetier: Can you say which commands beside M109/M190/M303 may be affected as well?

g-code blocking
G0, G1 When the buffer is full.
G28 Homing a long Z axis may take several seconds.
G29 May take several minutes (in auto mode).
M0, M1 Block indefinitely
M48 May take minutes.
M109, M190 May take several minutes.
M226 Block indefinitely
M300 Block for the duration (if not using background queue).
M303 May take several minutes
M400 May block for several seconds
M600 Block indefinitely
M862, M863 May take several seconds
M360, M361, M362, M363, M364 Block like G0 and G1
davidelang commented 7 years ago

On Mon, 10 Jul 2017, Scott Lahteine wrote:

In ascii you can send 883.34533445 but back conversion will [lose] the last numbers

@repetier This is true. From the slicer side, I wish they would use fewer significant digits, because beyond 0.01mm they're just wasting space.

The slicer doesn't really know what resolution you are working with. For the maslow project we are just adding a feature to the PC side sending software that lets you specify how many digits to trim to (different number of digits past the decimal make sense depending on what units you are using and your machine capabilities)

CONSULitAS commented 7 years ago

Now that we're about to start a new 2.0.0 branch with a new layout and 32-bit support, this would be a good time to revisit alternative protocols!

Long time ago i had a suggestion for a simple but effective protocol. But i had similar experiences. I discussed it with @foosel and she seemed to like it.

If we really want to develop a new protocol, it has to be done together with @foosel for Octoprint und @repetier for Repetier-Host.

BTW: The proposal i made (i call it C-code like in Control) has been quite simple:

Simple C-code protocol

From host to machine

From machine to host

Advantages

So, what is missing? I think you can simply add it.

davidelang commented 7 years ago

On Mon, 10 Jul 2017, Scott Lahteine wrote:

The sender needs to avoid overflowing the buffer on the receiver.

@davidelang What do you think of the ADVANCED_OK option? Does that help mitigate the issue well enough? It gives the host an opportunity to wait to send commands until the buffer has space, with advance notice.

I am not familiar with this option (I'm new here, I just jumped in because this was an issue we've just been fighting with the maslow project)

On Maslow, the problem we ran into was that even if we waited for an ok before sending data, we could still run into problems because the sender was calculating if there was enough space on the receiver based on a 256 byte buffer, but the serial library has a 64 byte buffer that the firmware copied from into the 256 byte buffer, and it was possible to overflow the 64 byte buffer if you had a very fast PC and the firmware hit something else that kept it busy for too long.

We still haven't fully solved this on Maslow, we've implemented a temporary short delay after sending for now.

The maslow is a 4'x8' CNC, so the numbers get rather large when you are dealing with mm, but you still need the precision.

David Lang

davidelang commented 7 years ago

even G0 and G1 can take a long while if they are moving a long distance. I know this isn't common in 3d printing, but there are some large build volume setups out there.

davidelang commented 7 years ago

why could you not just eliminate the C1 and include G-code directly (G-code already has the option of Nxxx line numbering)

once you are outside of the G-code namespace (G, M, N and a few others), you can use whatever prefixes you want. C is no better or worse than $*

Having an option to ack or report errors for specific messages is a nice touch. I suspexct most software will accept *ok\n without any problems, so you are probably ok sending anything else as part of the ok/error/busy messages

what is the advantage of this variation over the existing protocol?

It's not any shorter (in fact, it's longer), it uses C for commands instead of $, B* that I've seen on other CNC machines, that's a change, but the particular prefix shoudln't matter.

There may be other advantages I'm not seeing, but you need to spell them out to folks rather than just saying 'how about this'

davidelang commented 7 years ago

On Mon, 10 Jul 2017, Scott Lahteine wrote:

is the result really any better than an ascii protocol?

Not necessarily. The number "10" is smaller in ASCII than as a binary-represented float. Check out #7047 which proposes adding a "chunk" option, bypassing the G-code parser. An interesting approach!

This is exactly the type of thing that I'm trying to call attention to. I'll take a look at the chunk discussion.

Roxy-3D commented 7 years ago

Check sum

Checksums are too weak. We need at least CCITT CRC-16. And it is already coded up and being used in Configuration_store.cpp to guard against EEPROM corruption.

colinrgodsey commented 7 years ago

@foosel @thinkyhead ah, i knew there had to be a thread going around like this somewhere ;)

So, my approach to this was that g-code itself is still a fine data format. Human readable, editable, commands are easy to look up and are somewhat standardish. My proposal (#7047) really tries to leave that as it is, and basically just augments that protocol for a form of external g-code "preprocessing" that uses binary protocols for the hotspots (G0/1) and also happens to bypass the planner. It takes the majority of g-code out of transmission, but still leaves full functionality as a fallback, and as a control protocol.

When doing the above work, i was trying to think of something like the HTTP protocol, which is just ASCII and new lines, but just for the headers and the "small" stuff. If your ASCII headers are followed by 1.2GB of binary data, great! You're not overburdened by ASCII where it counts, but you can go full ascii just fine (eh, could extend this analogy further with something about SVG, but I won't.... It's a bulky ASCII graphics format.).

The thing I kept going back and forth on was where the two protocols could really meet. I considered doing a g-code (ironically, i was going to use C1) that would signal some type of binary buffer transfer, but there honestly was very little room to spare for pushing large binary buffers through the g-code parser. So I opted for a raw control character ('!' in this case) that forces the binary data processing entirely in the ISR, avoiding the g-code pipeline entirely, and returning an ID for the buffer. Think of it like a lazy persons DMA. Then when it comes time to "trigger" the chunk buffers, that's done through just a normal g-code (C0 in this case) with the buffer index. This allowed me to entirely decouple the protocol streams- the binary parsing does not rely on the g-code parsing, they have different buffers, they can respond to requests in a multiplexed way. I initially was thinking of doing this with more hardware (like shared RAM)... but that's a sucky option.

As far as raw g-code as a transmission protocol... I think there's lots that can be fixed. The checksum is definitely weak, the response data is decoupled from the request (we have N# for numbering requests, but some of them respond with stuff outside of the ok N# ... format), there's no real back-pressure support, and there's no general keep-alive protocol. I don't think this is cause enough to ditch the idea of g-code as a transmission protocol though, just maybe needs a standard extension for transmission that addresses these issues.

Uhm... beyond this I don't really know. Protocols suck, especially if you want to support durability and multiplexing (or buffering) while maintaining linearization (linear pipeline) and offering "exactly once" durability.

repetier commented 7 years ago

@thinkyhead

g-code blocking
G0, G1 When the buffer is full.
G28 Homing a long Z axis may take several seconds.
G29 May take several minutes (in auto mode).
M0, M1 Block indefinitely
M48 May take minutes.
M109, M190 May take several minutes.
M226 Block indefinitely
M300 Block for the duration (if not using background queue).
M303 May take several minutes
M400 May block for several seconds
M600 Block indefinitely
M862, M863 May take several seconds
M360, M361, M362, M363, M364 Block like G0 and G1

You really mean all these commands do not send busy when they take longer? So beside G4 any commands where it happens at all? It would make sense on all of them. Aim of busy was to reduce timeout so a hang can be resolved fast. But if even the main print commands G0/G1 do not send busy I would ignore it.

Regarding double used chars I have especially G29 in mind using N which also gets send as line number first.

Regarding new protocol you will always need to come down to original gcode as this is what users anter and slicers create. Hosts should always log it ascii so users can read what is going on. For repetier binary we send it binary but log the ascii version. So if we speak about protocol we speak more more about the hull and not the gcode itself. Also we have already a solution for nearly all communication problems: "wait" - tells server all commands are done M115 with cap protocol - Tell hosts what we have compiled in e.g. eeprom. autoreport temps,.. M360 - Report configuration (in repetier, not sure about marlin) "ok Nline" - allows hosts to detect missed ok for those sending several lines at once to speed up communication. "busy" - Tell host to not time out and send next command if command takes longer also broken under some conditions. "rs line" when checksum fails.

If you need better checksum add a capability so host can detect it. Then make either a switch so *checksum is then new checksum system or add a checksum char normally not used (M117 is here always a problem if the char gets used there).

No need to invent new sending protocol whcih just makes more work for everyone.

Binary is just to make it more compact in transmission. If that does not happen it has no real advantage.

And never forget to be backward compatible and use cap protocol to tell hosts about it. The landscape is so torn regaring forks that hosts can not make any assumptions that divide from what was 3-4 years ago allowed if not explicitly told or having a way to detect it. And most important - never change how these new features work after invention for same reason. If it has several meanings hosts just need to remove it to keep compatible then and that does not help anyone.

foosel commented 7 years ago

I currently lack the time to read through all comments, sorry in advance to everyone who mentioned me (I'll try to get around to it ASAP) but:

And never forget to be backward compatible and use cap protocol to tell hosts about it. The landscape is so torn regaring forks that hosts can not make any assumptions that divide from what was 3-4 years ago allowed if not explicitly told or having a way to detect it. And most important - never change how these new features work after invention for same reason. If it has several meanings hosts just need to remove it to keep compatible then and that does not help anyone.

^ This is indeed very important and I can't quote it enough.

thinkyhead commented 7 years ago

You really mean all these commands do not send busy when they take longer?

No, not what I mean. All commands send 'busy' when they are in process (if host keepalive is enabled). The commands I listed are those that may block for a period.

And never forget to be backward compatible

Correct.

No need to invent new sending protocol whcih just makes more work for everyone. Binary is just to make it more compact in transmission. If that does not happen it has no real advantage.

Agreed.

M360 - Report configuration (in repetier, not sure about marlin)

M503 in all versions of Marlin.

If you need better checksum add a capability so host can detect it.

If the checksum method changed, likely we'd want to change the protocol to something like…

  #define PROTOCOL_VERSION "1.1"

The landscape is so torn regarding forks

Unless a fork is popular, or installed on all of some vendor's machines, then defer always to the main fork. If something has become popular in some other fork, then let us know and we will adopt it here.

thinkyhead commented 7 years ago

Regarding double used chars I have especially G29 in mind using N which also gets send as line number first.

I put the kibosh on (i.e., forbade) the use of 'N' in G29 or any other codes.

thinkyhead commented 7 years ago

As far as raw g-code as a transmission protocol... I think there's lots that can be fixed. The checksum is definitely weak

It's not as weak as you'd think. Run a simulation in which you randomly drop or glitch bytes in a G-code string, and see how many of them end up with the same checksum as the original G-code. I think you'll find it happens so rarely as to be vanishing.

colinrgodsey commented 7 years ago

It's not as weak as you'd think. Run a simulation in which you randomly drop or glitch bytes in a G-code string, and see how many of them end up with the same checksum as the original G-code. I think you'll find it happens so rarely as to be vanishing.

That's true, there's usually a good amount of entropy in the commands. And errors should be extremely rare. But, if you're using a mismatched rate, the device could potentially get errors up to 8%. Eh... there's probably some math here, or at least statistics, that could solve this. I think for right now, it just "feels" weak to me at least. But if you have the right baud rate, your errors will already be so extremely low it really doesnt matter.

davidelang commented 7 years ago

It's very unlikely that just a few bits will get flipped, what is far more likely to happen is a buffer overrun where the line gets truncated (and therefor merged with the next line), a simple checksum is pleanty good to detect those sorts of problems

davidelang commented 7 years ago

If you're at the wrong baud rate, it's unlikly that you will get enough data decoded to make any sense at all.

you are thinking of checksums in terms of security things where you want to make sure that two documents don't have the same checksum to defend against a malicious change of the data. In this case, we have short messages, limited to ascii text, so you are extremely unlikly to find two messages that both decode to valid messages that have the same simple checksum.

The purpose of the checksum is as much to detect truncated messages as anything else.