Closed amigoloko closed 8 years ago
I have found USB printing to be so flaky with my Mega2560-based RAMPS in general that I will only print from SD now. But, I suspect both a weak USB signal and too much electrical noise on the connection. Still, it would be nice to be able to recover cleanly from a lost connection, when that is the case.
@amigoloko - I faced a similar problem: my control board sits under the heated bed and it overheated, so the thermal protection kicked in and shut it down. I added a little heat insulation and the problem disappeared.
@thinkyhead - Do you have a servo? With one connected, my board periodically refused to print.
@lcfm1 - The Arduino/RAMPS are in a completely separate box away from any heat, and the RAMPS even has two 12 V fans running all the time.
@thinkyhead What do you think is generating the electrical noise? Would a better-quality USB cable help? A shorter one?
Update - I have found that this happens more often when the Windows power settings still allow the USB ports to be powered down. However, even when Windows is prevented from powering down the USB ports, it still happens.
If you take a look at the log and activate ACK, you can see the last command. Under this command there should be an 'ok'. If there isn't, #1922 should help.
@Wurstnase ok let me check this...
@thinkyhead… A short USB cable certainly helps, but what you really want is a noise-filtered cable. For more information, have a look here: http://en.wikipedia.org/wiki/Ferrite_bead Because of transmission errors, I replaced my 75 cm unfiltered cable with a 140 cm filtered USB cable and have had no problems at all… Note that both ends should have an EMI filter…
I have no problems at all with my SAV MkI, which uses a native USB interface. It runs at 12 Mbps peak transfers with a BER which is a joke. I use a 1.5 m cable with a choke, but it also works flawlessly with a rubbish cable.
One thing that would help is to look at the console and see how many retransmissions happen.
@fmalpartida I also had my doubts about the cable, since some printers work like a charm while others run into this issue very consistently. Nevertheless, I will change the cables on the ones that show this trouble (right now it is one).
Admittedly, it has been over a year since I tried USB printing, so it may be improved now.
I had almost the exact same problem, I observed the following when it happened:
I replaced the laptop I was using with my printer and have not seen this problem at all since then. I will reinstall Windows on that laptop soon and retry it.
This issue can happen on bad USB connections. To solve this you can try different baud rates. If there is no 'ok' after the last command, the host will wait forever. That's the reason for my PR.
Repetier started with this: "Send "wait" when firmware is idle. Helps solving communication problems when host supports it."
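As a rough illustration of how a host could use the idle "wait" message described above, here is a minimal sketch. It is not Repetier Host's actual code: the class name `OkTracker` and its methods are hypothetical, and the assumption is simply that a "wait" arriving while the host still expects an 'ok' means the 'ok' was lost.

```python
class OkTracker:
    """Hypothetical host-side helper: counts commands sent but not yet acknowledged."""

    def __init__(self):
        self.unacked = 0     # commands sent with no "ok" seen yet
        self.recovered = 0   # missing "ok"s inferred from a "wait"

    def on_send(self):
        """Call once for every command written to the printer."""
        self.unacked += 1

    def on_line(self, line):
        """Process one line from the firmware; return True if the host may send again."""
        text = line.strip().lower()
        if text.startswith("ok"):
            self.unacked = max(0, self.unacked - 1)
        elif text == "wait" and self.unacked > 0:
            # Assumption: the firmware is idle, so the outstanding "ok" never arrived.
            self.recovered += self.unacked
            self.unacked = 0
        return self.unacked == 0
```

With this, a host that sent one command and then only sees "wait" stops blocking instead of waiting forever.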
@atunguyd That might be the USB power setting in Windows. Have you checked that?
@amigoloko Are you referring to the sleep settings? I doubt it, as I have disabled sleep on this laptop (can't have a laptop go to sleep during a print). When the problem occurs, Windows is very much running.
I also notice that when this occurs, even an emergency stop (which I believe toggles the DTR line) does not fix the problem, which also points to the USB interface having failed.
@Wurstnase Looking forward to incorporating that fix. What's the consensus at this point – is everyone happy with your latest code? (My mind is only on bed leveling lately.)
At least for Repetier Host, the "wait" will work. I have had good experiences with it. My printer/PC/USB setup also has some issues, and I get a missing ok in every print. If I didn't have that part in my personal fork, I would have to throw away most of my prints.
I haven't tested the line-number part. I can't actually test it right now because my Z sensor is broken and I gave my last one away. Bad timing :)
@Wurstnase Before we deploy the "wait" feature #1922 we should make sure it works ok with Cura Host (@daid) and Printrun (@kliment) too.
Where is this Cura Host? Do I need an extra add-on?
I can add a temporary gcode for injecting a missing ok.
@Wurstnase If you have the Cura application, that's what I mean by "Cura Host." If you use Cura to do a print job, you can see how it deals with "wait". I'm not sure how you can test the fault that it's meant to fix.
I have Cura on my computer, and I think I found the host app inside it some time ago. But is it hidden somehow? Or does it only appear once I have sliced something?
If your printer is connected and you have sliced an object, the middle button in the 3D View is the "Print with USB" button.
Ah ok. When Cura/Pronterface/Octoprint get this feature, I will test it immediately. Both 'wait' and 'ok linenumber' are optional.
Cura needs no wait feature.
See: https://github.com/daid/Cura/blob/SteamEngine/Cura/util/machineCom.py#L477 There I force a line send when the communication looks stalled.
Sure, maybe the host can handle this. But this part depends more on the printer, and I think it should be configurable in some way. In Cura it's hardcoded. Also, 3 seconds is way too long for my printer. The print will get a lot of blobs.
Anyhow, the printer itself knows if it has nothing to do anymore.
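For reference, the host-side approach discussed above (forcing a send when communication looks stalled) can be sketched roughly as follows. This is not Cura's implementation: `StallWatchdog` and its methods are hypothetical names, and the timeout is a parameter rather than a hardcoded value, with 3 seconds used only as the default because that is the figure mentioned in this thread.

```python
import time


class StallWatchdog:
    """Hypothetical helper that decides when a host should stop waiting for "ok"."""

    def __init__(self, timeout_s=3.0, clock=time.monotonic):
        self.timeout_s = timeout_s   # how long silence is tolerated
        self.clock = clock           # injectable so the logic can be tested
        self.last_rx = clock()

    def on_receive(self):
        """Call whenever any line arrives from the printer."""
        self.last_rx = self.clock()

    def should_force_send(self, waiting_for_ok):
        """True if the host is blocked on an "ok" and the line has been silent too long."""
        if not waiting_for_ok:
            return False
        return self.clock() - self.last_rx >= self.timeout_s
```

A shorter `timeout_s` shortens the pause (and the blob) after a lost 'ok', but risks re-sending while a slow command is still legitimately running, which is why a per-printer setting seems preferable to a fixed value.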
In testing, the failed state from which that re-send recovers only happens once every 100 hours. If you get a lot of these errors, you should look at the rest of your setup, as at certain S/N ratios it's simply not feasible.
Right. Many, if not most, people will never have issues. But there are some who do. I tried a lot, a new USB cable, different baud rates, but it still happens every hour or so.
This is an optional feature and not everyone will need this. But someone will.
Are you sure you are not running into the other USB error?
My old laptop gets "device reports readiness to read but returned no data" exceptions on the USB Serial. Nothing I can do about it, whole communication just stops.
Note the title is wrong. You can't recover from a USB disconnect unless the host closes the port and reopens it because USB is a connection based protocol. Also USB has a link level retry mechanism so it should never lose data. The data should get there intact or the connection be lost, never missing data or corrupt data. If you get that you either have a driver bug on your PC or a hardware bug between the MCU and the USB chip.
With properly working hardware you should never see an error, even every 100 hours, because USB already detects and corrects them with CRCs, timeouts and retries. If that fails it should disconnect.
@nophead Except that, very occasionally, the serial data between the ATMega2560 and ATMega16U2 gets corrupted. (For an Arduino Mega, that is. Some new boards use a single-chip solution that should never have this issue.)
Yes @nophead, I don't have a disconnect. For whatever reason, either the firmware doesn't send an 'ok' or the host doesn't receive one. This is the problem. In that case, the heater is still active, the complete printer is still active, only the host doesn't send any commands because it waits for the ok.
Probably poor PCB layout or a bug in the ATMega16U2 firmware. With a Melzi and genuine FTDI chips I never see errors.
Maybe, but it needs a solution.
I agree... I have a lot of users running into that, and just telling them "get proper hardware" is not going to cut it, no matter how true it may be. So anything we can do on the protocol side of things to recover from such issues helps the quite heterogeneous user base out there (and hence the people who try to support them).
@amigoloko #1922 has been merged, so if you get the latest code, try enabling the new option and see if it helps.
@thinkyhead I will. For the record, so far I have tried the following on one machine, the one with the continuous stops: two brand-new cables, internal and external, not cheap ones, plus a hand-made ferrite shield for the cable. Result: the continuous stops vanished. The less frequent stops are still there, and they are very hard to predict and to catch. I have noticed in the Repetier log that it sometimes gets two commands (codes) at a time and then an OK. Should the OK come immediately after each code?
Are there particular codes that don't say "ok" right away, or is there no pattern there?
All codes only reply when they are finished apart from G1, which replies when it is put into the planner queue. So any code that takes significant time and isn't a G1 will delay the OK. Also any codes that stack behind a slow one in the command queue will also reply late.
@nophead The situation could be improved on a code-by-code basis, to get them to send "ok" earlier. In fact, if it's only meant to be an acknowledgement of the command received, then we could just send "ok" at the top of process_commands() instead of at the end. It's not a lie to tell the host "ok" and then to throw an error and not run the command received, right?
@thinkyhead "ok" means "command received and handled", as some commands also reply with extra data, and that data needs to be before or with the OK.
G1 and G0 should be seen as "queue move" in this respect, not as "execute move".
Great! So in that case, again we now have ADVANCED_OK that includes the sequence number, so there will be no more confusion, ever, ever again.
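To make the idea concrete, here is a small sketch of how a host might read such an extended 'ok'. The reply layout used here, `ok N<line> P<free planner slots> B<free command slots>`, is an assumption for illustration and is not spelled out in this thread; `parse_ok` and the field names are hypothetical.

```python
import re


def parse_ok(reply):
    """Parse an "ok" reply; return None if the line is not an "ok" at all.

    Assumed (not confirmed here) layout: "ok N<line> P<planner free> B<buffer free>".
    Fields that are absent come back as None, so a plain "ok" still parses.
    """
    text = reply.strip()
    if not re.match(r"ok\b", text):
        return None
    fields = {}
    for letter, name in (("N", "line"), ("P", "planner_free"), ("B", "buffer_free")):
        match = re.search(r"\b" + letter + r"(\d+)\b", text)
        fields[name] = int(match.group(1)) if match else None
    return fields
```

A host can compare the returned `line` with the last line number it sent to tell which command an 'ok' belongs to, and use the free-slot counts to decide how many more commands to send.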
@thinkyhead, If you sent OK as soon as the command went into the queue the host would then send the next one and so on until the queue got full and then you would be into big delays again waiting for a slow command to complete and make a space in the queue.
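The effect described above can be shown with a toy timing model. It is only a sketch under simplifying assumptions, not Marlin's behaviour: the host sends the next command instantly after each 'ok', commands execute strictly one at a time, and a queue slot frees when a command finishes. The function name `ok_times` is hypothetical.

```python
def ok_times(durations, bufsize):
    """Time at which each command is acknowledged if "ok" is sent on enqueue.

    durations: execution time of each command, in order.
    bufsize:   number of slots in the command queue.
    """
    finish = []        # finish time of each command
    oks = []           # time each command's "ok" goes out
    busy_until = 0.0   # when the executor becomes free
    for i, duration in enumerate(durations):
        # Command i needs the slot released by command i - bufsize.
        enqueued = finish[i - bufsize] if i >= bufsize else 0.0
        oks.append(enqueued)
        busy_until = max(enqueued, busy_until) + duration
        finish.append(busy_until)
    return oks
```

For one slow command (10 time units) followed by five quick ones and a queue of 4, `ok_times([10, 1, 1, 1, 1, 1], 4)` gives `[0.0, 0.0, 0.0, 0.0, 10.0, 11.0]`: the first four replies are immediate, then the fifth has to wait for the slow command to finish.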
@nophead Well the important thing is, somehow it works most of the time, in proper cyberpunk jalopy fashion.
If there are no comms errors, which there shouldn't be with USB (only disconnects if the hardware and drivers are correct) then it will work. Problems only surface when there are comms errors because it isn't a properly designed link level protocol.
@nophead Frankly, the communication protocol seems "good enough" at this point, in spite of various caveats. I notice that the Witbox and Hephestos configurations had added 1 to the BUFSIZE (5 instead of 4) claiming it helped. Perhaps on boards with extra space there might be value in having bigger buffers, I can't say. Anyway, totally unrelated to error-recovery, I know. Buffers are going to block.
The thing still lingering on my brain about "buffers r gonna block" is that maybe an alternative protocol (or mode) would work better, one where Marlin must explicitly ask the host for the next N commands, and the host then only sends commands when Marlin asks. The ADVANCED_OK basically does this by letting the host know how many new commands the firmware can handle, but it still leaves the choice open to the host…. Just brainstorming in circles here…
Well, that is reversing the master/slave roles, which would have a big impact on hosts.
"Well the important thing is, somehow it works most of the time"
No. The important thing is not to get stuff working in ideal conditions. What distinguishes good from bad protocols/software/hardware is not how it performs under lab conditions but how (and if) it handles problems and recovers from them. Which is why proper testing involves testing failure cases, not just expected good cases.
Statements like the above scare me like hell when coming from a maintainer of probably the most used firmware for 3d printers.
Also, what @nophead said.
The advantage of ADVANCED_OK is that it is (mostly) backwards compatible. Reversing roles isn't.
Communication errors (with the Arduino Mega 2560) happen, I've seen it. So the checksum is important.
I'm using BUFSIZE 8 on the Ultimaker2. It helped with some internal issues, and people who used USB printing with it have reported few issues.
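For anyone unfamiliar with the checksum mentioned above: in the usual RepRap-style framing, a line is sent as `N<line number> <command>*<checksum>`, where the checksum is the XOR of every byte before the `*`. A minimal sketch follows; the function names `frame_line` and `verify_line` are hypothetical, and it assumes ASCII-only commands.

```python
def _xor_checksum(body):
    """XOR of all bytes in the text before the '*' (ASCII assumed)."""
    checksum = 0
    for byte in body.encode("ascii"):
        checksum ^= byte
    return checksum


def frame_line(line_number, command):
    """Build a numbered, checksummed G-code line."""
    body = f"N{line_number} {command}"
    return f"{body}*{_xor_checksum(body)}"


def verify_line(framed):
    """Return True if the checksum after '*' matches the text before it."""
    body, sep, tail = framed.rpartition("*")
    if not sep or not tail.strip().isdigit():
        return False
    return _xor_checksum(body) == int(tail)
```

For example, `frame_line(3, "T0")` gives `N3 T0*57`. A single corrupted character changes the XOR, so `verify_line` returns False and the receiver can ask for a resend instead of executing a damaged command.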
"Statements like the above scare me like hell"
It was not intended for your ears, particularly. In the moment it makes more sense. You had to be there.
Dear Marlin Community,
I'm using the Marlin firmware connected to Repetier Host. Randomly and suddenly, while printing, the printer just stops and Repetier Host gets stuck at a code (as seen in the log when sending). Pausing, stopping, or moving the printer with Repetier Host commands does not work.
I have to turn everything off, the printer too, close Repetier, and then restart the print from the beginning.
Have you run into something like this? I am using an Arduino Mega 2560. One thing I have done on the Arduino is to bypass the reset by cutting the trace and adding a pin/jumper for uploading the firmware, so you need the jumper whenever you want to upload; without the jumper you can't upload. I don't think this matters, but just in case.
From my point of view, it seems that Marlin is sitting in the buffer loop waiting for more commands, but it is hard to know for sure. Maybe Repetier and Marlin lost the connection. Another thought: it might be Repetier Host; even though it does not crash, it just seems to stop sending commands.
I appreciate any help.