TomNisbet / TommyPROM

Simple Arduino-based EEPROM programmer
https://tomnisbet.github.io/TommyPROM/

XModem checksum failing with TeraTerm #20

Closed TomNisbet closed 3 years ago

TomNisbet commented 3 years ago

The fix for #19 has broken TeraTerm support. It seems that TeraTerm does not like to send files when TommyPROM asks for checksum mode instead of CRC mode. The TeraTerm dialog shows that it is using checksum mode, but the checksum calculation on TommyPROM fails. Will need more investigation to determine the failure point.

In the meantime, the default file transfer protocol is now back to XModem-CRC for TeraTerm compatibility. Linux users will need to comment out the CRC #define at the top of XModem.cpp to use checksum mode instead.
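For readers unfamiliar with the two modes, here is a minimal sketch of the two error checks as the XModem protocol defines them. The function names `xmodemChecksum` and `xmodemCrc16` are made up for illustration and are not taken from XModem.cpp; the actual TommyPROM code may be organized differently.

```cpp
#include <cstddef>
#include <cstdint>

// Checksum mode: the low 8 bits of the sum of the data bytes.
// (Hypothetical helper name, not from the TommyPROM source.)
uint8_t xmodemChecksum(const uint8_t *data, size_t len) {
    uint8_t sum = 0;
    for (size_t i = 0; i < len; i++) {
        sum = static_cast<uint8_t>(sum + data[i]);
    }
    return sum;
}

// CRC mode: CRC-16 with polynomial 0x1021, initial value 0, processed MSB first.
// (Hypothetical helper name, not from the TommyPROM source.)
uint16_t xmodemCrc16(const uint8_t *data, size_t len) {
    uint16_t crc = 0;
    for (size_t i = 0; i < len; i++) {
        crc ^= static_cast<uint16_t>(data[i]) << 8;
        for (int bit = 0; bit < 8; bit++) {
            crc = (crc & 0x8000) ? static_cast<uint16_t>((crc << 1) ^ 0x1021)
                                 : static_cast<uint16_t>(crc << 1);
        }
    }
    return crc;
}
```

In checksum mode the sender appends the single byte from `xmodemChecksum` to each 128-byte block; in CRC mode it appends the two bytes from `xmodemCrc16`, high byte first.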

Discussion here: https://www.reddit.com/r/beneater/comments/jsf1hk/tommyprom_xmodem_sendwrite_problem/

NigelTwo commented 3 years ago

@TomNisbet. Is it possible that the reason XMODEM/CRC is supported in TeraTerm is that Cisco routers used it as their "bare metal" recovery method? I personally have never encountered the protocol anywhere else since the dawn of the 21st century.

I was reading through "XMODEM/YMODEM PROTOCOL REFERENCE, A compendium of documents describing the XMODEM and YMODEM File Transfer Protocols. Chuck Forsberg 10/14/88" (that's 1988). It seems that the XMODEM protocol quickly became fractured as everyone tried to fix the problems in the original. I think Chuck wanted to move onto a newer protocol quickly. So XMODEM/CRC was probably as "good as it got" for a half duplex data transfer protocol that worked over acoustically coupled 300bps modems.

I would agree with your decision to revert back to using XMODEM/CRC as the default transfer protocol. The Linux programs rz/sz accommodate it, in fact they are probably the gold standard!

TomNisbet commented 3 years ago

After building some homemade serial monitors with an Arduino Mega I think I've finally gotten to the bottom of this. In checksum mode, TommyPROM sends one or more NAK characters to start the transfer. TeraTerm seems to buffer one of these NAK characters and use it as an indication that the first packet should be sent again. This resend happens while TommyPROM is busy writing to the EEPROM, which overflows the Arduino's 64-byte buffer and throws the transfer out of sync. It isn't a problem in XModem-CRC because the start character is not the same as the NAK character.
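To put rough numbers on the overflow, here is a small arithmetic sketch. The packet layout is the standard XModem one (SOH, block number, its complement, 128 data bytes, then the check bytes); the 64-byte buffer size is the figure quoted above, and the constant and function names are invented for this illustration.

```cpp
#include <cstddef>

constexpr size_t kHeaderBytes = 3;      // SOH, block number, complement of block number
constexpr size_t kDataBytes = 128;      // payload per XModem packet
constexpr size_t kChecksumPacketBytes = kHeaderBytes + kDataBytes + 1;  // 1 checksum byte
constexpr size_t kCrcPacketBytes = kHeaderBytes + kDataBytes + 2;       // 2 CRC bytes
constexpr size_t kRxBufferBytes = 64;   // assumed from the buffer size quoted in the thread

// Bytes that cannot fit if a whole packet arrives while nothing is reading the buffer.
constexpr size_t bytesLost(size_t packetBytes, size_t bufferBytes) {
    return packetBytes > bufferBytes ? packetBytes - bufferBytes : 0;
}
```

Under those assumptions a resent checksum-mode packet is 132 bytes, so about half of it (68 bytes) has nowhere to go if it arrives in full while the receiver is not reading.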

The problem was fixed by looking for additional characters in the serial receive buffer after a full packet has been received. Normally the buffer should be empty because the sender is waiting for a response before continuing with another packet. If TommyPROM detects characters in the buffer after the first packet, it flushes the buffer to get back into sync and sends a NAK to tell the sender to continue with the transfer.

This has all been tested in both the default checksum mode and in CRC mode with both TeraTerm and minicom. Both are able to send and receive files with no errors.

@NigelTwo - I'm putting the default back to checksum mode because that works with minicom without changing the transfer parameters and also means that you don't need to remember to check the CRC option in the TeraTerm transfer dialog box. It's fortunate that I randomly chose CRC mode when writing the original code because I would have never tracked this problem down and probably would have given up on the whole thing.

As an aside, I spent many years working on a CSU/DSU product that connected to Cisco routers. Every time I went to configure one of those things in the lab, the one hour project always turned into a one day project because the router wouldn't have the right software version for the feature I needed and then wouldn't have enough RAM or ROM for the new software release. It was always an adventure. Also, our product had a similar bare-metal recovery mode, but I think it used ZModem, IIRC.

NigelTwo commented 3 years ago

@TomNisbet. You wrote: "TommyPROM sends one or more NAK characters to start the transfer" as the root cause of this issue.

In previous versions the Xmodem::StartReceive() method emitted a "start" character every 1 second. I notice this has been extended to 3 seconds in the current checked-in code. Even this might be too aggressive. The reason is that we (the users) all initiate the receive function on the TommyPROM Arduino, then switch to our TeraTerm/minicom and begin fumbling around in menus and selecting files etc. Many seconds pass... and the start characters accumulate.

A quote from the document I was reading yesterday might be a hint: "7.3.2 Receive_Program_Considerations The receiver has a 10-second timeout. It sends a NAK every time it times out. The receiver's first timeout, which sends a NAK, signals the transmitter to start. Optionally, the receiver could send a NAK immediately, in case the sender was ready. This would save the initial 10 second timeout. However, the receiver MUST continue to timeout every 10 seconds in case the sender wasn't ready. Once into a receiving a block, the receiver goes into a one-second timeout for each character and the checksum. If the receiver wishes to NAK a block for any reason (invalid header, timeout receiving data), it must wait for the line to clear. See "programming tips" for ideas."
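The start-up behaviour in that quoted section can be sketched roughly as follows. This is only an illustration of the quoted rules, not the TommyPROM `StartReceive()` code: `waitForSender`, its parameters, and the two callbacks are hypothetical stand-ins for a real serial port and timer.

```cpp
#include <cstdint>
#include <functional>

constexpr uint8_t NAK = 0x15;  // ASCII NAK, the checksum-mode start character

// Returns true once the sender's first byte is available, false if every wait timed out.
// sendByte and waitForByteMs are hypothetical callbacks supplied by the caller.
bool waitForSender(const std::function<void(uint8_t)> &sendByte,
                   const std::function<bool(uint32_t)> &waitForByteMs,
                   int maxWaits, bool nakImmediately, uint32_t timeoutMs = 10000) {
    for (int attempt = 0; attempt < maxWaits; attempt++) {
        // The first NAK is optional; every later one follows a timeout.
        if (attempt > 0 || nakImmediately) {
            sendByte(NAK);
        }
        if (waitForByteMs(timeoutMs)) {
            return true;  // sender has started transmitting
        }
    }
    return false;
}
```

With `nakImmediately` set to false, the first NAK goes out only after the first full timeout, so no start characters pile up while the user is still navigating menus on the sending side.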

I am not sure that I can get my terminal program into sending mode even in 10 seconds!

Regarding the recovery (buffer flushing) and getting back into sync, better check what packet number TeraTerm resends - 01 or 02? Or am I being too paranoid here? Sorry, but I haven't looked into that part of the TommyPROM Xmodem code. Hopefully TommyPROM doesn't process the received packet if it finds more incoming data pending. I think you said that in your last comment.

TomNisbet commented 3 years ago

I did originally set the receive start timeout to be longer but then there could be as much as a 10 second delay before the transfer started. If I connect two computers directly that are running TeraTerm and minicom they start the transfers very quickly, so I don't think there is any harm in the current code. I probably could get rid of the initial NAK because, as you point out, no one is going to be able to start their transfer immediately. At this point I've tested it pretty extensively and don't want to go poking around again unless someone is reporting a problem. I'd be just as likely to muck something up. Besides, I really want to get back to my 8-bit build.

Regarding the processing, it is checking and discarding extra characters after receiving the first packet. The first packet itself is fine, the sequence numbers and checksum are all valid. The problem is that while the code is busy writing that packet to the EEPROM, the sender is pushing another packet down the line because it thinks it has a NAK. This overflows the Arduino buffer and everything is out of sync after that.

To detect this condition, the TommyPROM code now pauses after a packet is received to see if any more characters are coming in. This should never happen because the sender should always be waiting for a NAK or ACK after sending a packet. If there is extra data after the first packet, the code just eats it and NAKs to get back into sync and then everything continues normally. If extra data is received beyond that, then the transfer just fails because something else is clearly wrong.
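The decision described above could look something like the sketch below. It is not the actual TommyPROM code: `FakeSerial`, `SyncResult`, and `checkForExtraData` are hypothetical names, and the short pause before the check is assumed to have already happened in the caller.

```cpp
#include <cstdint>
#include <deque>

constexpr uint8_t NAK = 0x15;  // ASCII NAK

enum class SyncResult { Clean, Resynced, Failed };

// Hypothetical stand-in for the serial port: bytes waiting to be read and bytes sent back.
struct FakeSerial {
    std::deque<uint8_t> rx;
    std::deque<uint8_t> tx;
};

// Call after a complete, valid packet has been received and a short pause has elapsed.
SyncResult checkForExtraData(FakeSerial &port, unsigned packetNumber) {
    if (port.rx.empty()) {
        return SyncResult::Clean;     // normal case: sender is waiting for ACK or NAK
    }
    if (packetNumber != 1) {
        return SyncResult::Failed;    // stray data after a later packet: abort the transfer
    }
    port.rx.clear();                  // discard the unwanted resend of the first packet
    port.tx.push_back(NAK);           // NAK so the sender carries on from a known state
    return SyncResult::Resynced;
}
```

Limiting the recovery to the first packet keeps the workaround narrow: it covers the stale start NAK and nothing else, so any other stream of unexpected bytes still ends the transfer.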

This was an interesting problem to debug. It definitely took me back to technology I haven't worked on in many years. I wish I had one of those old serial communication analyzers to capture and display everything. That would have been a lot easier than my hacked together Arduino Mega capture setup. Of course, those old analyzers probably wouldn't have had a setting for 115200!

Thanks for looking into this. It's nice to have another set of eyes on the problem.