dhylands / rshell

Remote Shell for MicroPython
MIT License
955 stars 136 forks source link

Fix flow of ACKs to ensure we do not hang on final buf write #155

Open shaunrs opened 3 years ago

shaunrs commented 3 years ago

Using Windows 10 and a pyboard, often times a file transfer hangs and the only solution is to unplug the board and kill rshell. I observe this behaviour with 9 out of 10 larger file transfers (4KiB). Smaller transfers do not seem affected.

I'm not familiar enough with the specific mechanism of file transfer to describe the root cause here, but this behaviour appears to happen when send_file_to_remote has returned (no more data to send), but recv_file_from_remote is still executing, causing Device.remote("recv_file_from_host", "send_file_to_remote" .. ) to hang indefinitely. Presumably recv is waiting for more data that is never coming?

It seems the current code would never send/receive the final packet's ack, as they are processed at the start of the while loop. It is therefore possible for a last-packet timeout to go unnoticed by rshell . By ensuring acks are only sent after the remote has received and written the packet, and subsequently verified after local has sent a full packet to the remote we can make this more robust. This resolves my file transfer issues.

shaunrs commented 3 years ago

Good point, that makes sense. I'd like to keep the last-packet ACK in there, whilst it isn't a data integrity ACK there is value in being able to detect last-packet timeouts which otherwise cause rshell to hang indefinitely.

I'm happy to implement this, and get feedback. Which method would you prefer:

  1. Initial ACK to start transmission - as it was before my PR, but including last-packet ACK
  2. Initial XON (DC1) to start transmission - feels a little contrary to ACK for flow control, but is explicit. The sender is 100% sure that receiver is in the correct state, at the start of a fresh recv:while loop. However, I don't see a case where you could end up in a bad state without this, so it might be overkill.

Personally: I'd go for 2 purely because it is much more explicit. It accounts for cases where the code changes in future and/or logic is added to this feature that may introduce other edge cases. But will implement this how you see best fits the project :)

davehylands commented 3 years ago

The actual character used doesn't really matter. I'd avoid using Control-C, but DC1 (0x11) should be fine.

shaunrs commented 3 years ago

The actual character used doesn't really matter. I'd avoid using Control-C, but DC1 (0x11) should be fine.

Sorry for the long delay, been very busy on other things. This is pushed using DC1.

shaunrs commented 2 years ago

@davehylands I understand you're likely super busy, but I'm keen to get this merged if possible? I'm currently using a local fork in all my projects :)