daniel5151 / gdbstub

An ergonomic, featureful, and easy-to-integrate implementation of the GDB Remote Serial Protocol in Rust (with no-compromises #![no_std] support)
Other
291 stars 45 forks source link

Support "ack/nack" packets for unreliable transports #137

Open daniel5151 opened 1 year ago

daniel5151 commented 1 year ago

While most users of gdbstub are likely debugging over transport mechanisms that are inherently reliable (i.e: pipes, TCP, emulated serial, etc...), it would be good to nice if gdbstub also properly supported use-cases where the transport mechanism is not reliable (e.g: a noisy physical serial line).

See docs at https://sourceware.org/gdb/onlinedocs/gdb/Packet-Acknowledgment.html#Packet-Acknowledgment

gdbstub today has pretty minimal support for running in ack-mode, essentially assuming that all packets will send/recv successfully.

As such, gdbstub will currently fail if:

  1. The host sends a "nack" packet (i.e: requesting re-transmission of the last packet)
    • gdbstub will simply error-out with a "nack not supported" error
  2. gdbstub receives a malformed packet (e.g: mismatched checksum, invalid packet format, etc...)
    • instead of sending a "nack" packet to the host and waiting for a response, gdbstub errors-out immediately

Possible Solutions

Solving 2 will likely require catching certain errors as they propagate up the call stack, and instead of bailing out, squelching them and sending a "nack" packet.

Solving 1 will be a bit tricker, as gdbstub currently require that outgoing packets get buffered anywhere.

One possible approach: augment the Connection trait with a fn retransmit(&mut self) -> Result<bool, Self::Error> method that downstream implementations can optionally implement? i.e: return true if supported, false if not supported (which would be the default impl).

After all - we guarantee that the flush() method is called after each packet, so it should be pretty trivial for a Connection to maintain a buffer of its last-sent packet.

I suspect that this can be done in a backwards-compatible way, without requiring a breaking API change, but I'm not 100% sure.

And of course - any solution should ideally minimize binary size impact when running over a reliable transport.

daniel5151 commented 1 year ago

@xobs this may or may not be relevant to https://github.com/betrusted-io/xous-core/issues/361

You might want to double-check what errors the gdbstub is reporting when it dies. If it's related to packet parsing / or nack packets, then it might be worth looking into fixing this.

On the other hand, I would've expected any reasonably modern serial hardware to be reasonably error free, so unless you're working with some particularly shoddy hardware, it may very well not have anything to do with this.

xobs commented 1 year ago

It's difficult to do that because the device has only one serial port so when debugging there is no other way to get errors out. The best I could do is to panic, or maybe buffer the messages and send them later. Unless there is a way to send strings to gdb that doesn't involve being a callback in a Monitor.

I think what's going on in this particular case is that the FIFO is getting filled because gdb is blasting it with data. The solutions in this case are:

  1. Increase the FIFO so that we can weather the onslaught
  2. Use ACK mode to prevent deep buffering
  3. Rework the debugger so that it doesn't run in an ISR but runs in a bottom-half. This is more complicated and involves spawning a second thread in the kernel -- doable, but untested and potentially difficult

So the issue isn't that errors are appearing, the issue is that GDB is sending data too fast and characters are getting dropped, and this serial link doesn't have CTS/RTS signals wired up. Perhaps XON/XOFF could work, if GDB's remote serial protocol supports that.

daniel5151 commented 1 year ago

I would definitely suggest setting up some kind of system to stash errors / panics across resets + report them when you get the chance (akin to something like panic_persist), and not just for debugging gdbstub issues (seems like a useful thing to have in general 😅)

the issue is that GDB is sending data too fast and characters are getting dropped

Hmmmm. On one hand, this does sound like something that you might potentially be able to solve on your end... but on the other hand, if characters are getting intermittently dropped (for whatever reason - be it a noisy line, or a line that's occasionally sending data too quickly), it might make sense to just work around that by investing in proper ack/nack infrastructure?

As I allude to in the top-level comment, hacking together a slap-dash fork of gdbstub with a POC nack/retransmission implementation might not be too hard. I would (selfishly) suggest taking a crack at that (if you're so inclined), and if it fixes the stability issues, I can absolutely help to polish it up / find some time to write a polished implementation myself.

xobs commented 1 year ago

Good news on this front -- part of the problem was due to the fact that we were rewriting the process pagetables as part of every access, AND every access was bytewise. By grouping accesses into 16- and 32-bit operations, and by simply setting a CPU register bit rather than flusing the pagetables, we've improved things to the point where it's more reliable. I'm still waiting to hear back on whether No-Ack is still required, but for the most part this seems to be much improved.

I'll leave this issue here anyway in order to track No Ack mode, but I thought you'd like to hear about this success.

https://github.com/betrusted-io/xous-core/pull/380

xobs commented 1 year ago

Actually, it seems as though this is working for our purposes even WITH no-ack mode.

And with the latest patch making no-ack optional, I think this issue can be closed now.

daniel5151 commented 1 year ago

Hey, that's great to hear! Glad you were able to sort out your reliability issues!

I'm going to leave this issue open, as ack mode is technically a part of the GDB RSP that gdbstub should have support for (eventually).