Closed xobs closed 2 years ago
Adding more context from https://github.com/daniel5151/gdbstub/issues/31 -- Specifically the target I have in mind is Xous running on Betrusted.
Xous is an operating system we're working on. It's bare-metal and runs on custom hardware. Currently it is targeting Betrusted, which runs a RISC-V core on a Xilinx FPGA in a keyboard "phone" form factor. Xous is written entirely in Rust, and is a microkernel architecture. We have the beginnings of libstd ported and running on Rust 1.52.1
, however debugging it up until now has been a challenge. It would be fantastic if we could get a full debugger working.
The kernel itself is a cooperative microkernel and only has access to a TRNG for generating random server IDs. We're considering adding a timer for certain interruptable mutex operations as an extension. All operating system fundamentals are handled by servers. For example, the console UART is handled by a log server, which acts as stdout.
The current approach to debugging is to use Renode, which has support for acting as a GDB server. Its support for the MMU is still somewhat limited, which makes perfect sense given that it's either used to debug deeply-embedded projects that have no MMU, or to run something heavy like Linux that has support for native debuggers.
gdbstub
allows us to either repurpose the console UART, or create an entirely new UART that only speaks GDB. With simple monitor
commands we can switch processes and enable arbitrary debugging of processes within the system. Renode supports connecting an emulated UART to a TCP socket, so gdb
should be able to talk directly with the emulated kernel, and should work just fine when running on real hardware.
I'm trying to use 297e06561d5ee729132882c21b4a280abbfe9366 but I have a few questions:
gdbstub::Connection::read()
when there are no bytes? Data is now being injected via pump()
, so it would seem that Connection
really only needs to implement write()
. Additionally, peek()
will now always return None
since characters are injectedMultiThreadOps::resume()
? If the stub decides to issue a resume, I'll unblock that process. However, once the process is running, I'll need to return something indicating success.In my kernel I'm introducing a new process state that indicates a process is being debugged, which allows other processes to interact with it but without allowing it to get scheduled. For example, other processes can send it messages which will pile up in its inbox.
Thanks for opening the issue, and for giving my POC branch a try!
Like I said in the other thread, I'm really excited to see what you come up with, since using gdbstub
for bare-metal debugging is something I've been eager to validate for a very long time.
To answer your questions about 297e065:
Connection::read
and Connection::peek
As you're aware, the current design still requires passing a Connection
to GdbStubStateMachine
, even though under-the-hood, it'll never actually call Connection::read
. This is definitely an API wart that should get buffed out before merging a final version, but for now, it should be reasonable to simply have Connection::read
panic!
if it is ever called (which it won't).
As for Connection::peek
: the only place it is ever called is as part of the call to GdbInterruptNoAsync::pending
, which is used to poll for incoming GDB client interrupts. As I mentioned in my original comment, you'll want to bypass this mechanism entirely, and check for interrupt packets (which is just a bare 0x03
byte) as part of your external interrupt handler infrastructure. As such, you can simply return None
from Connection::peek
, or alternatively, have it panic!
as a safeguard against accidental misuse.
MultiThreadOps::resume
To answer the first part of your question: a full list of stop reasons is enumerated by the ThreadStopReason
enum.
To answer the second part of your question, MultiThreadOps::resume()
's API is designed such that an implementor will resume the target, and then block until some kind of stop event occurs. There is no mechanism to return a "okay, the target is now running" message.
This has been totally fine up to now, as gdbstub
only implements All-Stop mode, where the GDB client does not send any packets (aside from the 0x3 interrupt packet) while a target is running. This is in contrast to Non-Stop mode, where background threads can execute freely while the gdbstub continues to service packets.
One key thing to note is that All-Stop mode is by far the most common way to implement a gdbstub
, and I have yet to find a reasonable implementation of a non-gdbserver GDB stub that implements this mode. This is because implementing Non-Stop mode introduces the challenge of how to handle asynchronously sending notification packets when the target encounters a stop event. I'll be honest, I've been putting off working on this feature since I've always had a sneaking suspicion it'll require some non-trivial rework of large chunks of gdbstub
's connection management infrastructure...
In addition, supporting non-stop mode will also require "mode-gating" certain current/future protocol extensions, as not all packets are supported when non-stop mode is active. e.g: Remote File I/O isn't supported in non-stop mode, nor is the O xx
stop reply packet, etc... Not a deal-breaker of course, but just some added complexity that'll need to be handled appropriately.
Now, with all the context out of the way, lets pivot back to the problem at hand: how should you implement resume
. Well... given that you're already running this code in an interrupt handler, and you've already got the infrastructure to context switch out of a IRQ, if it were me, I would probably have just structured the code as follows:
fn resume(...) {
resume_thread(tid);
__context_switch();
// the process runs, encounters some stop event, and you've context-switched back to here
let reason = retrieve_stop_reason(tid); // <-- assuming you've stashed away the stop reason somewhere
return translate_stop_reason(reason);
}
If this isn't feasible, then congratulations, you've opened Pandora's box and forced me to consider how gdbstub
could implement Non-Stop mode! Getting that working will likely be quite an endeavor, and not something I could hammer out a POC of in a single evening.
Alternatively, we could also explore a similar state-machine based API for resuming targets, but I'd expect that to also require quite a bit of non-trivial development effort and refactoring...
Let me know what you think!
The problem with that code is that it's running in an interrupt handler, and so has no context to begin with. In this kernel at least, a brand-new stack frame and context is generated every time an interrupt hits, and it's disposed of when the interrupt exits. Things in the .data
section persist though, and they can have one word of data passed to them.
The interrupt handler looks something like this:
static mut GDB_SERVER: Option<(GdbStubStateMachine<XousTarget, Uart>, XousTarget)> = None;
pub fn irq(_irq_number: usize, _arg: *mut usize) {
let b = Uart {}
.getc()
.expect("no character queued despite interrupt") as char;
// GDB server exists, hand character to the GDB server
unsafe {
if let Some((gdb, target)) = GDB_SERVER.as_mut() {
gdb.pump(&mut target, b).unwrap();
return;
}
}
// Not currently in GDB, process normally.
match b {
// ...
'g' => {
use gdb_server::*;
println!("Starting GDB server -- attach your debugger now");
let xous_target = XousTarget::new();
match gdbstub::GdbStubBuilder::new(Uart {})
.with_packet_buffer(unsafe { &mut GDB_BUFFER })
.build()
{
Ok(gdb) => unsafe { GDB_SERVER = Some((gdb, xous_target)) },
Err(e) => println!("Unable to start GDB server: {}", e),
}
}
'h' => print_help(),
}
}
Therefore, code execution will flow into gdbstub
as part of gdb.pump()
but I don't think it will ever exit. Or will it? The state should be stored in the .bss
in the GDB_BUFFER
variable, but as I said the actual stack frame is thrown away when the interrupt returns.
"blocking" until a stop event occurs is fine, and was the mental model I've been working with. Except we wouldn't block, we'd return. The process in question would be descheduled.
Calling resume(...)
would re-schedule the process and then return from the interrupt handler. At this point, the process can stop in one of two ways:
gdbstub
via pump()
.EBREAK
call. This will get trapped by the interrupt handler, and will essentially enter the same function. Perhaps that should call something like gdb.request_break(); gdb.pump(None);
?I almost wonder if this isn't a thing that async
couldn't help with. I don't know enough about async
, but I gather that it makes state machines such as this, and it does seem like the inner function simply needs to yield while it waits. But perhaps that is overkill.
Ahhh, I understand...
My shoddy little microkernel used stateful interrupt handlers that actually maintained their own separate execution contexts, but it's totally reasonable to keep interrupt handlers lean instead (arguably better tbh).
I almost wonder if this isn't a thing that
async
couldn't help with.
Kinda. Ideally, Rust would stabilize native generators, since what we're really interested here is some sort of "yield" construct. It would be unfortunate to add a bunch of async
complexity to an otherwise quite straightforward API...
That said, I just took a fresh look at the source code, and I realized that past-me may have inadvertently structured gdbstub
's implementation in such a way that adding a "return after resume" path wouldn't actually require that much refactoring! To see what I mean, take a look at how target-resume is handled within GdbStub
https://github.com/daniel5151/gdbstub/blob/master/src/gdbstub_impl/ext/base.rs#L587-L604
You'll find that the do_vcont
method (which is invoked when the GDB client wishes to resume a target) ends up inking a target's resume
method, obtaining a stop reason, and then invoking a separate finish_exec
function. In other words, the act of calling + returning from resume
is already decoupled from the act of reporting a stop reason to the GDB client!
With this in mind, it shouldn't be too difficult to add a new StopReason::Resumed
variant, which when returned, will short-circuit the current do_vcont -> finish_exec loop, and manually yield execution back to the callee of gdb.pump()
(possibly returning something like DisconnectReason::WaitForResume
). At that point, you can safely tear down the interrupt context, switch back to the task, and once you're back in the interrupt handler, you can wire up a direct interface to call finish_exec
with the appropriate stop reason.
Now, unfortunately, I am going to be quite busy for the next week or so (work is ramping up + it'll be a long weekend here in the states), so I don't think I'll have time to hack together an implementation until sometime next week. That said, if you're willing to get your hands dirty, I think I've provided enough context such that you can fork gdbstub
and hack together a rough POC. If you can pull that off, you should be able to start implementing / validating other bits of gdbstub
's functionality, and we can work on getting a more robust implementation upstreamed.
Let me know if what I described makes sense, and if you think that's something you might be able to tackle yourself.
That does make sense, and I'll see if I can hack something together. Unfortunately, I'll also be busy with another project for the next week, so I'm not sure how much I'll be able to complete.
Sounds good.
I'll set a reminder to circle back on this in a week or so to see how things are going, and in the meantime, I'll try and squeeze in some time to throw something together myself.
Something strange is happening with linking. When I include the call to gdb.pump()
it looks like it uses a different atomic primitive than exists. Or maybe I have to manually create that symbol. This is building on riscv32imac, which has atomic instructions:
$ cargo build --target riscv32imac-unknown-none-elf
...
= note: rust-lld: error: undefined symbol: __atomic_load_4
>>> referenced by atomic.rs:0 (C:\Users\smcro\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\library/core/src/sync\atomic.rs:0)
>>> lto.tmp:(core::sync::atomic::AtomicUsize::load::h6c2d4ab500043b24)
>>> referenced by atomic.rs:0 (C:\Users\smcro\.rustup\toolchains\stable-x86_64-pc-windows-msvc\lib/rustlib/src/rust\library/core/src/sync\atomic.rs:0)
>>> lto.tmp:(core::sync::atomic::atomic_load::hd311338cf6f0be89)
error: aborting due to previous error; 6 warnings emitted
error: could not compile `kernel`
To learn more, run the command again with --verbose.
$
If I comment out the call to gdb.pump()
it works. Let me try to dig into it and figure out what sort of atomic calls it's looking for...
Ahh, how odd...
I know for a fact that gdbstub
itself doesn't use any atomics directly (or any synchronization primitives at all for that matter). Indirectly, the only dependency that might be using atomics under-the-hood would be the log
crate, which does try and use atomic instructions when available...
Fascinating. I'm pretty sure it's a rust bug. If I build for riscv32i-unknown-none-elf
, which as no atomic support, it builds just fine. If I build it for 1.51, it builds just fine. However, Rust 1.52 breaks.
I'll open a bug there. In the meantime, I have workarounds.
I opened a new bug at https://github.com/rust-lang/rust/issues/85736 -- seems like a regression, since it works in the older compiler.
It's a regression in lto support, which unfortunately derailed my momentum. Now I'm pondering how to actually implement single-step debugging.
Do you know if gdb is happy to run without single-step support?
Alright, I'm back from the long weekend, and it's time to get back into the swing of things. Lets see what's going on here...
Ah, yeah, would you look at that! It seems you really did stumble on a regression in the Rust compiler. Good thing you reported it, and hopefully it'll get fixed ASAP!
As for your question regarding single stepping, I would refer to the GDB docs here: https://sourceware.org/gdb/current/onlinedocs/gdb/Overview.html#Overview
At a minimum, a stub is required to support the ‘?’ command to tell GDB the reason for halting, ‘g’ and ‘G’ commands for register access, and the ‘m’ and ‘M’ commands for memory access. Stubs that only control single-threaded targets can implement run control with the ‘c’ (continue) command, and if the target architecture supports hardware-assisted single-stepping, the ‘s’ (step) command. Stubs that support multi-threading targets should support the ‘vCont’ command. All other commands are optional.
And hey, would you look at that! It turns out that you can get away without supporting single stepping support! I'll be honest, I totally didn't realize that something as seemingly fundamental as single-stepping is actually an optional operation, and as such, gdbstub
doesn't actually have a mechanism to signal that the target does not support single stepping...
Thanks for bringing this to my attention! Thankfully, this wouldn't be a difficult option to "plumb" through. My gut feeling is that we can just add a new fn supports_single_step(&self) -> bool
method to {Single,Multi}ThreadOps
(including a default impl that returns true
for backwards-compat reasons), which is then used by gdbstub
internally to determine if the response to vCont?
should include support for single-stepping.
While this isn't actually the cleanest solution (requiring implementors to add a logically unreachable match arm in their resume
implementation) it is backwards compatible and easy to implement. We can land it as part of these other bare-metal API changes, and I'll pop open a tracking issue to clean up the API a bit for 0.6
(which will likely end up looking similar to how the current optional range-stepping API).
I'm expecting to budget some time to hack away at gdbstub
sometime later today / tomorrow, but in the meantime, you can "emulate" this functionality by manually modifying this line to omit the s;S
variants.
Lastly, I was also planning on taking a crack at implementing support for "deferred resume" sometime later today / tomorrow. I've got some neat ideas on how to structure the API, and I'm excited to see what I'll be able to hack together :)
Alright, I just pushed up a [hopefully] working version of "deferred resume" to the feature/state-machine-recv-packet
branch.
Fair warning: I have not had a chance to test this code outside of "making sure it compiles". That said, given that we are writing Rust, I feel fairly confident that it'll Just Work™️ right off the bat. Lets see if I'm right 😄.
Note that I decided to use a typestate based API to enforce correct GDB protocol sequencing at compile-time, and as such, the API might be a bit confusing if you're not familiar with this pattern. Please refer to the code in example_no_std/src/main.rs
for an overview of how the API should be used.
Please give these changes a shot, and let me know how it goes!
I didn't get the chance to work on making single-stepping optional, though I should have some time to finish it up tomorrow. In the meantime, you'll just have to use the workaround I mentioned above.
I've been thinking about this typestate token based API for a couple days now, and I've realized that to really make the API rock-solid, I think I'll need to make some API breaking changes at some point. That said, I'm not sure if that'll be as part of this WIP PR or as part of some future work, so for now, I'm just going to jot down these ideas before I forget them.
Note that these are not pressing issues, and are only useful as a way to "tighten up" the API and make it impossible to misuse the library at compile time. A truly malicious implementation could unsafely craft valid typestate tokens out of thin air, and there isn't anything we can do about it.
StopReason::Defer
can only be constructed as part of the resume
method.As it stands, there's nothing stopping the user from constructing a StopReason::Defer
and passing to to GdbStubStateMachine::deferred_stop_reason
. This is obviously incorrect, as it doesn't make any sense to defer a defer, and right now, this case is handled with a runtime check.
I realized that it'd actually be possible to enforce this property at compile time with some similar typestate token shenanigans:
// Create a new typestate token that is tied to a specific lifetime
#[non_exhaustive]
struct DeferredStopReasonToken<'a> {
lifetime: core::marker::PhantomData<&'a ()>
}
// update the StopReason::Defer to require an instance of `DeferredStopReasonToken`
enum StopReason<'a, U> { // <-- requires adding a lifetime parameter :'(
// ...
Defer(DeferredStopReasonToken<'a>)
}
// update the `resume` method to pass a valid instance of `DeferredStopReasonToken`
fn resume(
&mut self,
default_resume_action: ResumeAction,
gdb_interrupt: GdbInterrupt<'_>, // <-- i'll probably remove this in the next breaking version
defer: DeferredStopReasonToken<'_>, // <-- tied to the lifetime of `resume`!
) -> Result<ThreadStopReason<<Self::Arch as Arch>::Usize>, Self::Error>
{
// if a target wants to defer a stop reason, they must consume the defer token
return Ok(ThreadStopReason::Defer(defer))
}
With this infrastructure in place, we can observe the following: Back at the top level, in the scope where one would call GdbStubStateMachine::deferred_stop_reason
, there is no way to construct a DeferredStopReasonToken
out of thin air!
Moreover, since the lifetime of the token was tied to the resume
method, a target implementation wouldn't be able to "stash away" the token in an attempt to break this invariant.
GdbStub
The current system has a flaw whereby there's nothing stopping someone from constructing two-or-more instances of GdbStub
simultaneously, and using them to acquire arbitrary out-of-sequence typestate tokens. i.e: it is trivial to construct arbitrary typestate tokens out of sequence by instantiating + running transient GdbStub
instances with just enough dummy-data to yield the desired token.
The mitigation would be to somehow "tie" typestate tokens with a particular GdbStub
instance - preferably at the type system level. I've got a few ideas on how this might be possible, and the more I think about it, I feel like it ought to be possible to do this without requiring any breaking API changes.
Maybe I'll take a crack at this at some point this weekend...
FYI, I just pushed up yet another update that addressed the issue of "Tying typestate tokens to a unique instance of GdbStub". The solution? Get rid of tokens entirely, and instead have the GdbStubStateMachine
struct itself transition between states.
Shoutout to https://hoverbear.org/blog/rust-state-machine-pattern/ for providing a very easy-to-use example that could be modified with impunity.
As always, check out the example_no_std
code for details on how to use this API. It should be pretty straightforward.
Cheers
Thanks! That does seem easier. I'm struggling a bit with lifetimes trying to convince Rust to store mutable items inside an Option<T>
.
Specifically, my pattern looks something like this:
pub static mut GDB_SERVER: Option<GdbStubStateMachine<XousTarget, super::Uart>> = None;
pub static mut GDB_TARGET: Option<XousTarget> = None;
pub static mut GDB_BUFFER: [u8; 4096] = [0u8; 4096];
pub fn irq(_irq_number: usize, _arg: *mut usize) {
let b = Uart {}
.getc()
.expect("no character queued despite interrupt");
#[cfg(feature = "gdbserver")]
unsafe {
use crate::debug::gdb_server::{GDB_SERVER, GDB_TARGET};
use gdbstub::state_machine::GdbStubStateMachine;
use gdbstub::{DisconnectReason, GdbStubError};
if let Some(gdb) = GDB_SERVER.as_mut().take() {
let target = GDB_TARGET.as_mut().unwrap();
let new_gdb = match gdb {
GdbStubStateMachine::Pump(gdb_state) => match gdb_state.pump(target, b) {
// Remote disconnected -- leave the `GDB_SERVER` as `None`.
Ok((_, Some(disconnect_reason))) => {
match disconnect_reason {
DisconnectReason::Disconnect => println!("GDB Disconnected"),
DisconnectReason::TargetExited(_) => println!("Target exited"),
DisconnectReason::TargetTerminated(_) => println!("Target halted"),
DisconnectReason::Kill => println!("GDB sent a kill command"),
}
return;
}
Err(GdbStubError::TargetError(_e)) => {
println!("Target raised a fatal error");
return;
}
Err(_e) => {
println!("gdbstub internal error");
return;
}
Ok((gdb, None)) => gdb,
},
// example_no_std stubs out resume, so this will never happen
GdbStubStateMachine::DeferredStopReason(_) => {
panic!("Deferred stop shouldn't happen")
}
};
GDB_SERVER = Some(new_gdb);
return;
}
}
match b {
#[cfg(feature = "gdbserver")]
b'g' => {
use gdb_server::{XousTarget, GDB_BUFFER, GDB_SERVER, GDB_TARGET};
println!("Starting GDB server -- attach your debugger now");
unsafe { GDB_TARGET = Some(XousTarget::new()) };
match gdbstub::GdbStubBuilder::new(Uart {})
.with_packet_buffer(unsafe { &mut GDB_BUFFER })
.build()
{
Ok(gdb) => match gdb.run_state_machine() {
Ok(state) => unsafe { GDB_SERVER = Some(state) },
Err(e) => println!("Unable to start GDB state machine: {}", e),
},
Err(e) => println!("Unable to start GDB server: {}", e),
}
}
// Other characters handled here
}
// Remainder of `debug` console here, for when a terminal is not connected
}
The problem I'm running into right now is that gdb_state.pump(target, b)
consumes gdb_state
, which is a member of gdb
and is stored in the .data
section (as part of gdb
I'm guessing?). So I can't convince the compiler to build this:
error[E0507]: cannot move out of `*gdb` which is behind a mutable reference
--> src\debug.rs:311:57
|
311 | GdbStubStateMachine::Pump(gdb) => match gdb.pump(target, b) {
| ^^^ move occurs because `*gdb` has type `GdbStubStateMachineInner<'_, gdbstub::state_machine::state::Pump, XousTarget, Uart>`, which does not implement the `Copy` trait
error: aborting due to previous error; 2 warnings emitted
For more information about this error, try `rustc --explain E0507`.
error: could not compile `kernel`
To learn more, run the command again with --verbose.
Apologies, I've managed to at least convince it to compile by removing the .as_mut()
:
if let Some(mut gdb) = GDB_SERVER.take() {
let target = GDB_TARGET.as_mut().unwrap();
let new_gdb = match gdb {
GdbStubStateMachine::Pump(gdb_state) => match gdb_state.pump(target, b) {
// Remote disconnected -- leave the `GDB_SERVER` as `None`.
Ok((_, Some(disconnect_reason))) => {
match disconnect_reason {
DisconnectReason::Disconnect => println!("GDB Disconnected"),
DisconnectReason::TargetExited(_) => println!("Target exited"),
DisconnectReason::TargetTerminated(_) => println!("Target halted"),
DisconnectReason::Kill => println!("GDB sent a kill command"),
}
return;
}
Err(GdbStubError::TargetError(_e)) => {
println!("Target raised a fatal error");
return;
}
Err(_e) => {
println!("gdbstub internal error");
return;
}
Ok((gdb, None)) => gdb,
},
// example_no_std stubs out resume, so this will never happen
GdbStubStateMachine::DeferredStopReason(_) => {
panic!("Deferred stop shouldn't happen")
}
};
GDB_SERVER = Some(new_gdb);
return;
}
Still working on it, but in case you're curious the [messy] repo is checked in at https://github.com/betrusted-io/xous-core/blob/debugger/kernel/src/debug.rs
I need to add OS-level support for addressing memory and putting processes in a Debug(u32)
state, but at least the possibility is there.
Fantastic! I'm glad to see that you've got at least a "skeleton" of a full-fledged gdbstub
integration up and running 😄
As always, I'm looking forward to seeing how your implementation shapes up, and whether you run into any more API papercuts. Once you've got something up and running, we can move on to polishing up the feature branch and merging it into master
🚀
By the way, you should remove those #[inline(never)]
calls - they're only there in the example_no_std as a profiling measure to take the size of the dummy code into account. Production code should be annotation free.
Oh, and one more thing: I noticed that you've already got a few OS specific debug features in your code (e.g: dumping page tables, reporting ram usage, etc...) which become inaccessible once you enter GDB mode.
In case you aren't aware, but the GDB RSP actually defines a mechanism by which targets can implement custom client commands, which gdbstub
exposes via the MonitorCmd
extension. In a nutshell, you can set it up such that when a user types monitor [payload]
from the GDB client, the target will receive the payload and have a chance to interpret + send arbitrary data back to the host console. Using this feature, you can define custom commands such as monitor ram
to dump ram, monitor page-tables
to dump page tables, etc... all while in the middle of a GDB debugging session!
Yep! I was planning on adding monitor
commands to allow switching processes, since GDB doesn't really have anything like that. Threads will be threads, and processes will be managed by way of the process
command.
Right now the big issue is supporting pausing processes on the system, so it's taking a bit longer than I'd like to integrate things.
I was planning on adding
monitor
commands to allow switching processes, since GDB doesn't really have anything like that.
Actually, that isn't true. GDB can attach
to multiple running processes, and debug all of them simultaneously. This functionality is available when using the RSP when the target supports running in extended-mode
, which coincidentally, is something gdbstub
already supports via the ExtendedMode
IDET.
Unfortunately, while gdbstub
already supports running in extended mode, I have never gotten around to adding a MultiProcessOps
interface to compliment the existing SingleThreadOps
and MultiThreadOps
interfaces. As it happens, gdbstub
already has almost all of the internal infrastructure required to add a MultiThreadOps
interface (notably, gdbstub
unconditionally uses the RSP multiprocess extensions under-the-hood), and the only reason this interface hasn't been implemented is that up until this point, no-one has needed to debug multiple processes at once. Implementing support for this has been on my TODO list for a while, but given its relatively high effort+testing to payoff ratio, i'm not sure when I'll get a chance to implement it...
That said, you probably can leverage the current ExtendedMode
support alongside the already-implemented MultiThreadOps
interface to switch between processes. i.e: whenever you attach to a new process, you immediately detach from the previous process, thereby always having a single active process being debugged. While it would undeniably be a bit janky, it would likely be cleaner than using a custom monitor
command to switch processes, as the GDB client will be enlightened to the fact that you've switched processes, and can update its internal model of the remote system accordingly (e.g: switching to a fresh set of breakpoints, using a different set of symbols (if appropriate), etc...)
Just a heads up, but now that #60 has been reported, I'm almost certainly going to be publishing a 0.6
release of gdbstub
at some point in the next few weeks. It'd be nice if we could nail down any lingering warts / issues in the current state machine / broader gdbstub
API so I can lump these changes in with other 0.6
changes.
While the changes in feature/state-machine-recv-packet
are technically non-breaking and could be punted to a 0.6.x
release, it would be nice to merge some of the internal plumbing improvements from that branch into master, as they lay the groundwork for some future features I've been meaning to implement (e.g: support for Non-Stop mode, GDB Agent support, etc...). Moreover, I'm somewhat tempted to remove the current "blocking" GdbStub::run
method entirely, and make the state machine interface the primary interface to gdbstub
. By doing so, I'd also be able to remove the read
method from Connection
.
As always, any and all feedback would be much appreciated!
I apologise, I was called away to another project. However, I am now able to resume work on this, and I will continue to work to integrate gdbstub
into Xous
.
I just managed to get process suspension working. That is, I can now move a process from Ready(x)
to Debug(x)
, which means that it will not get scheduled. The rest of the system continues to work, and IPC continues to function. This means I can now implement the debugger.
I got it wired up, so now I'm able to peek and poke memory, inspect registers, and list threads. There are a few things to note.
It's not able to continue execution. When I say c
it seems to come back immediately for whatever reason:
c continue
w $vCont;c:p1.-1#0f
r <Timeout: 0 seconds>$S05#b8
w $g#67
r $0*$f413060020ff0560*-80ff0560f400*!8000*!60ff0560203106600f00*!eb73cd4837ea903d130d0215015c260d0*52000*!400*!243106600f000*!c0*B98210660f000*!30*$200*!f*$3e1e0600#9a
w $qfThreadInfo#bb
r $mp01.01,p01.02,p01.03,p01.04,p01.05,p01.06,p01.07#2a
w $qsThreadInfo#c8
r $l#6c
The first thing GDB does is list all threads in the process. However, when the debugger is first attached, there is no process selected. I've worked around this in my implementation by arbitrarily saying PID 3 is the process being debugged. It's enough to keep things going, but it's not ideal.
Here's an example of a process being debugged:
[4:40:18 PM] ~/Desktop> riscv64-unknown-elf-gdb -ex 'set remotelogfile gdbproxy_logfile.org' -ex 'tar ext :9999' D:\Code\Xous\Core\target\riscv32imac-unknown-none-elf\release\shellchat
GNU gdb (SiFive GDB 8.3.0-2019.08.0) 8.3
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "--host=x86_64-w64-mingw32 --target=riscv64-unknown-elf".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://github.com/sifive/freedom-tools/issues>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>. For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from D:\Code\Xous\Core\target\riscv32imac-unknown-none-elf\release\shellchat...
Remote debugging using :9999
_xous_syscall_rust () at src/asm.S:11
11 lw t0, 0(sp)
(gdb) info thr
Id Target Id Frame
* 1 Thread 1.1 _xous_syscall_rust () at src/asm.S:11
2 Thread 1.2 _xous_syscall_rust () at src/asm.S:11
3 Thread 1.3 _xous_syscall_rust () at src/asm.S:11
4 Thread 1.4 _xous_syscall_rust () at src/asm.S:11
5 Thread 1.5 _xous_syscall_rust () at src/asm.S:11
6 Thread 1.6 _xous_syscall_rust () at src/asm.S:11
7 Thread 1.7 _xous_syscall_rust () at src/asm.S:11
(gdb) bt
#0 _xous_syscall_rust () at src/asm.S:11
#1 0x000613f4 in xous::syscall::rsyscall (call=...) at xous-rs\src/syscall.rs:1616
#2 xous::syscall::receive_message (server=...) at xous-rs\src/syscall.rs:1263
#3 0x000478f4 in xous_entry () at services\shellchat\src\main.rs:282
#4 0x000494f6 in shellchat::_start (pid=15) at D:\Code\Xous\Core\xous-rs\src/lib.rs:168
Backtrace stopped: frame did not save the PC
(gdb)
Alright, I realized the problem with (1) above where it would get interrupted immediately.
Previously I returning ThreadStopReason::GdbInterrupt
from resume()
, which would immediately cause it to pause. Now that I return ThreadStopReason::Defer
, it properly resumes.
The issue now is that when I hit Control-C, it's in an odd state. Notably, the GdbStubStateMachine
is in the DeferredStopReason(_)
state, which makes sense. The IRQ handler I'm working on processes one character at a time, so it doesn't currently handle gdb
being in any state other than Pump
.
What would be the best way to handle this? That is, I have a character that's appeared because gdb has sent a packet, but the state is in DeferredStopReason(_)
so it is unable to accept any new characters.
Alright, I've pushed up a hotfix that should unblock you.
I've updated the code in the armv4t
example to use the state machine API, which allowed me to validate + debug the full e2e state machine flow - including how to handle interrupts packets.
~The hotfix I've pushed up is a bit gnarly, but it should suffice until I find some time this weekend to hammer out a cleaner API. In a nutshell: that character you're receiving when you hit ctrl-c should be 0x03
, which is a special non-ascii packet GDB sends to signal a client interrupt. For now, you ought to assert that the byte you received in the DeferredStopReason
state was in-fact 0x03, and then pass the ThreadStopReason::GdbInterrupt
state as a deferred stop reason. See the armv4t
example code for details.~ See EDIT below...
Note that technically speaking, you're allowed to pass whatever stop-reason you like when you're interrupted. ThreadStopReason::GdbInterrupt
is simply a convenient alias for ThreadStopReason::Signal(5)
, or "stopped due to trap".
I think the proper fix will involve exposing the call to pump
as part of the DeferredStopReason
state, and adding a new GdbStubStateMachine::Interrupted
state, which will be transitioned into in the case that the packet parser encounters an interrupt packet. I'm still mulling things over, so if you have any suggestions, do let me know.
EDIT: I found some time to play around with the API, and after a bit of iteration and experimentation, I've settled on something I'm fairly happy with.
In a nutshell, I added a new pump
method to GdbStubStateMachine::DeferredStopReason
. This pump
method is similar to the other pump method, except it reports an Event
instead of a Option<DisconnectReason>
. An Event
can be one of None
, Disconnect(DisconnectReason)
, and a new event: CtrlCInterrupt
. When you receive an Event::CtrlCInterrupt
, you should immediately run whatever interrupt logic you need to interrupt the debugging session (e.g: stop the process being debugged + report the deferred stop reason).
I also introduced a few easy-to-fix breaking API changes, such as removing the read
method from the Connection
trait, and removing the explicit StopReason::Defer
variant in favor of tweaking the resume
API to return an Option<StopReason>
.
I've updated the armv4t
example with some sample code that should make things pretty clear.
The first thing GDB does is list all threads in the process. However, when the debugger is first attached, there is no process selected. I've worked around this in my implementation by arbitrarily saying PID 3 is the process being debugged. It's enough to keep things going, but it's not ideal.
Unless you're implemented extended-mode
support, GDB will expect to be connected to a running process when it is connected. Please refer to my previous comment for more details: https://github.com/daniel5151/gdbstub/issues/56#issuecomment-855432897
Here is the relevant line from the GDB documentation:
With target remote mode: You must either specify the program to debug on the gdbserver command line or use the --attach option (see Attaching to a Running Program).
With target extended-remote mode: You may specify the program to debug on the gdbserver command line, or you can load the program or attach to it using GDB commands after connecting to gdbserver.
That said, do make note the disclaimer at the top of the ExtendedMode
IDET docs. Extended mode support is something I implemented in gdbstub
on a whim, and as far as I know, it hasn't been thoroughly validated by any "production" implementation 😅
I just merged initial support for gdbstub. It's very rough, but it's up at https://github.com/betrusted-io/xous-core/blob/main/kernel/src/debug/gdb_server.rs
Hey, that's fantastic news!
I can't begin to describe how happy I am seeing gdbstub
running in a true bare-metal no_std
environment 😊
Thanks again for helping test and validate this new state-machine based execution API for gdbstub
. I'm actually really excited about all the possibilities this new state-machine based API brings to the table, as aside from enabling gdbstub
to run via interrupt handlers, this API might also be the key to solving some of long-standing design questions, such as how to efficiently wait for Ctrl-C interrupts without polling, or how to support non-stop mode.
It seems that we're finally at the point where the broad strokes of the API are working as intended, and as such, I'll start hammering away at polishing up + documenting all these new changes. I intend to merge the current feature/state-machine-recv-packet
branch into dev/0.6
sometime in the next week, thereby unifying gdbstub
's two concurrent development streams. I'll let to know once that's done, as you'll definitely want to switch over to the dev/0.6
.
As a heads up, there will be a few more breaking changes coming down the pipeline in the dev/0.6
branch. Notably for your integration, I will be tweaking the resume
API to make single-stepping optional.
As always, please let me know if you run into any other API warts / issues while fleshing out your gdbstub
integration. If you have time time, I would really appreciate it if you could flesh out your gdbstub
integration over the next month or so, as that would let us catch / remedy any API issues prior to publishing release 0.6
.
Lastly, a couple things I noticed in your initial implementation (yes, I know it's very rough, but I'll still point things out regardless 😉):
#[inline(never)]
annotations for your Target implementation! The only reason those are there in example_no_std
are because if they weren't, the optimizing compiler would aggressively inline the dummy methods, making it substantially harder to gauge what kind of binary overhead gdbstub
introduces into a project. When integrating gdbstub
into an actual project, you want those aggressive optimizations, as it'll substantially reduce gdbstub
's binary overhead, and can significantly improve performance.
gdbserver
feature introduces, would you? I'd be curious to see if my ~10kb ballpark estimate was somewhat close.set/clear_resume_action
handlers, even if it's just to assert that the GDB client is only ever sending a single continue command.read_addrs
, instead of doing .unwrap_or(0xff)
of a memory read fails, you should either return a non-fatal TargetError
, or truncate the response to just the memory that was able to be read.
read_addrs
API doesn't support returning only part of the requested address space, even though the GDB RSP supports that feature. Looks like that's another breaking API change I'll be making in release 0.6
😅write_addrs
, instead of panicking if a memory write fails, you should return a non-fatal TargetError
instead, as this will report the error back to the GDB client.Here's a version with gdbstub enabled:
[10:01:19 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-size -d .\target\riscv32imac-unknown-none-elf\release\kernel
text data bss dec hex filename
102532 8312 4540 115384 1c2b8 .\target\riscv32imac-unknown-none-elf\release\kernel
[10:01:25 AM] D:/Code/Xous/Core/kernel>
And here it is with the gdbserver
feature disabled:
[10:04:35 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-size -d .\target\riscv32imac-unknown-none-elf\release\kernel
text data bss dec hex filename
79360 8276 436 88072 15808 .\target\riscv32imac-unknown-none-elf\release\kernel
[10:04:38 AM] D:/Code/Xous/Core/kernel>
It's closer to 20kB, but that may be due to various debug strings. If I remove #[inline(never)]
, then the size goes up slightly, and nearly everything gets inlined into the ISR:
No gdbserver:
[10:40:30 AM] D:/Code/Xous/Core/kernel> cargo build --release --target riscv32imac-unknown-none-elf --features print-panics --no-default-features
[10:41:50 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-size -d .\target\riscv32imac-unknown-none-elf\release\kernel
text data bss dec hex filename
79360 8276 436 88072 15808 .\target\riscv32imac-unknown-none-elf\release\kernel
[10:41:52 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-nm --size-sort .\target\riscv32imac-unknown-none-elf\release\kernel | rustfilt |rg debug
00000001 b kernel::debug::INITIALIZED.0.0
00000ac2 t kernel::debug::irq
[10:41:56 AM] D:/Code/Xous/Core/kernel>
With gdbserver:
[10:41:56 AM] D:/Code/Xous/Core/kernel> cargo build --release --target riscv32imac-unknown-none-elf
[10:42:17 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-size -d .\target\riscv32imac-unknown-none-elf\release\kernel
text data bss dec hex filename
103196 8312 4540 116048 1c550 .\target\riscv32imac-unknown-none-elf\release\kernel
[10:42:20 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-nm --size-sort .\target\riscv32imac-unknown-none-elf\release\kernel | rustfilt |rg debug
00000001 b kernel::debug::INITIALIZED.0.0
00000002 t core::ptr::drop_in_place<<kernel::debug::gdb_server::XousTarget as gdbstub::target::ext::base::multithread::MultiThreadOps>::is_thread_alive::{{closure}}>
00000024 d kernel::debug::gdb_server::GDB_STATE
0000002c t kernel::debug::irq
00000356 t <kernel::debug::gdb_server::XousTarget as gdbstub::target::ext::base::multithread::MultiThreadOps>::list_active_threads
00001000 b kernel::debug::gdb_server::GDB_BUFFER
0000447a t kernel::debug::process_irq_character
[10:42:22 AM] D:/Code/Xous/Core/kernel>
Keep in mind that I'm building for speed and not for size, though curiously enough if I switch to optimize for size and not speed, it also is around 20 kB:
[10:42:22 AM] D:/Code/Xous/Core/kernel> cargo build --release --target riscv32imac-unknown-none-elf --features print-panics --no-default-features
[10:43:14 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-size -d .\target\riscv32imac-unknown-none-elf\release\kernel
text data bss dec hex filename
64952 8276 436 73664 11fc0 .\target\riscv32imac-unknown-none-elf\release\kernel
[10:43:23 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-nm --size-sort .\target\riscv32imac-unknown-none-elf\release\kernel | rustfilt |rg debug
00000001 b kernel::debug::INITIALIZED.0.0
00000002 t core::ptr::drop_in_place<&mut kernel::debug::Uart>
0000001c t core::fmt::Formatter::debug_struct
00000026 t core::fmt::Formatter::debug_list
00000028 t <kernel::debug::Uart as core::fmt::Write>::write_str
00000678 t kernel::debug::irq
[10:43:26 AM] D:/Code/Xous/Core/kernel> cargo build --release --target riscv32imac-unknown-none-elf
[10:43:50 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-size -d .\target\riscv32imac-unknown-none-elf\release\kernel
text data bss dec hex filename
83748 8312 4540 96600 17958 .\target\riscv32imac-unknown-none-elf\release\kernel
[10:43:52 AM] D:/Code/Xous/Core/kernel> riscv64-unknown-elf-nm --size-sort .\target\riscv32imac-unknown-none-elf\release\kernel | rustfilt |rg debug
00000001 b kernel::debug::INITIALIZED.0.0
00000002 t core::ptr::drop_in_place<<kernel::debug::gdb_server::XousTarget as gdbstub::target::ext::base::multithread::MultiThreadOps>::is_thread_alive::{{closure}}>
00000002 t core::ptr::drop_in_place<gdbstub::gdbstub_impl::ext::base::<impl gdbstub::gdbstub_impl::GdbStubImpl<kernel::debug::gdb_server::XousTarget,kernel::debug::Uart>>::handle_base::{{closure}}>
00000002 t core::ptr::drop_in_place<gdbstub::gdbstub_impl::ext::base::<impl gdbstub::gdbstub_impl::GdbStubImpl<kernel::debug::gdb_server::XousTarget,kernel::debug::Uart>>::get_sane_any_tid::{{closure}}>
00000002 t core::ptr::drop_in_place<&mut kernel::debug::Uart>
0000001c t core::fmt::Formatter::debug_struct
00000022 t <kernel::debug::Uart as gdbstub::connection::Connection>::write
00000024 d kernel::debug::gdb_server::GDB_STATE
00000026 t core::fmt::Formatter::debug_list
00000028 t <kernel::debug::Uart as core::fmt::Write>::write_str
000000c2 t <kernel::debug::gdb_server::XousTarget as gdbstub::target::ext::base::multithread::MultiThreadOps>::list_active_threads
00001000 b kernel::debug::gdb_server::GDB_BUFFER
000023d6 t kernel::debug::irq
[10:43:53 AM] D:/Code/Xous/Core/kernel>
One thing I have not yet figured out is how to trap on a breakpoint, or even what a breakpoint is. My assumption had been that when gdb inserted a breakpoint, it would replace code in RAM with an illegal instruction that would cause an exception. Perhaps it's doing this and my kernel has configured the CPU to ignore such things. But as a result, I'm not able to issue commands such as finish
.
Additionally, stepi
never returns. Is this the sort of thing that the RSP needs to announce that it doesn't support, so gdb manually handles stepi
?
Thanks for sharing those numbers! In past experience, I've found that it can be pretty tricky measuring the true overhead of just gdbstub
code, since it is inextricably tied to the Target
implementation being used. And then there's also the consideration that RISC-V code is typically less "dense" than equivalent x86 code, which can also contribute to overhead. In any case, 20kb is still pretty negligible with today's flash memory sizes 😛
My assumption had been that when gdb inserted a breakpoint, it would replace code in RAM with an illegal instruction that would cause an exception.
Err, not exactly. When you set a breakpoint from the GDB client, the only thing the GDB client does is send the target a request to set a breakpoint at the specific address. The specifics of how to implement that breakpoint are left entirely up to the target. In other words, the GDB client will not automatically patch the instruction stream to insert a breakpoint - if that's how you want to implement breakpoints in your target, that's something you'll need to do yourself.
To get breakpoints working, you'll need to implement the appropriate breakpoint IDETs. In your case, you'll probably just want to start off with basic software breakpoints (i.e: patching the instruction stream with a breakpoint instruction + catching the exception), which would fall under the SwBreakpoints
IDET.
Additionally, stepi never returns.
This is likely a byproduct of your stubbed out set/clear_resume_action
handlers. When you run stepi
, the GDB client will request that the target perform a single-step by calling set_resume_action
with ResumeAction::Step
and the specified TID. You need to move your current "single-step unimplemented" check into set_resume_action
, as right now, it only checks if the default resume action (i.e: the resume action all non-specified threads in the process should take) is a single-step, which is mostly likely isn't (IIRC, default_resume_action
is always set to ResumeAction::Continue
).
Note that once I tweak the resume API to make single-stepping optional (tracked under #59), the gdbstub
won't even report that it supports single-stepping, which means the GDB client won't even let you invoke stepi
.
I'm looking to hook the kernel print!()
command to use gdbstub::output!()
instead. I've run into two issues:
output!()
macro uses std::fmt
instead of core::fmt
. Is this intentional, or is it easy enough to switch it to use core::fmt
?ConsoleOutput
struct? I'll need to pass this to the first argument of output!()
but I don't see any way to do that, aside from inside handle_monitor_cmd()
. I have something hooked that manually writes to the pipe when it's not halted. Is there another approach I should take?Regarding breakpoint types, I thought GDB would set its own breakpoints by poking into RAM if it detected that a program was in RAM. At least that's what I recall observing when I was working on my own gdb server targeting bare metal: https://github.com/litex-hub/wishbone-utils/blob/master/wishbone-tool/crates/core/gdb.rs
It only supports two BreakPointType::BreakHard
, but I thought I recall that if an address was in RAM gdb would create a breakpoint without sending a Z
packet. Probably that wasn't actually the case and I'm just misremembering.
How do I signal to gdbstub
that it should halt? Is there something I pass to pump()
in order to get it to do that?
How should I fill in set_resume_action
? When does it get called? Is it called when the system halts, or when it resumes? Is there an equivalent halt()
to the existing resume()
?
- The output!() macro uses
std::fmt
instead ofcore::fmt
Yikes, good catch 😱 I just pushed a fix up on the feature branch. That's definitely a oversight on my part. Good catch!
- How do I create a
ConsoleOutput
struct? ... I don't see any way to do that, aside from insidehandle_monitor_cmd()
See #51. That should give you all the context you need.
Then again, on further thought, now that we have this state machine based model, I bet it'd be pretty easy to add a console_write
method to the GdbStubStateMachine::DeferredStopReason
state. Maybe I'll look into adding that...
Given that's the state you'll probably be in most of the time, that'd make it pretty easy to log output via GDB. Of course, you would have to be careful not to log from inside any gdbstub methods, as you would need to get a double-reference to the global gdbstub instance, which will result in some serious badness.
Regarding breakpoint types, I thought GDB would set its own breakpoints by poking into RAM
Read through https://sourceware.org/gdb/onlinedocs/gdb/Remote-Stub.html, particularly the "Stub Contents" and "Bootstrapping" sections. Based on my reading + personal observations, GDB has never tried to change memory contents with the intention of setting a breakpoint. Then again, I've never tried to set a breakpoint without having the breakpoint IDETs implemented, so maybe there's a fallback path somewhere in the GDB client, but I highly doubt it.
If you're feeling adventurous, consider reading through the GDB client source and seeing if you can find any of the logic you're describing. A good starting point might be somewhere in https://github.com/bminor/binutils-gdb/blob/master/gdb/remote.c
How do I signal to gdbstub that it should halt?
AFAIK, the GDB RSP doesn't define a server-to-client disconnect packet. You've basically got two options when you want to end a debugging session:
quit
or detach
in the GDB client). this will result in a DisconnectReason::Disconnect
event, which you can handle appropriately.Target::Error
, and return it whenever you want to stop the debugging session. It won't be a clean disconnect, but hey, it'll work.Please let me know if I've missed some obscure feature of the RSP that enables GDB targets to cleanly end a debugging session on their terms.
How should I fill in
set_resume_action
The docs and examples are there for a reason 😏
set_resume_action
is only ever called once per resume, but hey, some example code is better than none, right?Is there an equivalent
halt()
to the existingresume()
?
I'm not sure what you mean? Are you thinking of something like ExtendedMode::kill
?
See #51. That should give you all the context you need.
Thanks! That does seem better than my current hamfisted approach. I'll track that and integrate it when it's ready.
If you're feeling adventurous, consider reading through the GDB client source and seeing if you can find any of the logic you're describing.
The very end of remote.c
https://github.com/bminor/binutils-gdb/blob/master/gdb/remote.c#L10535-L10593 calls return memory_insert_breakpoint (this, gdbarch, bp_tgt);
under certain circumstances that I don't currently understand.
This hands the call off to gdbarch_memory_insert_breakpoint()
for the given target. On all platforms except m32r
, this calls default_memory_insert_breakpoint()
which is at https://github.com/BeMg/riscv-bintuils/blob/6a07aff56dac4440ae02f9df22aaa1960a1add1e/gdb/mem-break.c#L36-L70
This function figures out what a breakpoint looks like on this platform by calling gdbarch_sw_breakpoint_from_kind()
. On RISC-V this is one of two byte patterns: https://github.com/BeMg/riscv-bintuils/blob/6a07aff56dac4440ae02f9df22aaa1960a1add1e/gdb/riscv-tdep.c#L262-L278 and on ARM it's more complicated because they also have endianness to deal with: https://github.com/BeMg/riscv-bintuils/blob/6a07aff56dac4440ae02f9df22aaa1960a1add1e/gdb/arm-linux-tdep.c#L78-L95
After it has the breakpoint value and size, default_memory_insert_breakpoint()
reads the contents of memory, stashes it in a shadow value, and writes the target's breakpoint value to RAM.
Please let me know if I've missed some obscure feature of the RSP that enables GDB targets to cleanly end a debugging session on their terms.
At this moment, I'm more interested in telling the client that the server has hit a breakpoint. What do I pass to pump()
to tell gdbstub
that I've hit a breakpoint? As-is, there are two ways to go from Run mode to Stop mode: The user hits Control-C, or the target hits a breakpoint for some reason. The former case is handled with pump()
. How do I handle the latter case?
The docs and examples are there for a reason 😏
The example I'm reading has this, which doesn't give much of an idea what it does or when it gets called:
#[inline(never)]
fn set_resume_action(&mut self, _tid: Tid, _action: ResumeAction) -> Result<(), Self::Error> {
print_str("> set_resume_action");
Ok(())
}
I'll try to understand what the function is for by reading the code, but the documentation isn't really helping me to understand it. What is it used for? The docs say "A simple implementation of this method would simply update an internal HashMap<Tid, ResumeAction>.", but why wouldit do that? When is this HashMap
consulted? Is this function called when the target is halted due to a breakpoint? When it's halted due to the user hitting Control-C? When the target is resumed? Once when the host connects? And when is the value consulted? What's the difference between resume()
and set_resume_action()
?
I'm not sure what you mean? Are you thinking of something like ExtendedMode::kill?
The resume()
Trait method gets called when the target is going to be resumed. Is there an equivalent Trait method that will get called when the target is halting? For example, when gdbstub
wants to halt the target due to the user hitting Control-C. Is that indicated by e.g. having the state machine in the GdbStubStateMachine::DeferredStopReason()
state? And what value should I be passing to gdb_state.deferred_stop_reason()
? Right now I seem to be passing ThreadStopReason::DoneStep
which I feel may not be correct, since it may be GdbInterrupt
.
Overall I think I'm still subtly misusing gdbstub
, but it does seem almost all there.
Wow, thanks for digging into the GDB client code!
It seems that memory_insert_breakpoint
is hit when the target doesn't support the Z0
packet (i.e: the SwBreakpoints
IDET). If I'm reading this right, then if you sniff the data being sent to/from gdbstub
after setting a breakpoint, you should be seeing the target set up the breakpoint instructions appropriately...
If that's the case, then you'll need to make sure you've set up your interrupt / exception handlers correctly to intercept the breakpoint.
What do I pass to
pump()
to tell gdbstub that I've hit a breakpoint?
You don't call pump()
if you've hit a breakpoint. If you've hit a breakpoint, then you should be inside the breakpoint exception handler, at which point you wouldn't even have a byte to pass to pump()
. In that case, you'd call deferred_stop_reason
with ThreadStopReason::SwBreakpoint
.
See the big-picture summary at the end of this comment if you're still confused.
I'll try to understand what the function is for by reading the code, but the documentation isn't really helping me to understand it.
Fair enough! In this case, the documentation you actually want to read is resume
's documentation, since it explains the "lifecycle" of resume
, set_resume_action
, and clear_resume_actions
:
Resume execution on the target.
Prior to calling resume, gdbstub will call clear_resume_actions, followed by zero or more calls to set_resume_action, specifying any thread-specific resume actions.
The default_action parameter specifies the “fallback” resume action for any threads that did not have a specific resume action set via set_resume_action. ~The GDB client typically sets this to ResumeAction::Continue, though this is not guaranteed.~ <-- This isn't true, it'll always be set to
ResumeAction::Continue
. I'll be removing this entirely as part of the "support optional single threading" work.
I'll add a pointer to refer to the resume
docs from set_resume_action
and clear_resume_actions
, to make sure folks don't miss it.
As for the example code... there's a reason I explicitly linked you to the armv4t_multicore
example, as opposed to the example_no_std
example. That code sample you linked isn't descriptive because it's example_no_std
doesn't implement an actual target - it's just a barebones framework for testing / validating the gdbstub
runs in no_std
environment.
When you're working on implementing actual functions, you'll want to refer to the significantly more fleshed out armv4t
and armv4t_multicore
examples.
For example, when gdbstub wants to halt the target due to the user hitting Control-C.
Hitting Ctrl-C does not halt the target. The only thing hitting Ctrl-C does is send a "interrupt" packet to the target, which is then processed via pump
, and eventually results in a Event::CtrlCInterrupt
. At that point, you'll want to handle the interrupt request however you need to (e.g: by pausing the current process), and then passing an appropriate stop reason to deferred_stop_reason
.
Please re-read my earlier comment: https://github.com/daniel5151/gdbstub/issues/56#issuecomment-864143003 I edited it adding more context, which you may have missed.
Taking a step back, I think I need to explain what the GdbStubStateMachine::DeferredStopReason
state entails.
In a nutshell, the first time you start up gdbstub
, it will be in the Pump
state, as it hasn't established a connection yet, and it doesn't know the runtime state of the target. At some point, the GDB client will request the run the target, at which point, your implementation will return None
from resume
, indicating that the stub should transition into the DeferredStopReason
state. At this point the GDB stub is waiting for a Stop Reply Packet
deferred_stop_reason
is how you report that stop reply packet.
In your implementation, you'll end up calling deferred_stop_reason
in both your UART handler's GDB loop (i.e: the whole "Ctrl-C stop reason business), and also from the breakpoint exception handler!
Let me know if that clears things up.
With the current design, it seems as though GdbStubError::TargetError(_)
is fatal.
This is due to the fact that a gdb_state_machine
is an enum. If the enum is in the state GdbStubStateMachine::Pump(inner)
, then we must call inner.pump(&mut target, incoming_byte)
. However, inner.pump()
takes self
, and therefore consumes inner
, which moves it out of GdbStubStateMachine
.
To work around this, I basically consume and reconstitute the state machine on every loop.
I.e.
if let Some(gdb_state_machine) = unsafe { GDB_STATE.take() }
{
let new_gdb = match gdb_state_machine {
GdbStubStateMachine::Pump(gdb_state) => match gdb_state.pump(&mut target, b) {
Ok((gdb, None)) => gdb,
Err(e) => {
cleanup();
println!("gdbstub error: {}", e);
return true;
}
GdbStubStateMachine::DeferredStopReason(gdb_state) => {
match gdb_state.deferred_stop_reason(&mut target, ThreadStopReason::DoneStep) {
Ok((gdb, None)) => gdb,
Err(e) => {
cleanup();
println!("deferred_stop_reason_error: {:?}", e);
return true;
}
}
}
};
unsafe { GDB_STATE = Some(new_gdb) };
}
The problem is that the happy path returns (GdbStubStateMachineInner, $something)
, but the error path simply returns Error<T>
. So whenever an error hits, the state machine's Inner
is consumed and I can no longer replace GDB_STATE
.
To work around this, I basically consume and reconstitute the state machine on every loop.
Yes, that is the intended way to work with the API.
The problem is that the happy path returns
(GdbStubStateMachineInner, $something)
, but the error path simply returnsError<T>
. So whenever an error hits, the state machine'sInner
is consumed and I can no longer replaceGDB_STATE
.
Ah, indeed, good catch. I should tweak the API to return a (GdbStubStateMachine, Result<..>)
, instead of a Result<(GdbStubStateMachine, ..), ..>
. It ought to be possible to report a TargetError
, but then re-enter the GDB debugging loop to perform "post mortem" debugging. I'll probably push a fix up sometime tomorrow / over the weekend.
...so remember when you asked "How do I signal to gdbstub that it should halt" and I said:
AFAIK, the GDB RSP doesn't define a server-to-client disconnect packet.
yeah, that was pre-morning-coffee me talking.
The way to signal a "halt" is to pass ThreadStopReason::Terminated
to deferred_stop_reason
, and then perform whatever cleanup logic you need to when handling the resulting DisconnectReason::TargetTerminated
.
Theoretically, I can add a new signal_terminate()
method that provides a shortcut to this logic.
Reporting ThreadStopReason::Terminated
will inform the GDB client that the target has "halted", which should close the debugging session on the client side.
Hahahaha, I just tested it, and GDB totally rewrites memory to insert a breakpoint if I disable the Breakpoints
IDET in the armvt4
example!
Time to update my docs...
Heads up, I've merged the feature/state-machine-recv-packet
branch into dev/0.6
, and will be deleting the feature branch shortly. All subsequent development will be happening on the dev/0.6
branch, so make sure to switch over!
Sorry for hijacking this, but I'm basing my implementation directly on the dev/0.6 branch because of this issue here and I have some basic question regarding the GdbStubStateMachine
and GdbStub
:
When should I have just a GdbStub
and at what point during the lifetime of a program/debug session am I supposed to call run_state_machine()
to get to a GdbStubStateMachine
? It seems as if as soon as I get a GdbStub I can immediately call run_state_machine
and forget about GdbStub
, but then I don't understand why I'd need to keep a GdbStub
around ever in the first place?
In your code-examples the state machine struct is either on the stack or lives in some static Option<>
type. On the stack is rarely feasible, and a static Option<>
type is a bit unsafe, especially in a multi-core scenario. However, sticking it in say a spin::Mutex
seems difficult to me given the API (needs self
instead of &mut self
in several places). Is the idea here to always have an Option<>
or Mutex<Option<>>
and take it out and put it back every time you interact with the STM, or am I missing something obvious?
Hey @gz, no need to apologize, these are perfectly relevant questions :)
Plus, it serves as yet another reminder that I really need to find some time to polish up and push out 0.6, since you're far from the only one who's currently working directly off the dev/0.6
branch 😅
- It seems as if as soon as I get a
GdbStub
I can immediately callrun_state_machine
and forget aboutGdbStub
Yep, that's exactly right.
When you spell it out like this, I realize that it does seem a bit silly to have this be a two-step operation... I think a better long-term approach would be to update GdbStubBuilder
to have a build_state_machine
method that bypasses this "transient" GdbStub
.
- Is the idea here to always have an
Option<>
orMutex<Option<>>
and take it out and put it back every time you interact with the STM, or am I missing something obvious?
With the current API, yes, that would indeed be how you'd have to use it. It's been a while since I hacked on this code, but IIRC I couldn't find a good way to implement the typestate state machine API without requiring the implementer to play "hot potato" with the various states.
Consequently, if you've got a suggestion on how you might tweak the API to maintain type-system enforced sequencing/correctness while using &mut self
, I would be more than happy to tweak the current approach.
Also, if you're at liberty to share more details, I'm very interested to hear more about what context you're using gdbstub
in.
Hi @daniel5151 thanks for the clarification! I hacked a bit more on this yesterday and I think I was able to wrap my head around the typed state-machine. IMO the organization you have made a lot of sense once I grasped it.
Re sharing my use-case: Sure, it's all open-source, I'm currently adding gdb remote debugging support for an experimental x86-64 multicore kernel here: https://github.com/vmware-labs/node-replicated-kernel/pull/52 (still in pretty rough shape but hopefully I get it in something more polished within the next few weeks :))
Ahh, very cool! Thanks for sharing!
If you have any more questions / suggestions about the state machine API (or any other gdbstub
API for that matter), please let me know.
Actually, I do have a more general question where you might be able to know how to best express this with gdbstub. The kernel binary in my case is relocated at a random address (e.g., changes every time the system boots, similar to Linux with KASLR). GDB is usually confused with this as it looks for the symbols at the offsets given in the ELF file and so one needs to go and manually set this offset when loading the binary (e.g., using the symbol-file command).
I was trying to find if I can somehow send the symbol-file
command or something equivalent from the kernel directly to gdb with the right offset information. Or, if this isn't possible (which is what I suspect) what would some other good way be to pass this offset back to gdb in an automated fashion (combination of custom monitor command and .gdbinit scripting etc.?).
PS: And thanks btw for all the hard and amazing work on gdbstub, it definitely saved me a ton of work if I had to implement all this by myself!
Would SectionOffsets
be what you're looking for?
https://docs.rs/gdbstub/0.5.0/gdbstub/target/ext/section_offsets/index.html
Also check out #20, which might be related.
dev/0.6
also includes the MemoryMap
IDET, which could also be useful.
As a meta note: gdbstub
implements protocol extensions on an as-needed basis, so if you find that gdbstub
is missing a particular protocol feature, you can check if the feature is implemented at the protocol level, and if it is, it should be easy to open a PR and add it to gdbstub
:)
https://sourceware.org/gdb/onlinedocs/gdb/General-Query-Packets.html#General-Query-Packets
Just wanted to report section offsets works great, thanks a lot for the help!
To give a quick update regarding this feature:
I'm pretty happy with the current state of the state machine API, and once @gz has a chance to integrate the latest changes, I'll try to find some time to properly document everything + push out a proper 0.6 release.
@xobs, not sure if you're still tracking this, but if you get the chance, it would be super useful if you could update your implementation as well, and let me know if there are any issues / feature gaps.
Cheers!
I believe that the time has finally come close out this issue.
The goal of this issue was to add an bare-metal API to gdbstub
, and as far as I can tell, the current iteration of the state machine API has indeed done that.
If you encounter any issues with this API down the line, please open a new issue instead of re-opening / commenting on this one.
gdbstub
makes it very easy to add a GDB server to any project. The current design uses long-polling to interact with the target. Conceptually,gdbstub
sits between a networkConnection
and theTarget
being debugged. When the target is running,gdbstub
is blocked.An API for bare metal could live in an interrupt handler such as a UART. Each time a character is received, it would be passed to
gdbstub
for collection and processing. This has several nice properties:gdbstub
means no debugger, and you simply have to remove the interrupt hook.