Closed gz closed 3 years ago
Yeah... that's not good.
Could you post some more context on how you arrived at this situation? i.e: the debug logs from gdbstub
and the GDB client?
Ahh, sorry - could you re-upload the gdbstub trace log with the trace-pkt
feature enabled?
Also, could you link to the specific revision of your gdbstub
implementation that exhibits this behavior?
I'm on the refactor/blocking-event-loop
currently, will check if this also happens with latest dev/0.6 in a second
Same issue with dev/0.6
(I guess this was at the same revision anyways). Maybe I'm also just doing something stupid. My approach is to wait for the serial line receive interrupt and if I get that I pass in the ThreadStopReason::GdbCtrlCInterrupt
to the state machine. Is this what I'm supposed to do?
Before I look into it this more (likely sometime this weekend), do you know if this was something that started happening after I merged #88, or was this happening before as well?
Theoretically, this assert is there because the GDB client should never send an interrupt packet while the GDB stub is still in the "paused" state (i.e: the Pump
state, where it is waiting for more data while the target isn't running). The fact that this is happening means that there's either an issue with how the state machine API is written (i.e: there's a missing state transition), or there's something spooky going on in your implementation (which I would appreciate a commit-hash'd link to)
Here is the start of my state machine:
pretty much all the gdb logic stuff is in that file
I didn't have Ctrl+C
working before #88 but I can easily revert the changes for that branch and try it, let me do that.
Alright, I've skimmed through your implementation, and I think I understand the issue in your implementation.
Before going into the specific issue at hand, lets back up for a second and discuss how the overall state machine flow goes:
(1) pump(u8)
┌─────────────────────────┐
│ │
start ┌─────┴──┐ ┌────▼─────────────────┐
──────────► Pump │ │ DeferredStopReason │
└─────▲──┘ └────┬───┬──────────▲──┘
│ │ │ │
└─────────────────────────┘ └──────────┘
deferred_stop_reason(..) (2) pump(u8)
The Pump
state represents the state where the Target is entirely stopped, and gdbstub
is waiting for data (e.g: "read these registers", "write to this memory address", etc...).
Once the gdbstub
receives a resume packet, it transitions into the DeferredStopReason
state, whereupon the Target is running concurrently with gdbstub
.
In this state, there are two events that need to be handled (hence the two methods onGdbStubStateMachineInner<DeferredStopReason>
: the case where the Target reports a stop reason (e.g: a breakpoint is hit), and the case where the GDB client sends more data down the line.
The first case is easy to understand: call deferred_stop_reason
with the appropriate stop reason, and then the stub will transition back into Pump
, awaiting more instructions.
The second case seems a bit weird... why would the GDB client send data while the target is running?
Well... as it turns out, Ctrl-C interrupts are signaled by sending special "interrupt" packets down the wire! Specifically, an interrupt can be signaled either by sending a single 0x3
byte down the wire, or in some cases, by sending a properly framed vCtrlC
packet.
See https://sourceware.org/gdb/onlinedocs/gdb/Interrupts.html#Interrupts for details.
In your implementation, you are not handling the "data is sent down the wire while the target is running" case at all. Instead, your current implementation seems to do the following:
KCoreStopReason::ConnectionInterrupt => Some(ThreadStopReason::GdbCtrlCInterrupt)
- i.e: if you sense new data on the wire, without even checking what it is, report a stop reason.deferred_stop_reason
dutifully reports the stop reason back to the client, and transitions back into the Pump
state to await new dataPump
state, it finally reads the data that was sent down the wire, only to find a interrupt packet that should have been handled by the pump
flow in the DeferredStopReason
state! An assertion fires, and everything dies.i.e: you should only call deferred_stop_reason
after the .pump()
method in the DeferredStopReason
returns a Event::CtrlCInterrupt
.
A good starting point for how to structure your event loop would be to look at the guts of the new run_blocking
method, specifically, the contents of the GdbStubStateMachine::DeferredStopReason
match arm:
https://github.com/daniel5151/gdbstub/blob/4e46b72/src/gdbstub_impl/mod.rs#L199
After writing this explanation, I realize that the fact that you ran into this issue is indicative that this new API still needs some work + polish.
I suspect that the API itself could be improved (possibly by adding some more fine-grained states), and it goes without saying that the biggest missing piece right now is good documentation/examples...
Ideally, I want to craft an API that's impossible to misuse - leveraging as many Rust type-system shenanigans to have the compiler enforce all branches are being accounted for. I'll have to think about how I can tweak the current iteration of the API to achieve that...
Ah, in addition, I really should remove the ThreadStopReason::GdbCtrlCInterrupt
entirely. It's literally just a "short hand" for writing ThreadStopReason::DoneStep
, but I suspect that its naming will lead to lots of confusion...
I'll go ahead and do immediately.
Ah I see, yes I figured the chances are high it's just me doing something wrong here. This detailed explanation helps a lot. If you'd like me to I could try to put this into the docs (only if you don't want to do it yourself of course).
I appreciate the offer, but I think this is something I should tackle myself - especially since there's a non-zero chance I rework the API a bit, which would invalidate any docs.
And of course, when you get the chance, please let me know if my explanation + suggested fix was correct!
Great, so thanks a lot for your help, I rewrote my state-machine a little to reflect the fact that it should pump() in DeferredStopReason when I have data available. I hope this accurately captures what you wanted me to do.
With this change, Ctrl+C
does work fine together with continue
(tested with an infinite loop):
nrk::arch::_start (argc=70369785565184, _argv=0x9)
at kernel/src/arch/x86_64/mod.rs:927
927 let _r = xmain();
(gdb) continue
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
nrk::xmain () at kernel/src/integration_main.rs:512
512 while cond {}
(gdb) continue
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
nrk::xmain () at kernel/src/integration_main.rs:512
512 while cond {}
(gdb) continue
Continuing.
^C
Program received signal SIGTRAP, Trace/breakpoint trap.
nrk::xmain () at kernel/src/integration_main.rs:512
512 while cond {}
(gdb) next
^CRemote communication error. Target disconnected.: Connection reset by peer.
So while continue
works fine, doing a next
or step
and then hitting Ctrl+C again still gives me gdbstub::gdbstub_impl::state_machine: Unexpected interrupt packet!
(and the connection reset above as a cascading error).
I think I understand what's going on: when I do next
, the program will start to single-step through the loop so it will execute the state-machine logic many times. What's weird then is that the state-machine does end up in the Pump state (as indicated by the pumping...
log message below. Am I correct that this shouldn't happen since GDB is just single-stepping and I don't actually have a prompt (in gdb)?
e.g. the way the log looks it seems for one state-machine execution after taking a step it ends up many times in Pump state without me ever having a GDB prompt, so unsurprisingly it gets an unexpected interrupt packet when I hit Ctrl+c during this time:
[ .. beginning of one gdb stm logic when we got interrupted from after taking a single-step ..]
26533530728 [INFO ] - nrk::arch::gdb: event_loop reason DebugInterrupt
[.. we determine that we stopped because we're stepping ..]
26537096730 [TRACE] - nrk::arch::gdb: stop reason is DoneStep
26540296784 [INFO ] - nrk::arch::gdb: stop_reason Some(DoneStep)
[ .. the stm was in deferredstopreason state, this is good too ..]
26543815968 [INFO ] - nrk::arch::gdb: deferred_stop_reason
26547158636 [TRACE] - gdbstub::protocol::response_writer: --> $S05#b8
[ .. transitions into pump state here, ok but I don't have a gdb prompt yet ..]
26550949190 [INFO ] - nrk::arch::gdb: pumping...
26554247238 [INFO ] - nrk::arch::gdb: pumping...
26557388140 [INFO ] - nrk::arch::gdb: pumping...
26560478940 [INFO ] - nrk::arch::gdb: pumping...
26563497220 [INFO ] - nrk::arch::gdb: pumping...
26566565346 [TRACE] - gdbstub::protocol::recv_packet: <-- $g#67
[ .. does some stuff this is all fine ..]
26570024202 [TRACE] - nrk::arch::gdb: read_registers X86_64CoreRegs { regs: [40003D4B
FC4F, 00, 40003D41D600, 01, 00, 40003D4BFC4F, 40003E354FF0, 40003E354D70, 00, 00, 400
03E3544F8, 8C, 01, 01, 04, 3F2005D4], eflags: 302, rip: 40003CFB5CB4, segments: X86Se
gmentRegs { cs: 08, ss: 10, ds: 00, es: 00, fs: 00, gs: 00 }, st: [[00, 00, 00, 00, 0
0, 00, 00, 00, 00, 00], [00, 00, 00, 00, 00, 00, 00, 00, 00, 00], [00, 00, 00, 00, 00
, 00, 00, 00, 00, 00], [00, 00, 00, 00, 00, 00, 00, 00, 00, 00], [00, 00, 00, 00, 00,
00, 00, 00, 00, 00], [00, 00, 00, 00, 00, 00, 00, 00, 00, 00], [00, 00, 00, 00, 00,
00, 00, 00, 00, 00], [00, 00, 00, 00, 00, 00, 00, 00, 00, 00]], fpu: X87FpuInternalRe
gs { fctrl: 37F, fstat: 00, ftag: 00, fiseg: 00, fioff: 00, foseg: 00, fooff: 00, fop
: 00 }, xmm: [1000040003D49BC88, 20000000000000000000000000, 40003D41D860, C0C0, 8080
, 10000000000000001, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00], mxcsr: 1FA0 }
26609124980 [TRACE] - gdbstub::protocol::response_writer: --> $4ffc4b3d00400000000000
000000000000d6413d00400000010000000000000000000000000000004ffc4b3d00400000f04f353e004
00000704d353e0040000000000000000000000000000000000000f844353e004000008c00000000000000
010000000000000001000000000000000400000000000000d405203f00000000b45cfb3c0040000002030
0000800000010000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000007f030000000000000000000000000000000000000000
0000000000000000000088bc493d004000000100000000000000000000000000000000000000200000006
0d8413d004000000000000000000000c0c000000000000000000000000000008080000000000000000000
0000000000010000000000000001000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000a01f0000xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx#d0
[ .. lots of pumping, if I hit Ctrl+C here ever I'm in trouble ..]
26646637356 [INFO ] - nrk::arch::gdb: pumping...
26691508732 [INFO ] - nrk::arch::gdb: pumping...
26694696538 [INFO ] - nrk::arch::gdb: pumping...
26697971936 [INFO ] - nrk::arch::gdb: pumping...
26701309008 [INFO ] - nrk::arch::gdb: pumping...
26704563356 [INFO ] - nrk::arch::gdb: pumping...
26708698674 [INFO ] - nrk::arch::gdb: pumping...
26712805744 [INFO ] - nrk::arch::gdb: pumping...
[.. skipped some ..]
26899875856 [INFO ] - nrk::arch::gdb: pumping...
26903197970 [INFO ] - nrk::arch::gdb: pumping...
26906359538 [INFO ] - nrk::arch::gdb: pumping...
26909491578 [INFO ] - nrk::arch::gdb: pumping...
26912674862 [INFO ] - nrk::arch::gdb: pumping...
26915956958 [INFO ] - nrk::arch::gdb: pumping...
26919280244 [INFO ] - nrk::arch::gdb: pumping...
26922500174 [TRACE] - gdbstub::protocol::recv_packet: <-- $vCont;s:p1.1;c:p1.-1#f7
26926665256 [INFO ] - nrk::arch::gdb: set resume_with = Step
26930201850 [INFO ] - nrk::arch::gdb: before resuming...
[ .. ok we're done with one gdb stm loop for single-step, we determine we should resume with single step ..]
26933414652 [TRACE] - nrk::arch::gdb: Step execution, set TF flag.
[ .. things repeat, the step has completed we end up figuring out we did a step and run the gdb stm again ..]
26936998000 [INFO ] - nrk::arch::gdb: event_loop reason DebugInterrupt
26940745424 [TRACE] - nrk::arch::gdb: stop reason is DoneStep
26944102380 [INFO ] - nrk::arch::gdb: stop_reason Some(DoneStep)
26947737106 [INFO ] - nrk::arch::gdb: deferred_stop_reason
26951167928 [TRACE] - gdbstub::protocol::response_writer: --> $S05#b8
26954947042 [INFO ] - nrk::arch::gdb: pumping...
27042501250 [INFO ] - nrk::arch::gdb: pumping...
27045500900 [INFO ] - nrk::arch::gdb: pumping...
27048638710 [INFO ] - nrk::arch::gdb: pumping...
27051728572 [INFO ] - nrk::arch::gdb: pumping...
27054819982 [TRACE] - gdbstub::protocol::recv_packet: <-- $g#67
27058266062 [TRACE] - nrk::arch::gdb: read_registers X86_64CoreRegs { regs: [40003D4B
FC4F, 00, 40003D41D600, 01, 00, 40003D4BFC4F, 40003E354FF0, 40003E354D70, 00, 00, 400
03E3544F8, 8C, 01, 01, 04, 3F2005D4], eflags: 302, rip: 40003CFB5CBB, segments: X86Se
gmentRegs { cs: 08, ss: 10, ds: 00, es: 00, fs: 00, gs: 00 }, st: [[00, 00, 00, 00, 0
0, 00, 00, 00, 00, 00], [00, 00, 00, 00, 00, 00, 00, 00, 00, 00], [00, 00, 00, 00, 00
, 00, 00, 00, 00, 00], [00, 00, 00, 00, 00, 00, 00, 00, 00, 00], [00, 00, 00, 00, 00,
00, 00, 00, 00, 00], [00, 00, 00, 00, 00, 00, 00, 00, 00, 00], [00, 00, 00, 00, 00,
00, 00, 00, 00, 00], [00, 00, 00, 00, 00, 00, 00, 00, 00, 00]], fpu: X87FpuInternalRe
gs { fctrl: 37F, fstat: 00, ftag: 00, fiseg: 00, fioff: 00, foseg: 00, fooff: 00, fop
: 00 }, xmm: [1000040003D49BC88, 20000000000000000000000000, 40003D41D860, C0C0, 8080
, 10000000000000001, 00, 00, 00, 00, 00, 00, 00, 00, 00, 00], mxcsr: 1FA0 }
27098559170 [TRACE] - gdbstub::protocol::response_writer: --> $4ffc4b3d00400000000000
000000000000d6413d00400000010000000000000000000000000000004ffc4b3d00400000f04f353e004
00000704d353e0040000000000000000000000000000000000000f844353e004000008c00000000000000
010000000000000001000000000000000400000000000000d405203f00000000bb5cfb3c0040000002030
0000800000010000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000
000000000000000000000000000000000000000007f030000000000000000000000000000000000000000
0000000000000000000088bc493d004000000100000000000000000000000000000000000000200000006
0d8413d004000000000000000000000c0c000000000000000000000000000008080000000000000000000
00000000000100000000000000010000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000000000000000000000000000000000000000000000000000000000000000000
0000000000000000000000a01f0000xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx#fe
[ .. again i'm going in the pump state, not clear why ..]
27136006782 [INFO ] - nrk::arch::gdb: pumping...
[.. here I hit Ctrl+C in GDB (I didn't have a prompt) .. ]
27139247866 [TRACE] - gdbstub::protocol::recv_packet: <--
27142773072 [DEBUG] - gdbstub::gdbstub_impl: <-- interrupt packet
[ .. gdbstub (rightfully?) complains because it was in pump state? ..]
27146204800 [ERROR] - gdbstub::gdbstub_impl::state_machine: Unexpected interrupt pack
et!
27150376962 [ERROR] - nrk::arch::gdb: gdbstub internal error PacketUnexpected
Hmm, I'm not entirely sure what's going on here just yet, but here are a few things you can try:
deferred_stop_reason
from the Event::CtrlCInterrupt
branch.gdbstub
again, and modify https://github.com/daniel5151/gdbstub/blob/dev/0.6/src/gdbstub_impl/ext/base.rs#L301 to return just vCont;c;C
Did some tests:
Re 1: Changed it but didn't make a difference (still seems like it gets a ctrl+c packet when not expecting it)
Re 2: Using the new feature branch (feature/optional-single-step
): Didn't resolve the issue, I noticed that disabling stepping breaks my next
and step
commands a bit more than it is already :) (meaning I can't even get to point where I would issue a ctrl+c to stop my infinite loop).
With stepping is enabled, the step
, next
commands works fine (like before I switched to the branch) when I'm not in a loop (but it will end up with an unexpected packet error when I'm sending a ctrl+c after a next
in an infinite loop):
nrk::arch::_start (argc=70369785565184, _argv=0x9)
at kernel/src/arch/x86_64/mod.rs:927
927 let _r = xmain();
(gdb) step
nrk::xmain () at kernel/src/integration_main.rs:499
499 arch::irq::ioapic_establish_route(0x0, 0x0);
(gdb) step
nrk::arch::irq::ioapic_establish_route (_gsi=0, _core=0)
at kernel/src/arch/x86_64/irq.rs:810
810 for io_apic in atopology::MACHINE_TOPOLOGY.io_apics() {
(gdb) step
<atopology::MACHINE_TOPOLOGY as core::ops::deref::Deref>::deref (self=0x40003d409b38)
at /home/gz/.cargo/registry/src/github.com-1ecc6299db9ec823/lazy_static-1.4.0/src/lib.rs:144
144 __stability()
(gdb) step
<atopology::MACHINE_TOPOLOGY as core::ops::deref::Deref>::deref::__stability ()
at /home/gz/.cargo/registry/src/github.com-1ecc6299db9ec823/lazy_static-1.4.0/src/lib.rs:144
144 __stability()
(gdb) step
lazy_static::lazy::Lazy<T>::get (
self=0x40003d4bfb08 <<atopology::MACHINE_TOPOLOGY as core::ops::deref::Deref>::deref::__stability::LAZY>, builder=<optimized out>)
at /home/gz/.cargo/registry/src/github.com-1ecc6299db9ec823/lazy_static-1.4.0/src/core_lazy.rs:21
21 self.0.call_once(builder)
(gdb) step
If I disable stepping (fn support_single_step
returns None), I'm getting an unexpected packet error immediately after sending a next
in gdb (similar problem to the one above but this time it's not a ctrl+c packet but rather a vCont package). Now of course I'm not sure what it means for next
and step
if single stepping is disabled but I figured it wouldn't matter (as long as I don't do stepi or something) since it could just set a breakpoint?
nrk::arch::_start (argc=70369785565184, _argv=0x9)
at kernel/src/arch/x86_64/mod.rs:927
927 let _r = xmain();
(gdb) step
Remote connection closed
(gdb) q
10122983318 [INFO ] - nrk::arch::gdb: pumping...
10126255784 [INFO ] - nrk::arch::gdb: pumping...
10129504230 [INFO ] - nrk::arch::gdb: pumping...
10132724428 [TRACE] - gdbstub::protocol::recv_packet: <-- $vCont;s:p1.1;c:p1.-1#f7
10136879558 [ERROR] - nrk::arch::gdb: gdbstub internal error PacketUnexpected
FWIW it seems to me that the root of the issue is that the state machine unconditionally goes from deferred_stop_reason into pump state. However, I think in my case, gdb seems to think it's best to just execute the next
command by single-stepping through the program stopping after every instruction (without giving me a prompt or anything) -- so judging from this code hitting ctrl+c while in the pump state will be inevitable.
I'm maybe understanding this wrong but so far I assumed the STM is only supposed to be in the pump state when I have a gdb prompt on the client side?
I noticed that disabling stepping breaks my
next
andstep
commands a bit more than it is already
That could be indicative of an issue in how you're handling your breakpoints / other stop reasons... Are you adjusting the PC after the breakpoint has been hit?
GDB will do some very weird stuff if breakpoints aren't implemented correctly, and that isn't something gdbstub
can detect / catch.
If it's not that... I can try to take a closer look sometime this weekend. In the meantime, you should consider doing some hands-on hacking with the current gdbstub
state machine implementation, and see if there might be some incorrect state transitions going on.
As mentioned earlier, this is relatively untested code!
That could be indicative of an issue in how you're handling your breakpoints / other stop reasons... Are you adjusting the PC after the breakpoint has been hit?
So I do/did worry about this, because I saw this piece in the docs and it was not something I'm doing. However, the breakpoints and single-step seemed to be working fine without it (except for when it's in this infinite loop :)), so I looked at two other implementations:
In both examples if they use RFlags to single-step I couldn't find any adjustment they do to the RIP when single-stepping. Also in the iPXE code, they do use hardware debug registers for breakpoints (like I do) and don't seem to adjust the RIP after getting a hardware breakpoint. Now, I tried to check the SDM too to get some conclusive information on what RIP is after a hardware debug breakpoint, but unfortunately I didn't find a conclusive statement (yet).
I did add some asserts to verify when I get a debug interrupt due to hitting a breakpoint (with the dr
registers), that the RIP corresponds to the same address that I programmed in the dr
register and doesn't point to the next instruction, and this seems to be true so far.
So, in the end I concluded that maybe this is only relevant for int 3
type interrupts that get inserted into the code?
So, in the end I concluded that maybe this is only relevant for
int 3
type interrupts that get inserted into the code?
While I can't give you a definitive yes/no answer for that, based on the docs you've linked, that seems correct?
That said, the fact that you code completely breaks when single-stepping support isn't available is not a good sign (i.e: regardless if ctrl-c interrupts are being handled or not). Hardware-assisted single stepping is an optional feature, and if it isn't available, GDB should be using continue + temporary breakpoints to get things working.
What if you roll-back the code to a point before you started trying to support ctrl-c interrupts, and make super sure that the simple case of disabling single stepping + resuming via continue & temporary breakpoints works as expected.
And for having single-stepping disabled. I haven't shared a complete log for that failure yet so here is one:
Wait a sec, that's very weird... why is your GDB client sending $vCont;s:p1.1;c:p1.-1#f7
?
The stub responded to vCont?
with just vCont;c;C
, so the client should not be sending the stub a vCont;s
packet!
Assuming those logs were from the feature/optional-single-step branch, this particular PacketUnexpected should be happening in a totally different part of the codebase, namely: https://github.com/daniel5151/gdbstub/blob/ba4b971f4ff562fe2a70d281f42503f251ed040a/src/gdbstub_impl/ext/base.rs#L510
Could you add a log statement to gdbstub
to see if my guess is correct?
Yes this is using the feature/optional-single-step
branch, and looks like your guess is correct:
5660085612 [TRACE] - gdbstub::protocol::response_writer: --> $vCont;c;C#265674929254
[TRACE] - gdbstub::protocol::recv_packet: <-- $vCont;s:p1.1;c:p1.-1#f7
5685461636 [INFO ] - gdbstub::gdbstub_impl::ext::base: we are at base.rs:510
5695651002 [ERROR] - nrk::arch::gdb: gdbstub internal error PacketUnexpected
What GDB client version are you using? gdb --version
gdb --version
GNU gdb (Ubuntu 9.2-0ubuntu1~20.04) 9.2
Right, that's the same version I'm on...
What happens if instead of next
, you explicitly set a (software) breakpoint + run continue
?
That works fine (break function
, continue
) (note that software breakpoint for me is programming dr registers on x86 -- while still using gdbstub::target::ext::breakpoints::SwBreakpoint)
Trace for that interaction:
Well, aside from the possibly weird interaction of returning a HwBreak
stop reason from a breakpoint set via z0
/Z0
, good to see that it seems to be working as expected...
I quickly checked to see what happens if you disable single-step support (and just in case, range-step support) in the armv4t
example, and the logs seem totally normal (i.e: I report vCont;c;C
, and the client respects that)...
I'll be honest, I am now very confused. Your GDB client clearly acknowledged the vCont?
response, but then it went ahead and sent s
anyways...
Off the top of my head, a few more things to try:
gdb
and/or compile it from source, and see if the behavior is still happening?vCont;s
if you explicitly call stepi
(vs. next
)? Well, aside from the possibly weird interaction of returning a HwBreak stop reason from a breakpoint set via z0/Z0, good to see that it seems to be working as expected...
Right, I noticed that too that I should probably respond with SwBreak. This is a problem with me forwarding this just to the HwBreak logic, will fix it. In theory it shouldn't matter for the vCont issue as this won't even set a BP. But I'll fix it now and report back...
Ok I fixed reporting a HwBreak instead of SwBreak when appropriate.
Does the client still sent vCont;s if you explicitly call stepi (vs. next)?
I did some tests, it doesn't seem to matter what command I use:
(support_single_step set to None
in all cases)
nexti
:(gdb) nexti
Sending packet: $vCont?#49...Packet received: vCont;c;C
Packet vCont (verbose-resume) is supported
Sending packet: $vCont;s:p1.1;c:p1.-1#f7...target_close ()
14103070338 [TRACE] - gdbstub::protocol::recv_packet: <-- $vCont?#49
14107316118 [TRACE] - gdbstub::protocol::response_writer: --> $vCont;c;C#26
14114603290 [TRACE] - gdbstub::protocol::recv_packet: <-- $vCont;s:p1.1;c:p1.-1#f7
14118713350 [ERROR] - nrk::arch::gdb: gdbstub internal error PacketUnexpected
stepi
(gdb) stepi
Sending packet: $vCont?#49...Packet received: vCont;c;C
Packet vCont (verbose-resume) is supported
Sending packet: $vCont;s:p1.1;c:p1.-1#f7...target_close ()
Remote connection closed
step
(gdb) step
Sending packet: $vCont?#49...Packet received: vCont;c;C
Packet vCont (verbose-resume) is supported
Sending packet: $vCont;s:p1.1;c:p1.-1#f7...target_close ()
Remote connection closed
7030775732 [TRACE] - gdbstub::protocol::response_writer: --> $#00
8871060246 [TRACE] - gdbstub::protocol::recv_packet: <-- $vCont?#49
8875015708 [TRACE] - gdbstub::protocol::response_writer: --> $vCont;c;C#26
8882109736 [TRACE] - gdbstub::protocol::recv_packet: <-- $vCont;s:p1.1;c:p1.-1#f78886110586 [ERROR] - nrk::arch::gdb: gdbstub internal error PacketUnexpected
Tried with latest gdb version 11.1 compiled from sources, same result:
85393326932 [TRACE] - gdbstub::protocol::recv_packet: <-- $vCont?#49
85398723430 [TRACE] - gdbstub::protocol::response_writer: --> $vCont;c;C#26
85407086552 [TRACE] - gdbstub::protocol::recv_packet: <-- $vCont;s:p1.1;c:p1.-1#f7
85412503838 [ERROR] - nrk::arch::gdb: gdbstub internal error PacketUnexpected
➜ gdb-11.1 ./gdb/gdb ../node-replicated-kernel/target/x86_64-uefi/debug/esp/kernel
Python Exception <type 'exceptions.ImportError'>: No module named gdb./gdb/gdb: warning: Could not load the Python gdb module from `/usr/local/share/gdb/python'.
Limited Python support is available from the _gdb module.Suggest passing --data-directory=/path/to/gdb/data-directory.
GNU gdb (GDB) 11.1Copyright (C) 2021 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ../node-replicated-kernel/target/x86_64-uefi/debug/esp/kernel...
(gdb) target remote localhost:1234
Remote debugging using localhost:1234
warning: remote target does not support file transfer, attempting to access files from local filesystem.
warning: Unable to find dynamic linker breakpoint function.GDB will be unable to debug shared library initializers
and track explicitly loaded dynamic code.
Python Exception <type 'exceptions.NameError'>: Installation error: gdb._execute_unwinders function is missing
nrk::arch::_start (argc=70369785565184, _argv=0x9) at kernel/src/arch/x86_64/mod.rs:927927
let _r = xmain();
(gdb) next
Remote connection closed
Fascinating...
Okay, another thought:
.gdbinit
?.gdbinit
script, or passing the binary to gdb
(i.e: just opening gdb, and running target remote <foo>
). you shouldn't have any symbols loaded, but I'd be interested to see the behaviour of stepi
and nexti
in that case can you post the contents of your .gdbinit?
It's just target remote localhost:1234
what happens if you connect to the target without using your .gdbinit script, or passing the binary to gdb
good question, I tried without passing a binary and/or gdbinit using nexti
and stepi
, but the error was still the same
Right. Well shoot, I'm stumped.
Unless there's something incredibly obvious I'm missing, it seems that GDB is ignoring the fact that the stub doesn't support single-stepping, and is sending it single-step commands regardless.
I'll try to find some time this weekend to whip up some kind of "dummy" gdbstub
implementation that uses the same Arch
as you're currently using, and see if I can repro the behavior locally. Maybe this is some weird x86 specific issue, whereby GDB will unconditionally assume single-stepping is supported at the architecture level? If that's the case, then I'll have to rethink some of the changes I'm making in #92, in order to avoid try and catch what might be a absolutely terrible footgun.
In the meantime, I'll suggest that we shelve this digression into why optional single-stepping isn't working, and re-focus on the actual issue at hand: getting Ctrl-C interrupts working.
Go ahead and re-enable support for single stepping support, and then see if you can get your state machine implementation working correctly without handling Ctrl-C interrupts. That should give us a baseline level of "working code" to then re-introduce Ctrl-C functionality into.
Well I'll be damned.
I hacked together a dummy gdbstub
implementation backed by the x86 arch (i.e: memory is entirely filled with NOPs, registers are set to reasonable dummy values), and lo and behold: invoking stepi
will unconditionally send s
to the stub!
I'm now digging through the GDB client code to figure out what the heck this is happening...
Indeed, looks more and more like a gdb bug :)
In the meantime, I'll suggest that we shelve this digression into why optional single-stepping isn't working, and re-focus on the actual issue at hand: getting Ctrl-C interrupts working.
Right so coming back on this: I still believe there might be an issue with the assumptions in the state-machine; basically it seems what happens is that when I use next
to start executing an infinite loop GDB will use the "Step mode" to execute said loop (e.g., stopping after every instruction). This means that the STM will go into the pump state many times without me ever having a gdb prompt, so it's likely that the ctrl+c comes in at an inconvenient time (during the pump state).
Maybe now that you have a dummy x86 stub we could use it to confirm this problem in another implementation?
Indeed, it seems you're right!
I'm not sure why I didn't try this out sooner, but if I spin up the in-repo armv4t
example and execute stepi 10000
in GDB, hitting Ctrl-C triggers the same unexpected packet branch as you're seeing!
Now that I have an easy local repro, I can really dig into what's going on, though unfortunately, this week is a bit hectic for me, so I'm not sure when exactly I'll get the chance to dig into this issue...
In any case, I can confirm this is definitely an issue on gdbstub
's end.
If you'd like to take a crack at fixing this issue, I'd be happy to review any code sent my way. Otherwise, give me some time, and I'll definitely fix this issue 😅
Sure sounds good, I can take a look this week and see if I can fix it before you
Just a heads up - if you do end up trying to fix it, could you please investigate your fix in the context of the in-tree armv4t
example?
Assuming this isn't another funky architecture-specific issue, fixing it for the in-tree example should also fix things in your implementation, while also making it a lot easier for me to validate the solution on my end.
Welp, this bug has nerd sniped me, so I spent my (late) lunch looking into this...
I went ahead and tested out what gdbserver
does in this scenario (i.e: debugging a regular 'ol userland binary on my x64 linux box), and here's what I found:
Running stepi 10000
and then trying to Ctrl-C in between appears to work "correctly", insofar as the target seems to stop once the trigger is hit. I decided to dig into the gdbserver source code, and it uses some very straightforward logic whereby "if 0x3 is enountered outside of the regular packet framing, attempt to interrupt the target".
What does "interrupt the target" entail?
The gdbserver triggers a SIGINT (signal number 02) within the running process, which is then returned in the next stop-reply packet! i.e: instead of returning a typical SIGTRAP (S05
) stop reply packet, the target will return a SIGINT (S02
) stop reply packet instead, which the GDB client acknowledges, and breaks out of its stepi
loop!
In fact, if you run the GDB command handle SIGINT nostop
before proceeding to run stepi 1000
, you can actually reproduce the "Ctrl-C doesn't work" behavior perfectly, as GDB will simply not stop when the target sends back a S02
response packet.
It seems a possible solution here will be to make the Pump
state pump
method return Event::CtrlCInterrupt
as well. The added complexity here is that because we aren't in the DeferredStopReason
part of the protocol state machine, we cannot immediately respond with S02
, as that would result in a protocol error. Instead, we are forced to require that the target implementation "stashes" this interrupt request until the next time a vCont packet arrives.
EDIT: I have a (brittle) local fix that seems to prove that returning S02
does indeed break the stepi 10000
chain in the armv4t example. All that's left is to think of how this should be expressed via the state machine API, and whether or not it might be possible to hide this wart from end users all together...
Nice! It seems like some clarification in the GDB documentation would be in order for this case... at least I couldn't find it there, the gdbserver implementation seems to be the documentation in this case :) FWIW deferring the Ctrl+C packet until we're in deferred stop state again seems like something that would be quite simple to support with the explicit state machine model. But it does make me wonder if this is the only packet that is a special case :)
Sorry for the delay, but the fix is now in dev/0.6
!
I've gone ahead and substantially reworked the state machine API to be a bit more streamlined and consistent. Notably, handling interrupts and disconnect conditions is now done via dedicated state machine states, rather than Event
s that get returned by certain methods.
While I don't have great documentation just yet, I would once again recommend that you take a look at the implementation of GdbStub::run_blocking
for an example of how to interact with the state machine API.
I will close this issue after you get a chance to try out these changes + confirm that they are working as expected.
Cheers!
sounds good I will give it a try this weekend!
Just a heads up - I spent a bunch of time working on gdbstub
today, and aside from this fix, I've also pushed up some other breaking changes to dev/0.6
as well (such as introducing a GDB RSP specific Signal
type that's used instead of u8
, making IDET method naming consistent by prefixing everything with support_
, etc...).
Apologies for the API churn!
Ok --I have a test now that does Ctrl+C & continue and happy to report that it's working as expected on dev/0.6 :+1:
Got excited too early. I still have some issue with next (aka doing single-stepping) and sending Ctrl+C during that time. Will need to investigate further.
Ok nvm, I figured that was my bug -- when single-stepping I accidentially reported StepDone instead of SIGINT to interrupt_handled
... Now it works with next/single-step and Ctrl+c too :+1:
Fantastic!
It sure was a long journey to get this fixed, but I'm glad we got it working in the end!
All that's left is the arduous task of documenting all of these APIs, updating the changelog, writing a 0.5 migration guide, and publishing to crates.io... At this rate, it's looking more and more likely that all that is going to have to happen sometime in December over the holiday season 😅
Cheers!
I tried to support Ctrl+C (in GDB) by waiting for the serial line interrupt, and if it arrives providing a
ThreadStopReason::GdbCtrlCInterrupt
to the state machine. However, this gives me:Which seems to be going here. The comment is strange because I'm doing a Ctrl+C while my code is spinning in an infinite loop. Anyways I figured I'd ask also because PacketUnexpected is treated as "always a bug"...