Open sven-hoek opened 1 month ago
Hey @sven-hoek
Thanks for reporting this.
When range stepping is enabled, Bloom attempts to analyze all instructions within the given range, and intercept those that may take the target outside of the range. The error message suggests that Bloom was unable to decode a particular opcode, and so it was forced to intercept that instruction, as we don't know what it will do. However, that error is not considered to be fatal and should not result in Bloom crashing or shutting down abruptly.
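To make the idea concrete, here is a minimal sketch of that interception decision. It is not Bloom's actual implementation: the `Instruction` type, the `addresses_to_intercept` function and the addresses are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Instruction:
    address: int                        # byte address of the instruction
    decoded: bool                       # False when the opcode could not be decoded
    destination: Optional[int] = None   # statically known jump/call target, if any


def addresses_to_intercept(instructions, range_start, range_end):
    """Return the byte addresses to intercept for the range [range_start, range_end)."""
    intercepts = set()
    for ins in instructions:
        if not ins.decoded:
            # We don't know what this instruction does, so stop on it.
            intercepts.add(ins.address)
        elif ins.destination is not None and not (range_start <= ins.destination < range_end):
            # The instruction may take the target outside the range.
            intercepts.add(ins.destination)
    return sorted(intercepts)


# Hypothetical range 0x100 -> 0x110 (end exclusive).
example = [
    Instruction(0x100, True),           # stays in range
    Instruction(0x102, True, 0x200),    # call to an address outside the range
    Instruction(0x106, True, 0x100),    # branch back inside the range
    Instruction(0x108, False),          # opcode that could not be decoded
]
print([hex(a) for a in addresses_to_intercept(example, 0x100, 0x110)])
```

In this sketch an undecodable opcode is intercepted at its own address, which is why the error on its own should not be fatal.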
A few things I need from you, please:
Could you enable debug logging, reproduce that error, and then send me the full debug log? To enable debug logging, set the debugLogging param to true in the root node of your bloom.yaml:
debugLogging: true
environments:
...
You may have to put the debug log into a text file, and attach it to your comment, as GitHub has a char limit for comments.
Can you provide a dump of program memory, around the address at which Bloom failed to decode the opcode (0x000001BE)? You can obtain this via GDB, using x/10b 0x000001BA - that should output 10 bytes of program memory, starting at 0x000001BA.
In addition to a program memory dump, it will help to know if GDB has similar issues decoding that opcode. Could you try running x/10bfi 0x000001BA in GDB? It will attempt to decode the opcodes around that address and output them.
As for the fatal error in GDB, I'm not sure if that's even related to the range step, as Bloom simply intercepts any instructions that it could not decode, so it shouldn't affect GDB at all. Have you tried disabling range stepping? Does GDB still crash? You can disable range stepping by setting rangeStepping to false in your server config, in bloom.yaml:
server:
  rangeStepping: false
If the issue in GDB is related to range stepping, you can just leave range stepping disabled, for the time being. It will result in degraded stepping performance but at least it won't crash.
Sorry, the GDB commands I provided in the previous comment, for dumping program memory, were incorrect, as the address 0x000001BB is an invalid program memory address (it needs to be word-aligned, as opcodes take the form of 16-bit words). You'll want to use 0x000001BA instead. So x/10b 0x000001BA to dump program memory, and x/10bfi 0x000001BA to dump decoded instructions.
I have also revised the previous comment.
Hey @navnavnav, thanks for the quick reply and the clear instructions. Also thanks for creating this great tool, including the good documentation.
What version of GDB are you currently using?
> avr-gdb --version
GNU gdb (GDB) 10.1.90.20210103-git
Could you enable debug logging, reproduce that error, and then send me the full debug log?
I didn't get to reproduce Bloom crashing to capture that log but here it is with just GDB crashing. I'll upload another log if I get to the point that Bloom also crashes again. https://gist.github.com/sven-hoek/4dee86bf9faccce8e4e4981c93a9c6c4
Can you provide a dump of program memory, around the address at which Bloom failed to decode the opcode (0x000001BE)?
x/10b 0x000001BA
0x1ba <CH9120Initialisation>: -49 -109 14 -108 125 0 -120 -31
0x1c2 <CH9120Initialisation+8>: -118 -107
{"token":18,"outOfBandRecord":[],"resultRecords":{"resultClass":"done","results":[]}}
In addition to a program memory dump, it will help to know if GDB has similar issues decoding that opcode. Could you try running x/10bfi 0x000001BA in GDB? It will attempt to decode the opcodes around that address and output them.
x/10bfi 0x000001BA
0x1ba <CH9120Initialisation>: push r28
0x1bc <CH9120Initialisation+2>: call 0xfa ; 0xfa <UART1ActiveState>
0x1c0 <CH9120Initialisation+6>: ldi r24, 0x18 ; 24
0x1c2 <CH9120Initialisation+8>: dec r24
0x1c4 <CH9120Initialisation+10>: brne .-4 ; 0x1c2 <CH9120Initialisation+8>
0x1c6 <CH9120Initialisation+12>: rjmp .+0 ; 0x1c8 <CH9120Initialisation+14>
=> 0x1c8 <CH9120Initialisation+14>: break
0x1ca <CH9120Initialisation+16>: .word 0x0079 ; ????
0x1cc <CH9120Initialisation+18>: ldi r24, 0x18 ; 24
0x1ce <CH9120Initialisation+20>: dec r24
I haven't got to try it yet but turning off range-stepping is a good point. Though I stepped through the same code with AVARICE instead of Bloom and GDB also crashed, so there's that. It seems to be rather the GDB side or something about the code I'm debugging (the latter of which hopefully shouldn't be an issue though).
I will create a very simple project with a simple library for better experimenting. It may take a little until I get to do that but I'll update you with my findings.
Thanks for this @sven-hoek
So I can see that you have a CALL instruction at byte address 0x1bc, which is made up of two words (spanning byte address 0x1bc -> 0x1bf). But Bloom was attempting to decode the instruction at 0x1be - which is an invalid address as it points to the second word of that CALL instruction.
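This can be checked against the x/10b dump directly. Below is a rough sketch that converts the signed bytes GDB printed into little-endian 16-bit words and walks the instruction boundaries. The helper names are made up for this example, and it only treats CALL as a two-word instruction (JMP, LDS and STS are also two words wide but don't appear in this dump), so it is not a general AVR decoder.

```python
# Signed bytes exactly as printed by `x/10b 0x000001BA`.
dump = [-49, -109, 14, -108, 125, 0, -120, -31, -118, -107]
BASE = 0x1BA  # byte address of the first dumped byte

raw = bytes(b & 0xFF for b in dump)
# AVR opcodes are 16-bit words stored little-endian in program memory.
words = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


def is_call(word):
    # CALL encoding (first word): 1001 010k kkkk 111k
    return (word & 0xFE0E) == 0x940E


def call_target(first, second):
    # The 22-bit word address k is split across both words; byte address = k * 2.
    k = (((first >> 3) & 0x3E) | (first & 1)) << 16 | second
    return k * 2


def instruction_starts(words, base):
    # Assumption: only CALL is two words wide in this particular dump.
    starts = []
    i = 0
    while i < len(words):
        starts.append(base + 2 * i)
        i += 2 if is_call(words[i]) else 1
    return starts


starts = instruction_starts(words, BASE)
print([hex(a) for a in starts])              # ['0x1ba', '0x1bc', '0x1c0', '0x1c2']
print(hex(call_target(words[1], words[2])))  # 0xfa
print(0x1BE in starts)                       # False
```

The boundaries match GDB's x/10bfi output, and 0x1be is not among them.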
I was worried that Bloom may be incorrectly decoding the first word (0x1bc -> 0x1bd) as some other, single-word instruction, which would explain why it was attempting to decode the second word separately. But I've just attempted to replicate this, and it seems to be working fine for me. I used the exact same opcode as the one in your program: 0x0E947D00, which translates to CALL 0xFA, and then performed a range step to step over that instruction. Bloom correctly decoded the instruction and intercepted the destination address (0xFA), as it was outside of the requested range:
2024-07-22 01:58:50.259 BST [DS]: [DEBUG] Read GDB packet: $vCont;rfc,100:-1;c#73
2024-07-22 01:58:50.260 BST [DS]: [INFO] Handling VContRangeStep packet
2024-07-22 01:58:50.260 BST [DS]: [DEBUG] Requested stepping range start address: 0x000000fc
2024-07-22 01:58:50.260 BST [DS]: [DEBUG] Requested stepping range end address (exclusive): 0x00000100
2024-07-22 01:58:50.260 BST [DS]: [DEBUG] Issuing ReadTargetMemory command (ID: 430) to TargetController
2024-07-22 01:58:50.260 BST [DS]: [DEBUG] Delivering response for ReadTargetMemory command (ID: 430)
2024-07-22 01:58:50.260 BST [DS]: [DEBUG] Inspecting 1 instructions within stepping range (byte addresses) 0x000000fc -> 0x00000100, in preparation for new range stepping session
2024-07-22 01:58:50.260 BST [DS]: [DEBUG] Intercepting destination byte address 0x000000fa of CCPF instruction ("CALL") at byte address 0x000000fc
So, this leads me to believe that, when you attempted to step over that CALL instruction, GDB sent an invalid address range to Bloom, with a start address of 0x1be, which does not point to the beginning of any valid instruction. What's even worse: once Bloom failed to decode the instruction at that address, it would have attempted to intercept it by placing a breakpoint there. That newly inserted breakpoint may have corrupted the CALL instruction (as it was placed in the middle of it), resulting in a corrupted program.
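If you want to see which range GDB actually requested, it can be read straight out of the vCont packet in the debug log. Here is a small sketch (the function name is made up, and it ignores RSP escaping and run-length encoding) that verifies the packet checksum and extracts the start and end addresses of the `r start,end` action:

```python
import re


def parse_range_step(packet):
    """Return (start, end) from a `vCont;r start,end` packet, or None if absent."""
    match = re.fullmatch(r"\$(.*)#([0-9a-fA-F]{2})", packet)
    if match is None:
        raise ValueError("not a complete RSP packet")
    payload, checksum = match.group(1), int(match.group(2), 16)
    # The RSP checksum is the sum of the payload bytes, modulo 256.
    if sum(payload.encode("ascii")) % 256 != checksum:
        raise ValueError("checksum mismatch")
    if not payload.startswith("vCont;"):
        return None
    for action in payload[len("vCont;"):].split(";"):
        command = action.split(":", 1)[0]  # drop the optional thread-id
        if command.startswith("r"):
            start, end = command[1:].split(",")
            return int(start, 16), int(end, 16)  # end is exclusive
    return None


start, end = parse_range_step("$vCont;rfc,100:-1;c#73")
print(hex(start), hex(end))  # 0xfc 0x100
```

Running this on the vCont packet from the failing session should show whether the start address really was 0x1be.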
So I think the issue here is with GDB. But before you report this to the GDB devs, can you try reproducing the error with a newer version? Version 10 is a little old. I'm on 12.2, which works great for AVR, IMO. But I understand this may be a headache, as you may have to build it from source (unless you're willing to upgrade to a newer Ubuntu version - the newer repositories seem to host newer versions of the gdb-avr package).
It seems to be rather the GDB side or something about the code I'm debugging (the latter of which hopefully shouldn't be an issue though).
Yeah I agree. That fatal error in GDB doesn't seem to be caused by the opcode decoding error in Bloom. But whatever is causing the fatal error in GDB may also be the cause for the invalid address range that GDB is sending to Bloom. Worth keeping in mind 👍🏽
Thanks for that detailed explanation and great support.
can you try reproducing the error with a newer version? Version 10 is a little old
True, I hadn't thought of that. I tried gdb 13.2, and stepping through the problematic code still seems to produce errors in GDB.
I created a very simple app that also uses a static library to see if it's any library that will cause issues. Kept everything very simple and I can step into the library code without any issues with the same toolchain as before (gdb 10). I assume the other project has some weird configuration that messes up the addressing. I will try to cleanly recreate the project and will see if the error persists.
Not sure if stepping into the static library is the issue but it seems to happen whenever I try to step into or break in a statically-linked library function.
-Og.
When debugging within VSCode, the debugging session stops, but when I run avr-gdb in the terminal and connect to Bloom's gdb-server, I can continue, though I'm never able to step into a library function. Bloom itself doesn't always crash, and so far I couldn't really pin down when it does. It happens when I am a few lines above a library function call and then try to step over (but not even the library function call itself)...
I'll also report the bug on the avr-gdb side.
If there's anything else I could try or any info I could provide, let me know.