SpinalHDL / openocd_riscv

Spen's Official OpenOCD Mirror

Error: Can't communicate with the CPU #25

Open dnltz opened 2 years ago

dnltz commented 2 years ago

Hi @Dolu1990,

I'm currently in the process of bringing up the JTAG interface on the VexRiscv silicon. After I switched the TDO and TDI signals I got the following error:

Open On-Chip Debugger 0.11.0+dev-02577-g3eee6eb04-dirty (2022-05-27-14:39)
Licensed under GNU GPL v2
For bug reports, read
    http://openocd.org/doc/doxygen/bugs.html
configs/VexRiscv.yaml
Info : auto-selecting first available session transport "jtag". To override use 'transport select <transport>'.
Info : set servers polling period to 50ms
- Initialize
Info : J-Link V9 compiled May 17 2019 09:50:41
Info : Hardware version: 9.30
Info : VTarget = 3.362 V
Info : clock speed 1 kHz
Info : JTAG tap: fpga_spinal.bridge tap/device found: 0xc0000fff (mfg: 0x7ff (<invalid>), part: 0x0000, ver: 0xc)
Warn : JTAG tap: fpga_spinal.bridge       UNEXPECTED: 0xc0000fff (mfg: 0x7ff (<invalid>), part: 0x0000, ver: 0xc)
Error: JTAG tap: fpga_spinal.bridge  expected 1 of 1: 0x10001fff (mfg: 0x7ff (<invalid>), part: 0x0001, ver: 0x1)
Error: Trying to use configured scan chain anyway...
Error: fpga_spinal.bridge: IR capture error; saw 0x08 not 0x01
Warn : Bypassing JTAG setup events due to errors
Error: !!!
Error: Can't communicate with the CPU
Error: !!!
Warn : target fpga_spinal.cpu0 examination failed
Info : starting gdb server for fpga_spinal.cpu0 on 3333
Info : Listening on port 3333 for gdb connections
Info : Listening on port 6666 for tcl connections
Info : Listening on port 4444 for telnet connections

After googling a little bit, I found some similar problems but I'm sure my j-link adapter works because I used it a lot with the FPGAs and the same design.

So, I think the TAP and IR capture-related warnings/errors are not good but can be ignored. The nasty stuff is the missing communication with the CPU, right? Any idea how to debug further?

The openocd config:

set _ENDIAN little
set _TAP_TYPE 1234

if { [info exists CPUTAPID] } {
   set _CPUTAPID $CPUTAPID
} else {
  # set useful default
   set _CPUTAPID 0x10001fff
}

adapter speed 1
adapter srst delay 260
jtag_ntrst_delay 250

set _CHIPNAME fpga_spinal
jtag newtap $_CHIPNAME bridge -expected-id $_CPUTAPID -irlen 4 -ircapture 0x1 -irmask 0xF

target create $_CHIPNAME.cpu0 vexriscv -endian $_ENDIAN -chain-position $_CHIPNAME.bridge -coreid 0 -dbgbase 0xF00F0000
vexriscv readWaitCycles 12
vexriscv cpuConfigFile $HYDROGEN_CPU0_YAML

poll_period 50

echo "- Initialize"
init
#echo "- Halting processor"
#reset_config trst_only
#jtag arp_init

Dolu1990 commented 2 years ago

Hi,

Info : JTAG tap: fpga_spinal.bridge tap/device found: 0xc0000fff (mfg: 0x7ff (<invalid>), part: 0x0000, ver: 0xc)
Warn : JTAG tap: fpga_spinal.bridge       UNEXPECTED: 0xc0000fff (mfg: 0x7ff (<invalid>), part: 0x0000, ver: 0xc)
Error: JTAG tap: fpga_spinal.bridge  expected 1 of 1: 0x10001fff (mfg: 0x7ff (<invalid>), part: 0x0001, ver: 0x1)
Error: Trying to use configured scan chain anyway...
Error: fpga_spinal.bridge: IR capture error; saw 0x08 not 0x01
Warn : Bypassing JTAG setup events due to errors

This indicates that the JTAG adapter doesn't communicate properly with the device's JTAG TAP, so this has to be fixed first, especially the 0xc0000fff vs 0x10001fff mismatch.

If you can take a measurement of the JTAG signals during that initial phase, I can take a look. One thing to note is that at this level there is one falling-edge flip-flop in the design; could that be an issue with the ASIC flow? There is no CPU / block RAM involved at this stage.
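
For reference, the two IDCODE values can be split into the standard IEEE 1149.1 fields, which is what OpenOCD prints as mfg / part / ver in the log. The following is a minimal Python sketch; the helper names `decode_idcode` and `differing_bits` are made up for illustration and are not part of OpenOCD.

```python
def decode_idcode(idcode: int) -> dict:
    """Split a 32-bit JTAG IDCODE into its IEEE 1149.1 fields."""
    return {
        "marker": idcode & 0x1,           # bit 0, always 1 for a valid IDCODE
        "mfg": (idcode >> 1) & 0x7FF,     # bits 11:1, manufacturer ID
        "part": (idcode >> 12) & 0xFFFF,  # bits 27:12, part number
        "ver": (idcode >> 28) & 0xF,      # bits 31:28, version
    }


def differing_bits(a: int, b: int) -> list:
    """Return the bit positions (0 = LSB) where two 32-bit words differ."""
    diff = (a ^ b) & 0xFFFFFFFF
    return [bit for bit in range(32) if (diff >> bit) & 1]


found = 0xC0000FFF     # value read back from the TAP in the log
expected = 0x10001FFF  # value configured with -expected-id

print(decode_idcode(found))
print(decode_idcode(expected))
print(differing_bits(found, expected))
```

The low 12 bits agree and only bit 12 and three of the four version bits differ, so part of the shifted data arrives intact. That is only an observation about the numbers and does not by itself identify the cause.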

dnltz commented 2 years ago

Hi @Dolu1990, sure I can do that. Is a screenshot of an oscilloscope enough? (Hopefully I have some time on this weekend for this)

Dolu1990 commented 2 years ago

Hi

Is a screenshot of an oscilloscope enough?

Not really, screenshots are too narrow.

Ideally, one oscilloscope screenshot to check signal integrity, and one long logic analyser trace to check the behaviour.
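
When reading a long logic analyser trace, it can help to replay the TMS values through the standard IEEE 1149.1 TAP state machine and check that the controller reaches the states the debugger intends. The sketch below assumes TMS has already been sampled on each rising TCK edge and exported as a list of 0/1 values; `walk_tap` is a hypothetical helper name.

```python
# IEEE 1149.1 TAP controller: state -> (next state if TMS=0, next state if TMS=1)
TAP_NEXT = {
    "Test-Logic-Reset": ("Run-Test/Idle", "Test-Logic-Reset"),
    "Run-Test/Idle": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-DR-Scan": ("Capture-DR", "Select-IR-Scan"),
    "Capture-DR": ("Shift-DR", "Exit1-DR"),
    "Shift-DR": ("Shift-DR", "Exit1-DR"),
    "Exit1-DR": ("Pause-DR", "Update-DR"),
    "Pause-DR": ("Pause-DR", "Exit2-DR"),
    "Exit2-DR": ("Shift-DR", "Update-DR"),
    "Update-DR": ("Run-Test/Idle", "Select-DR-Scan"),
    "Select-IR-Scan": ("Capture-IR", "Test-Logic-Reset"),
    "Capture-IR": ("Shift-IR", "Exit1-IR"),
    "Shift-IR": ("Shift-IR", "Exit1-IR"),
    "Exit1-IR": ("Pause-IR", "Update-IR"),
    "Pause-IR": ("Pause-IR", "Exit2-IR"),
    "Exit2-IR": ("Shift-IR", "Update-IR"),
    "Update-IR": ("Run-Test/Idle", "Select-DR-Scan"),
}


def walk_tap(tms_samples, state="Test-Logic-Reset"):
    """Return the TAP states visited for TMS values sampled on rising TCK edges."""
    visited = []
    for tms in tms_samples:
        state = TAP_NEXT[state][1 if tms else 0]
        visited.append(state)
    return visited


# Five clocks with TMS high reset the TAP from any state,
# then 0,1,1,0,0 walks into Shift-IR.
print(walk_tap([1, 1, 1, 1, 1, 0, 1, 1, 0, 0], state="Shift-DR"))
```

If the states decoded from the captured trace differ from the sequence the adapter should be driving, that points at TMS/TCK sampling problems rather than at the TAP logic.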

dnltz commented 2 years ago

Hey, I checked the signal and the TMS and TCK looked really bad, which might come from my poor soldering skills.

Now I can connect to the TAP and the interface looks good, but it seems the target is not halted. This log.txt was created while connecting to it with GDB.

Do you see anything familiar which might help before I debug further?

Dolu1990 commented 2 years ago

Ahhh, can you show me your OpenOCD Tcl scripts?

dnltz commented 2 years ago

I execute openocd with

#!/bin/bash

openocd_riscv/src/openocd -c "set HYDROGEN_CPU0_YAML configs/VexRiscv.yaml" -f openocd_riscv/tcl/interface/jlink.cfg -f configs/carbon.cfg -d

and carbon.cfg looks like

set _ENDIAN little
set _TAP_TYPE 1234

if { [info exists CPUTAPID] } {
   set _CPUTAPID $CPUTAPID
} else {
  # set useful default
   set _CPUTAPID 0x10001fff
}

adapter speed 100
adapter srst delay 260
jtag_ntrst_delay 250

set _CHIPNAME fpga_spinal
jtag newtap $_CHIPNAME bridge -expected-id $_CPUTAPID -irlen 4 -ircapture 0x1 -irmask 0xF

target create $_CHIPNAME.cpu0 vexriscv -endian $_ENDIAN -chain-position $_CHIPNAME.bridge -coreid 0 -dbgbase 0xF00F0000
vexriscv readWaitCycles 12
vexriscv cpuConfigFile $HYDROGEN_CPU0_YAML

poll_period 50

echo "- Initialize"
init
echo "- Halting processor"
reset_config trst_only
jtag arp_init

The openocd binary is your latest version from GitHub.

Dolu1990 commented 2 years ago

reset_config trst_only?

I would rather use the following after the init: https://github.com/SpinalHDL/openocd_riscv/blob/riscv_spinal/tcl/target/murax.cfg#L26

dnltz commented 2 years ago

YES! I can dump the registers :)

Not sure why I changed the last two lines. However, I can now flash a simple binary, but I'm not able to halt the CPU with monitor reset halt:

...
Debug: 2353 10502 command.c:201 script_debug(): command - expr  [ string first "jtag" $_TRANSPORT ] != -1 
Debug: 2354 10502 command.c:201 script_debug(): command - fpga_spinal.cpu0 cget -chain-position
Debug: 2355 10502 command.c:201 script_debug(): command - jtag tapisenabled fpga_spinal.bridge
Debug: 2356 10502 command.c:201 script_debug(): command - fpga_spinal.cpu0 was_examined
Debug: 2357 10502 command.c:201 script_debug(): command - fpga_spinal.cpu0 arp_waitstate halted 1000
Debug: 2367 10519 target.c:3273 target_wait_state(): waiting for target halted...
Debug: 2648 11027 gdb_server.c:396 gdb_log_outgoing_packet(): sending packet: $O#4f'
Debug: 2929 11535 gdb_server.c:396 gdb_log_outgoing_packet(): sending packet: $O#4f'
Error: 2930 11535 target.c:3281 target_wait_state(): timed out while waiting for target halted
Debug: 2931 11535 gdb_server.c:396 gdb_log_outgoing_packet(): sending packet: $O74696d6564206f7574207768696c652077616974696e6720666f72207461726765742068616c7465640a#fa'
Debug: 2932 11536 command.c:201 script_debug(): command - fpga_spinal.cpu0 curstate
Debug: 2933 11536 command.c:590 run_command(): Command 'reset' failed with error code -4
User : 2934 11536 command.c:654 command_run_line(): TARGET: fpga_spinal.cpu0 - Not halted
Debug: 2935 11536 gdb_server.c:396 gdb_log_outgoing_packet(): sending packet: $O5441524745543a20667067615f7370696e616c2e63707530202d204e6f742068616c7465640a#a5'
Debug: 2945 11552 gdb_server.c:396 gdb_log_outgoing_packet(): sending packet: $OK#9a'
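
The long `$O...` packets in this log are GDB remote serial protocol console-output packets: the payload after `O` is the message text as hex-encoded bytes, and the two digits after `#` are the modulo-256 sum of the payload characters. A small Python sketch for decoding them follows; the function names are illustrative only.

```python
def rsp_checksum(payload: str) -> int:
    """Checksum of a remote-protocol payload: sum of its bytes modulo 256."""
    return sum(payload.encode("ascii")) % 256


def decode_console_packet(packet: str) -> str:
    """Decode a '$O<hex>#<cc>' console-output packet into plain text."""
    start = packet.index("$") + 1
    end = packet.index("#", start)
    body = packet[start:end]
    if not body.startswith("O") or body == "OK":
        raise ValueError("not a console-output packet: " + body)
    return bytes.fromhex(body[1:]).decode("ascii")


timeout_packet = (
    "$O74696d6564206f7574207768696c652077616974696e6720"
    "666f72207461726765742068616c7465640a#fa"
)
print(repr(decode_console_packet(timeout_packet)))
print(hex(rsp_checksum("OK")))  # payload of the '$OK#9a' reply
```

Decoded this way, the two long packets simply repeat the OpenOCD messages "timed out while waiting for target halted" and "TARGET: fpga_spinal.cpu0 - Not halted" back to GDB.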

and when I run the step command in gdb (after load without halt) I get

Debug: 2295 6980 gdb_server.c:396 gdb_log_outgoing_packet(): sending packet: $OK#9a'
Debug: 2296 6980 gdb_server.c:1643 gdb_write_memory_binary_packet(): addr: 0xa0000000, len: 0x00000020
Debug: 2297 6980 target.c:2401 target_write_buffer(): writing buffer of 32 byte at 0xa0000000
Debug: 2298 6980 vexriscv.c:1300 vexriscv_write_memory(): Writing memory at physical address 0xa0000000; size 4; count 8
Debug: 2299 6980 vexriscv.c:779 vexriscv_halt(): target->state: running
Debug: 2309 6990 vexriscv.c:1361 vexriscv_write_memory(): SAVED : 5

Debug: 2319 7180 gdb_server.c:384 gdb_log_incoming_packet(): received packet: P20=200000a0
Debug: 2320 7180 vexriscv.c:1497 vexriscv_get_gdb_reg_list(): vexriscv_get_gdb_reg_list 0

Debug: 2321 7180 gdb_server.c:396 gdb_log_outgoing_packet(): sending packet: $OK#9a'
Debug: 5733 32328 gdb_server.c:384 gdb_log_incoming_packet(): received packet: m0,2
Debug: 5734 32328 gdb_server.c:1495 gdb_read_memory_packet(): addr: 0x0000000000000000, len: 0x00000002
Debug: 5735 32328 target.c:2466 target_read_buffer(): reading buffer of 2 byte at 0x00000000
openocd: src/target/vexriscv.c:1251: vexriscv_read_memory: Assertion `target->state == TARGET_HALTED' failed.
Debug: 5736 32328 server.c:613 sig_handler(): Terminating on Signal 6
./debug.sh: line 3:  6303 Aborted                 (core dumped) openocd_riscv/src/openocd -c "set HYDROGEN_CPU0_YAML configs/VexRiscv.yaml" -f openocd_riscv/tcl/interface/jlink.cfg -f configs/carbon.cfg -d

It looks like it wants to read from 0x0 instead of 0xa0000000 :/
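
The two incoming packets in this log can be decoded by hand. In the GDB remote protocol, `P<n>=<value>` writes register number n (hex) with the value given in target byte order, and `m<addr>,<len>` reads len bytes at addr. The sketch below parses both; that register 0x20 is pc is an assumption based on GDB's usual RISC-V register numbering (x0 to x31, then pc), and the function names are hypothetical.

```python
def parse_write_register(packet: str, byteorder: str = "little"):
    """Parse 'P<regnum-hex>=<value-hex>' into (register number, value)."""
    regnum_hex, _, value_hex = packet[1:].partition("=")
    value = int.from_bytes(bytes.fromhex(value_hex), byteorder)
    return int(regnum_hex, 16), value


def parse_read_memory(packet: str):
    """Parse 'm<addr-hex>,<len-hex>' into (address, length in bytes)."""
    addr_hex, _, len_hex = packet[1:].partition(",")
    return int(addr_hex, 16), int(len_hex, 16)


regnum, value = parse_write_register("P20=200000a0")
print(regnum, hex(value))
print(parse_read_memory("m0,2"))
```

So GDB first sets register 32 to 0xa0000020 and later issues a separate 2-byte read at address 0. According to the log, the abort comes from the `target->state == TARGET_HALTED` assertion in vexriscv_read_memory, i.e. the read was attempted while the target was not halted.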

dnltz commented 2 years ago

Seems like JTAG doesn't really have a connection to the memory:

(gdb) x 0xa0000000                 
0xa0000000 <_head>: 0x02faf080

I receive this value for every address I try.

Dolu1990 commented 2 years ago

I would say, for now do not use GDB, but use the OpenOCD telnet interface instead:
telnet localhost 4444
That would provide more direct access to the hardware for debugging.

You can then read / write values with mdw address / mww address data.

Could you give it a try?
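
For scripted experiments, the same commands can also be sent to the Tcl port that OpenOCD reports in the startup log (6666), where every command and every reply is terminated by a 0x1a byte. The following is a minimal Python sketch, not a tested client for this fork; `ocd_command` is a made-up helper name, and whether the reply contains the text printed by `mdw` depends on the OpenOCD version, so treat that as an assumption to verify.

```python
import socket

TERMINATOR = b"\x1a"  # OpenOCD's Tcl server ends each command and reply with 0x1a


def ocd_command(sock: socket.socket, command: str) -> str:
    """Send one command over an open Tcl-port connection and return the reply text."""
    sock.sendall(command.encode("ascii") + TERMINATOR)
    reply = bytearray()
    while not reply.endswith(TERMINATOR):
        chunk = sock.recv(4096)
        if not chunk:
            raise ConnectionError("OpenOCD closed the connection")
        reply += chunk
    # Drop the trailing terminator before decoding.
    return reply[:-1].decode("ascii", errors="replace")
```

A typical use would be to open the connection with `socket.create_connection(("localhost", 6666))` and call `ocd_command(sock, "mdw 0xa0000000")` for each address of interest.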

dnltz commented 2 years ago

ah okay. Just tried it without success again.

mdw 0xa0000000 returns a random-looking value, but the same one for every address. Moreover, register pc reads back that same value, while some registers can't be read.

> reg x0
Could not read register 'x0'

> reg sp
register sp not found in current target

> reg pc
pc (/32): 0x93faffc4

> reg instreth
instreth (/32): 0x00000000

> reg mvendorid
mvendorid (/32): 0x00000000

> mdw 0xa0000000
0xa0000000: 93faffc4 

> mdw 0xa0000004
0xa0000004: 93faffc4 

-------- read/write gpio register
> mdw 0xF0010000  
0xf0010000: 93faffc4 

> mww 0xF0010000 0x00000015
> mdw 0xF0010000           
0xf0010000: 93faffc4 

I still sometimes see the "Can't communicate with CPU" error from OpenOCD, but restarting it 1-5 times works.
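
The pattern in the session above (every address returning the same word, and a write having no visible effect) can be checked mechanically on a saved telnet transcript. The sketch below is only an illustration; `parse_mdw` and `reads_look_stuck` are hypothetical helpers, and the regular expression assumes the single-word `mdw` output format shown in the transcript.

```python
import re

# Matches single-word mdw result lines such as "0xa0000000: 93faffc4"
MDW_LINE = re.compile(r"^(0x[0-9a-fA-F]+):\s+([0-9a-fA-F]{8})$")


def parse_mdw(output: str) -> dict:
    """Map address -> word for every single-word mdw result line in the text."""
    words = {}
    for line in output.splitlines():
        match = MDW_LINE.match(line.strip())
        if match:
            words[int(match.group(1), 16)] = int(match.group(2), 16)
    return words


def reads_look_stuck(words: dict) -> bool:
    """True when at least two distinct addresses all returned the same word."""
    return len(words) >= 2 and len(set(words.values())) == 1


session = """\
> mdw 0xa0000000
0xa0000000: 93faffc4
> mdw 0xa0000004
0xa0000004: 93faffc4
> mdw 0xF0010000
0xf0010000: 93faffc4
"""
print(reads_look_stuck(parse_mdw(session)))
```

Identical data from RAM and from a GPIO register suggests the value does not come from the bus at all, which fits the later finding that the basic CPU communication check is unreliable.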

Dolu1990 commented 2 years ago

Ahhh, "Can't communicate with CPU" is a sanity check; if it pops up, there is no chance for telnet / gdb to work. Hmmm, this really has to be fixed.

Basically, what the "Can't communicate with CPU" check does is: https://github.com/SpinalHDL/openocd_riscv/blob/riscv_spinal/src/target/vexriscv.c#L1780

So, one thing which can be done is to avoid testing the register file and only test the "lui x0, 0xABCDE".

This one does not require x0 to work; it uses no memories at all (unlike addi).

Could you give it a try, commenting the 3 other tests out?
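
For anyone comparing raw buffer contents against that test, the instruction word and the value LUI architecturally produces can be computed from the RISC-V U-type encoding. This is a sketch with illustrative function names; it does not reproduce what vexriscv.c actually compares against, which should be checked in the linked source.

```python
def encode_lui(rd: int, imm20: int) -> int:
    """Encode RISC-V 'lui rd, imm20' (U-type, opcode 0b0110111)."""
    if not 0 <= rd < 32 or not 0 <= imm20 < (1 << 20):
        raise ValueError("rd must fit in 5 bits and imm20 in 20 bits")
    return (imm20 << 12) | (rd << 7) | 0x37


def lui_result(imm20: int) -> int:
    """Value LUI computes: the immediate in bits 31:12, zeros in bits 11:0."""
    return (imm20 << 12) & 0xFFFFFFFF


print(hex(encode_lui(0, 0xABCDE)))  # instruction word for lui x0, 0xABCDE
print(hex(lui_result(0xABCDE)))     # result value of that instruction
```

A captured value that matches neither 0xABCDE000 nor anything related to it, such as the repeating 0x93faffc4 seen earlier, would indicate that the instruction result is not what is being read back.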

Dolu1990 commented 2 years ago

So, this is only for diagnostic purposes, not a fix.

dnltz commented 2 years ago

So, all four values in the buffer again have the same random value. I was able to track down that sometimes the device is examined and sometimes not. I added the instructions directly at the beginning of the function and the values are the same there as well. I need to read the HDL implementation to understand the interface better :)

dnltz commented 2 years ago

I have another question: The test instructions use x0. Is register zero writable?

Dolu1990 commented 2 years ago

Is register zero writable?

It isn't, but basically the debugger plugin will capture the last pipeline data value, even if it targets x0, which avoids side effects on any real register of the register file.