golioth / golioth-firmware-sdk

Firmware SDK enabling any IoT device to connect to Golioth - the Universal Connector for IoT
https://golioth.io
Apache License 2.0
57 stars 10 forks source link

JTAG difficulties #77

Open gojimmypi opened 1 year ago

gojimmypi commented 1 year ago

Golioth Firmware SDK Commit Hash

ec34837aad01b55ba6f961291bbd96d841956eee

Platform/OS

ESP IDF Release v5

WSL and Windows with VisualGDB;

Visual Studio 2019 with VisualGDB Custom Edition version 5.6R9 (build 4777).

Microsoft Visual Studio Enterprise 2019
Version 16.11.21
VisualStudio.16.Release/16.11.21+33027.164
Microsoft .NET Framework
Version 4.8.04084

Installed Version: Enterprise

Hardware Target

ESP32-WROOM32-32U on the ESP32_DevKitC_V4

What host OS are you using?

Windows 10

Current Behavior

I'm taking the golioth_basics demo for a test drive. The app itself works great as described in my tweet thread.

The challenge I've encountered is that once that firmware is loaded, future instances of JTAG debugging seem to be impossible. The only way I have found to resolve this problem is to flash a simple hello world on to the ESP32 and then load the Golioth firmware. As noted the JTAG debugger works the first time, but subsequent sessions refuse to connect. I'll see this message until it times out:

image image

Here's a picture of my setup with the Tigard:

image

I have a blog that describes the setup. In particular, I am using these pins:

image

I've seen this "impossible to connect JTAG" before: typically it is when those same GPIO pins are initialized and/or used for something other than JTAG. I'll be taking a look, but perhaps one of the active Golioth developers familiar with the code already knows the answer?

I'm fairly certain that the root cause is in the Golioth codebase, as I use this exact setup on a very regular regular basis and never see this particular problem with other source code.

If indeed this is a known "preinitialize the GPIO pins", perhaps there's an option to not do that during debugging?

Expected Behavior

JTAG debugging should connect and single step the code every time, regardless of what firmware was previously installed.

Steps To Reproduce

Simply press the "step into" button in Visual Studio:

image

Normally the JTAG programmer will flash the firmware and step into the code

Logs, console output, or any kind of debug information

ets Jun  8 2016 00:22:57

rst:0x1 (POWERON_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
flash read err, 1000
ets_main.c 371
ets Jun  8 2016 00:22:57

rst:0x10 (RTCWDT_RTC_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
configsip: 0, SPIWP:0xee
clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
mode:DIO, clock div:2
load:0x3fff0030,len:7000
load:0x40078000,len:15532
ho 0 tail 12 room 4
load:0x40080400,len:3840
entry 0x4008064c
I (29) boot: ESP-IDF v5.0-dirty 2nd stage bootloader
I (29) boot: compile time 11:12:18
I (29) boot: chip revision: v1.0
I (32) boot_comm: chip revision: 1, min. bootloader chip revision: 0
I (39) boot.esp32: SPI Speed      : 40MHz
I (44) boot.esp32: SPI Mode       : DIO
I (48) boot.esp32: SPI Flash Size : 4MB
I (53) boot: Enabling RNG early entropy source...
I (58) boot: Partition Table:
I (62) boot: ## Label            Usage          Type ST Offset   Length
I (69) boot:  0 nvs              WiFi data        01 02 00009000 00004000
I (77) boot:  1 otadata          OTA data         01 00 0000d000 00002000
I (84) boot:  2 phy_init         RF data          01 01 0000f000 00001000
I (91) boot:  3 ota_0            OTA app          00 10 00010000 00190000
I (99) boot:  4 ota_1            OTA app          00 11 001a0000 00190000
I (106) boot: End of partition table
I (111) boot_comm: chip revision: 1, min. application chip revision: 0
I (118) esp_image: segment 0: paddr=00010020 vaddr=3f400020 size=30c30h (199728) map
I (199) esp_image: segment 1: paddr=00040c58 vaddr=3ffbdb60 size=05a48h ( 23112) load
I (208) esp_image: segment 2: paddr=000466a8 vaddr=40080000 size=09970h ( 39280) load
I (224) esp_image: segment 3: paddr=00050020 vaddr=400d0020 size=d2968h (862568) map
I (536) esp_image: segment 4: paddr=00122990 vaddr=40089970 size=15f3ch ( 89916) load
I (573) esp_image: segment 5: paddr=001388d4 vaddr=50000000 size=00010h (    16) load
I (589) boot: Loaded app from partition at offset 0x10000
I (590) boot: Disabling RNG early entropy source...
I (601) cpu_start: Pro cpu up.
I (601) cpu_start: Starting app cpu, entry point is 0x400815dc
I (0) cpu_start: App cpu up.
I (618) cpu_start: Pro cpu start user code
I (618) cpu_start: cpu freq: 160000000 Hz
I (618) cpu_start: Application information:
I (622) cpu_start: Project name:     golioth_basics
I (628) cpu_start: App version:      v0.5.0-10-gec34837-dirty
I (634) cpu_start: Compile time:     Jan 28 2023 11:11:57
I (640) cpu_start: ELF file SHA256:  a4df1c2d2bdeea87...
I (646) cpu_start: ESP-IDF:          v5.0-dirty
I (652) heap_init: Initializing. RAM available for dynamic allocation:
I (659) heap_init: At 3FFAFF10 len 000000F0 (0 KiB): DRAM
I (665) heap_init: At 3FFB6388 len 00001C78 (7 KiB): DRAM
I (671) heap_init: At 3FFB9A20 len 00004108 (16 KiB): DRAM
I (677) heap_init: At 3FFCDAC0 len 00012540 (73 KiB): DRAM
I (683) heap_init: At 3FFE0440 len 00003AE0 (14 KiB): D/IRAM
I (690) heap_init: At 3FFE4350 len 0001BCB0 (111 KiB): D/IRAM
I (696) heap_init: At 4009F8AC len 00000754 (1 KiB): IRAM
I (704) spi_flash: detected chip: generic
I (707) spi_flash: flash io: dio
I (712) cpu_start: Starting scheduler on PRO CPU.
I (0) cpu_start: Starting scheduler on APP CPU.
I (779) BTDM_INIT: BT controller compile version [8020d24]
I (779) system_api: Base MAC address is not set
I (779) system_api: read default base MAC address from EFUSE
I (789) phy_init: phy_version 4670,719f9f6,Feb 18 2021,17:07:07
I (1119) golioth_ble: BLE Host Task Started
I (1169) NimBLE: GAP procedure initiated: advertise;
I (1169) NimBLE: disc_mode=2
I (1169) NimBLE:  adv_channel_map=0 own_addr_type=0 adv_filter_policy=0 adv_itvl_min=0 adv_itvl_max=0

Type 'help' to get the list of commands.
Use UP/DOWN arrows to navigate through command history.
Press TAB when typing command name to auto-complete.
I (1179) NimBLE:

esp32> I (1219) wifi_init: rx ba win: 6
I (1219) wifi_init: tcpip mbox: 32
I (1229) wifi_init: udp mbox: 6
I (1229) wifi_init: tcp mbox: 6
I (1229) wifi_init: tcp tx win: 5744
I (1239) wifi_init: tcp rx win: 5744
I (1239) wifi_init: tcp mss: 1440
I (1239) wifi_init: WiFi IRAM OP enabled
I (1249) wifi_init: WiFi RX IRAM OP enabled
W (1299) wifi:<ba-add>idx:0 (ifx:0, c8:d7:19:78:3e:b9), tid:0, ssn:0, winSize:64
I (3689) esp_netif_handlers: sta ip: 192.168.25.128, mask: 255.255.255.0, gw: 192.168.25.1
I (3689) example_wifi: WiFi Connected. Got IP:192.168.25.128
I (3699) example_wifi: Connected to AP SSID: OMIT
I (3699) golioth_mbox: Mbox created, bufsize: 2184, num_items: 20, item_size: 104
I (3709) golioth_basics: Waiting for connection to Golioth...
I (3769) golioth_coap_client: Start CoAP session with host: coaps://coap.golioth.io
I (3769) golioth_coap_client: Session PSK-ID: gojimmypi-demo@salmon-ruling-silkworm
I (3789) libcoap: Setting PSK key

I (3799) golioth_coap_client: Entering CoAP I/O loop
I (4289) golioth_coap_client: Golioth CoAP client connected
I (4289) golioth_basics: Golioth client connected
I (4299) golioth_basics: Hello, Golioth!
I (4299) golioth_fw_update: Current firmware version: main - 1.2.5
I (5599) golioth_fw_update: Waiting to receive OTA manifest
I (5699) golioth_basics: Synchronously got my_int = 42
I (5709) golioth_basics: Entering endless loop
I (5709) golioth_basics: Sending hello! 0
I (5809) golioth_fw_update: Received OTA manifest
I (5809) golioth_fw_update: Manifest does not contain different firmware version. Nothing to do.
I (5819) golioth_fw_update: Waiting to receive OTA manifest
I (6099) golioth_basics: Callback got my_int = 42
W (7369) golioth_coap_client: 4.00 (req type: 3, path: .c/status), len 59
esp32>
esp32>
esp32> I (15709) golioth_basics: Sending hello! 1
esp32>
esp32>
esp32>

edit: here's the GDB log when it fails to connect. During this time, the Golioth software on the ESP32 appears to be running properly and uninterrupted:

C:\Users\gojimmypi\AppData\Local\VisualGDB\EmbeddedDebugPackages\com.sysprogs.esp32.core\bin\openocd.exe -c "gdb_port 50299" -c "telnet_port 50297" -f interface/tigard.cfg -c "adapter_khz 13000" -f target/esp32.cfg -c "echo VisualGDB_OpenOCD_Ready"
Open On-Chip Debugger 0.10.0 (2022-05-03)
Licensed under GNU GPL v2
libusb1 09e75e98b4d9ea7909e8837b7a3f00dda4589dc3
For bug reports, read
    http://openocd.org/doc/doxygen/bugs.html
jtag
DEPRECATED! use 'adapter speed' not 'adapter_khz'
adapter speed: 13000 kHz

Warn : Transport "jtag" was already selected
VisualGDB_OpenOCD_Ready
Info : Listening on port 6666 for tcl connections
Info : Listening on port 50297 for telnet connections
Error: libusb_open() failed with LIBUSB_ERROR_NOT_SUPPORTED
Info : ftdi: if you experience problems at higher adapter clocks, try the command "ftdi tdo_sample_edge falling"
Info : clock speed 13000 kHz
Info : JTAG tap: esp32.cpu0 tap/device found: 0x120034e5 (mfg: 0x272 (Tensilica), part: 0x2003, ver: 0x1)
Info : JTAG tap: esp32.cpu1 tap/device found: 0x120034e5 (mfg: 0x272 (Tensilica), part: 0x2003, ver: 0x1)
Info : esp32.cpu0: Target halted, PC=0x40090CA6, debug_reason=00000000
Info : esp32.cpu1: Debug controller was reset.
Info : esp32.cpu1: Core was reset.
Info : esp32.cpu1: Target halted, PC=0x40090CA6, debug_reason=00000000
Info : starting gdb server for esp32.cpu0 on 50299
Info : Listening on port 50299 for gdb connections
Info : accepting 'gdb' connection on tcp/50299
Warn : No symbols for FreeRTOS!
Info : esp32.cpu0: Debug controller was reset.
Info : esp32.cpu0: Core was reset.
Error: timed out while waiting for target halted
Info : esp32.cpu0: Target halted, PC=0x40090CA6, debug_reason=00000000
Info : Set GDB target to 'esp32.cpu0'
Error: xtensa_wait_algorithm: not halted 0, pc 0x40090ca6, ps 0x60420
Error: Failed to wait algorithm (-302)!
Error: Algorithm run failed (-302)!
Error: Too many flash mappings 1931505486! Must be 2.
Warn : Failed to get flash mappings (-4)!
Error: Target is already running an algorithm
Error: Failed to start algorithm (-4)!
Error: Failed to get flash size!
Error: Target is already running an algorithm
Error: Failed to start algorithm (-4)!
Error: Failed to get flash size!
Error: Failed to probe flash, size 0 KB
Error: auto_probe failed
Error: Connect failed. Consider setting up a gdb-attach event for the target to prepare target for GDB connect, or use 'gdb_memory_map disable'.
Error: attempted 'gdb' connection rejected
Info : esp32.cpu1: Debug controller was reset.
Info : esp32.cpu1: Core was reset.
Info : esp32.cpu0: Debug controller was reset.
Info : esp32.cpu0: Core was reset.

Note the curious Too many flash mappings 1931505486! Must be 2. message. Cause or effect?

ncmiller commented 1 year ago

@gojimmypi - Thanks for the detailed information.

I've seen this "impossible to connect JTAG" before: typically it is when those same GPIO pins are initialized and/or used for something other than JTAG. I'll be taking a look, but perhaps one of the active Golioth developers familiar with the code already knows the answer?

I don't think there's anything in the Golioth code using those pins, but if you've had success with JTAG debugging in other projects (e.g. hello-world) then perhaps there is some kind of conflict. The only place that comes to mind is the console shell initialization code: https://github.com/golioth/golioth-firmware-sdk/blob/main/examples/esp_idf/common/shell.c#L406. It's one of the few (maybe the only?) place where IO is configured in the golioth_basics project.

As a test, can you see if your issue exists on the basic shell example in esp-idf? https://github.com/espressif/esp-idf/tree/master/examples/system/console/basic.

And a second test - disable the shell by commenting out this line, and see if the issue exists: https://github.com/golioth/golioth-firmware-sdk/blob/main/examples/esp_idf/golioth_basics/main/app_main.c#L34

gojimmypi commented 1 year ago

Hello @ncmiller and thank you for the prompt reply!

I don't think there's anything in the Golioth code using those pins

... and you are right. those code segments were not at all the root cause. Thanks for the tip & saving me the time of going and looking for (non existent) GPIO initialization.

But given your comment, I was wondering what else might be different? I started looking at your sdkconfig and compared to the known-good one that I use at wolfSSL.

Turns out, if I remove these lines from the sdkconfig.defaults, then JTAG works great:

# TODO - re-enable once resolved: https://github.com/espressif/esp-idf/issues/10322
# Until then, optimize for performance, otherwise IRAM will overflow (BLE stack uses a lot of IRAM).
#CONFIG_COMPILER_OPTIMIZATION_SIZE=y
CONFIG_COMPILER_OPTIMIZATION_PERF=y

# If the new app firmware does not cancel the rollback,
# then this will ensure that rollback happens automatically
# if/when the device is next rebooted.
CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE=y

# Bluetooth
CONFIG_GOLIOTH_BLE_SERVICE_ENABLED=y
CONFIG_BT_ENABLED=y
CONFIG_BTDM_CTRL_MODE_BLE_ONLY=y
CONFIG_BTDM_CTRL_MODE_BR_EDR_ONLY=n
CONFIG_BTDM_CTRL_MODE_BTDM=n
CONFIG_BT_BLUEDROID_ENABLED=n
CONFIG_BT_NIMBLE_ENABLED=y
CONFIG_BT_NIMBLE_MAX_CONNECTIONS=2
CONFIG_BTDM_CTRL_BLE_MAX_CONN=2
CONFIG_BT_NIMBLE_ROLE_PERIPHERAL=y
CONFIG_BT_NIMBLE_ROLE_CENTRAL=n
CONFIG_BT_NIMBLE_ROLE_BROADCASTER=n
CONFIG_BT_NIMBLE_ROLE_OBSERVER=n

Given the TODO comment, it looks like you folks have been having some challenges with BLE, and I saw similar issues.

The line of particular interest is this one:

CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE=y

I do recall seeing an odd bootloader prompt in the UART output when JTAG was not working. There's no mention of that setting being problematic in the docs, so I'm not sure if that's the only setting or perhaps the combination of disabling Bluetooth.

I wonder if there's RAM overhead for JTAG debugging?

gojimmypi commented 1 year ago

Here is further evidence that the CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE=y is the root cause of the JTAG programming difficulties:

image

I wonder if the JTAG reset to program the device causes this rollback operation?

ncmiller commented 1 year ago

I'm not sure if that's the only setting or perhaps the combination of disabling Bluetooth.

Have you tried disabling them in isolation, i.e. disabling Bluetooth but enabling CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE (and vice versa)?

Here is further evidence that the CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE=y is the root cause of the JTAG programming difficulties:

For those app rollback notes, I believe they're only relevant if there is a pending OTA update, meaning that esp_ota_set_boot_partition was called. From https://docs.espressif.com/projects/esp-idf/en/latest/esp32/api-reference/system/ota.html#rollback-process:

The new application is successfully downloaded and esp_ota_set_boot_partition() function makes this partition bootable and sets the state ESP_OTA_IMG_NEW

If the state is ESP_OTA_IMG_NEW, then I think it's necessary to call either esp_ota_mark_app_valid_cancel_rollback or esp_ota_mark_invalid_rollback_and_reboot to finalize the pending OTA update. But that's not the case for you, since there's no OTA update happening.

I'd be surprised if CONFIG_BOOTLOADER_APP_ROLLBACK_ENABLE=y were the root cause. I'd expect Bluetooth to be the more likely root cause.

ncmiller commented 1 year ago

The issue seems kind of similar to https://github.com/espressif/openocd-esp32/issues/223.

It might be worth checking whether your openocd version is up to date with https://github.com/espressif/openocd-esp32/releases/tag/v0.11.0-esp32-20220706.

Another idea would be to slow down the JTAG clock to see if it changes anything. You're running at 13 KHz, so maybe try 10KHz or 8KHz.