RIOT-OS / RIOT

RIOT - The friendly OS for IoT
https://riot-os.org
GNU Lesser General Public License v2.1
4.83k stars 1.97k forks source link

cpu/esp32: WDT reset in example/lorawan #11443

Closed yegorich closed 5 years ago

yegorich commented 5 years ago

I'm porting TTGO T-Beam board to RIOT (#11418). Executing examples/lorawan produces following output:

2019-04-25 10:26:56,695 - INFO # I (37) boot: ESP-IDF v3.1-dev-961-ga255622 2nd stage bootloader
2019-04-25 10:26:56,699 - INFO # I (43) boot: compile time 12:38:02
2019-04-25 10:26:56,705 - INFO # I (56) boot: Enabling RNG early entropy source...
2019-04-25 10:26:56,710 - INFO # I (56) boot: SPI Speed      : 40MHz
2019-04-25 10:26:56,715 - INFO # I (57) boot: SPI Mode       : DOUT
2019-04-25 10:26:56,719 - INFO # I (61) boot: SPI Flash Size : 4MB
2019-04-25 10:26:56,724 - INFO # I (65) boot: Partition Table:
2019-04-25 10:26:56,731 - INFO # I (69) boot: ## Label            Usage          Type ST Offset   Length
2019-04-25 10:26:56,739 - INFO # I (76) boot:  0 nvs              WiFi data        01 02 00009000 00006000
2019-04-25 10:26:56,747 - INFO # I (83) boot:  1 phy_init         RF data          01 01 0000f000 00001000
2019-04-25 10:26:56,755 - INFO # I (91) boot:  2 factory          factory app      00 00 00010000 00020a90
2019-04-25 10:26:56,760 - INFO # I (98) boot: End of partition table
2019-04-25 10:26:56,769 - INFO # I (103) esp_image: segment 0: paddr=0x00010020 vaddr=0x3f400020 size=0x03f78 ( 16248) map
2019-04-25 10:26:56,779 - INFO # I (118) esp_image: segment 1: paddr=0x00013fa0 vaddr=0x3ffb0000 size=0x0129c (  4764) load
2019-04-25 10:26:56,788 - INFO # I (122) esp_image: segment 2: paddr=0x00015244 vaddr=0x40080000 size=0x00400 (  1024) load
2019-04-25 10:26:56,797 - INFO # I (130) esp_image: segment 3: paddr=0x0001564c vaddr=0x40080400 size=0x07928 ( 31016) load
2019-04-25 10:26:56,806 - INFO # I (152) esp_image: segment 4: paddr=0x0001cf7c vaddr=0x00000000 size=0x03094 ( 12436)
2019-04-25 10:26:56,816 - INFO # I (156) esp_image: segment 5: paddr=0x00020018 vaddr=0x400d0018 size=0x10a54 ( 68180) map
2019-04-25 10:26:56,825 - INFO # I (187) boot: Loaded app from partition at offset 0x10000
2019-04-25 10:26:56,831 - INFO # I (187) boot: Disabling RNG early entropy source...
2019-04-25 10:26:56,832 - INFO #
2019-04-25 10:26:56,836 - INFO # Starting ESP32 with ID: c23c71bf9db6cc
2019-04-25 10:26:56,837 - INFO #
2019-04-25 10:26:56,844 - INFO # Current clocks in Hz: CPU=80000000 APB=80000000 XTAL=40000000 SLOW=150000
2019-04-25 10:26:56,850 - INFO # PRO cpu is up (single core mode, only PRO cpu is used)
2019-04-25 10:26:56,853 - INFO # PRO cpu starts user code
2019-04-25 10:26:56,861 - INFO # _lock_acquire_recursive parameter condition (sched_active_thread != 0) not fulfilled
2019-04-25 10:26:56,869 - INFO # _lock_release_recursive parameter condition (sched_active_thread != 0) not fulfilled
2019-04-25 10:26:56,877 - INFO # system_wdt_init TIMERG0 wdt_config0=b83f8000 wdt_config1=00500000 wdt_config2=001e8480
2019-04-25 10:26:56,888 - INFO # _lock_acquire_recursive parameter condition Used clocks in Hz: CPU=80000000 APB=80000000 XTAL=40000000 FAST=8000000 SLOW=150000
2019-04-25 10:26:56,892 - INFO # XTAL calibration value: 3544115
2019-04-25 10:26:56,897 - INFO # Heap free: 176128 bytes
2019-04-25 10:26:56,905 - INFO # _lock_acquire_recursive parameter condition (sched_active_thread != 0) not fulfilled
2019-04-25 10:26:56,913 - INFO # _lock_release_recursive parameter condition (sched_active_thread != 0) not fulfilled
2019-04-25 10:26:56,920 - INFO # _lock_acquire parameter condition (sched_active_thread != 0) not fulfilled
2019-04-25 10:26:56,927 - INFO # _lock_release parameter condition (sched_active_thread != 0) not fulfilled
2019-04-25 10:26:56,930 - INFO # System time: 1970-01-01 00:00:06
2019-04-25 10:26:56,931 - INFO #
2019-04-25 10:26:56,934 - INFO # Board configuration:
2019-04-25 10:26:56,939 - INFO #        ADC             pins=[ 36 39 32 33 34 35 4 0 2 13 25 ]
2019-04-25 10:26:56,941 - INFO #        DAC             pins=[ 25 ]
2019-04-25 10:26:56,945 - INFO #        PWM_DEV(0)      channels=[ 25 0 2 ]
2019-04-25 10:26:56,948 - INFO #        I2C_DEV(0)      scl=22 sda=21
2019-04-25 10:26:56,952 - INFO #        SPI_DEV(0)      VSPI sck=5 miso=19 mosi=27 cs=18
2019-04-25 10:26:56,955 - INFO #        UART_DEV(0)     txd=1 rxd=3
2019-04-25 10:26:56,958 - INFO #        UART_DEV(1)     txd=15 rxd=12
2019-04-25 10:26:56,961 - INFO #        LED             pins=[ 14 ]
2019-04-25 10:26:56,963 - INFO #        BUTTONS         pins=[ 39 ]
2019-04-25 10:26:56,964 - INFO #
2019-04-25 10:26:56,968 - INFO # Starting RIOT kernel on PRO cpu
2019-04-25 10:26:56,977 - INFO # I (6563) [main_trampoline]: main(): This is RIOT! (Version: 2019.07-devel-89-g9fed1-pr/esp32/esptool)
2019-04-25 10:26:56,981 - INFO # LoRaWAN Class A low-power application
2019-04-25 10:26:56,985 - INFO # =====================================
2019-04-25 10:26:56,988 - INFO # system_wdt_feed
2019-04-25 10:26:56,990 - INFO # system_wdt_feed
2019-04-25 10:26:56,994 - INFO # [semtech-loramac] initializing loramac
2019-04-25 10:27:02,959 - INFO # ets Jun  8 2016 00:22:57
2019-04-25 10:27:02,960 - INFO #
2019-04-25 10:27:02,966 - INFO # rst:0x7 (TG0WDT_SYS_RESET),boot:0x13 (SPI_FAST_FLASH_BOOT)
2019-04-25 10:27:02,969 - INFO # configsip: 0, SPIWP:0xee
2019-04-25 10:27:02,976 - INFO # clk_drv:0x00,q_drv:0x00,d_drv:0x00,cs0_drv:0x00,hd_drv:0x00,wp_drv:0x00
2019-04-25 10:27:02,979 - INFO # mode:DOUT, clock div:2
2019-04-25 10:27:02,981 - INFO # load:0x3fff0018,len:4
2019-04-25 10:27:02,984 - INFO # load:0x3fff001c,len:5632
2019-04-25 10:27:02,987 - INFO # ho 0 tail 12 room 4
2019-04-25 10:27:02,989 - INFO # load:0x40078000,len:0
2019-04-25 10:27:02,992 - INFO # load:0x40078000,len:14868
2019-04-25 10:27:02,995 - INFO # entry 0x40078628
2019-04-25 10:27:03,000 - INFO # W (30) boot: PRO CPU has been reset by WDT.
2019-04-25 10:27:03,006 - INFO # W (31) boot: WDT reset info: PRO CPU PC=0x40081ecb
2019-04-25 10:27:03,012 - INFO # W (31) boot: WDT reset info: APP CPU PC=0x7ee2cab7

tests/pkg_semtech-loramac is working without an issue and I can join a LoRaWAN network and send/receive packets.

@gschorcht suggested to enable syscall debug messages to see when system_wdt_feed is called the last time. The console output shows that it will be called before loramac init and 6 seconds later WDT resets the system.

2019-04-25 10:26:56,988 - INFO # system_wdt_feed
2019-04-25 10:26:56,990 - INFO # system_wdt_feed
2019-04-25 10:26:56,994 - INFO # [semtech-loramac] initializing loramac
2019-04-25 10:27:02,959 - INFO # ets Jun  8 2016 00:22:57

Steps to reproduce the issue

Flash examples/lorawan on a TTGO T-Beam board.

Expected results

LoRaWAN example starts and tries to join the network and sends packets.

Actual results

The system will be reset via WDT.

Versions

Riotdocker was used to build the application.

gschorcht commented 5 years ago

@yegorich IMHO, the issue title is somehow misleading. It suggests that it is a problem of threaded applications and threaded application aren't working at all. There are a large number of threaded applications like examples/gnrc_networking which use a lot of threads and are working without any problems. Could you find a subject that is more sepecific?

yegorich commented 5 years ago

I have such a feeling that applications providing a shell don't suffer from this issue. lorawan example just ends with a return after sending the first message:

/* trigger the first send */
msg_t msg;
msg_send(&msg, sender_pid);
return 0;

Is there any official way to end the main thread in an event loop?

gschorcht commented 5 years ago

lorawan example just ends with a return after sending the first message:

This shouldn't be the reason. According to your log, this return isn't reached at all. Otherwise you would have the Starting join procedure message in your log.

examples/lorawan as well as tests/pkg_semtech_lorawan initialize the SemTech SX127X in the main thread. My impression is that any of the semtech_lorawan_* functions before

    semtech_loramac_init(&loramac);
    semtech_loramac_set_deveui(&loramac, deveui);
    semtech_loramac_set_appeui(&loramac, appeui);
    semtech_loramac_set_appkey(&loramac, appkey);
    semtech_loramac_set_dr(&loramac, LORAMAC_DR_5);

does not return. Could you identify which function does not return by inserting printf debug messages? My guess is that semtech_loramac_init doesn't return?

As I understood, tests/pkg_semtech_lorawan is working and you are able to call all these functions step by step from shell. Right?

yegorich commented 5 years ago

Yes, tests/pkg_semtech_lorawan is working and I am able to call all these functions step by step from shell.

lorawan initialization never goes beyond LoRaMacInitialization in pkg/semtech-loramac/contrib/semtech_loramac.c.

How can I add printfs to LoRaMac-node? Every time I invoke make it seems to clone this repository from scratch and all my changes will be removed:

user@ubuntuat:~/Documents/versioned/RIOT$ make BOARD=esp32-ttgo-t-beam -C examples/lorawan/ flash term
make: Entering directory '/home/user/Documents/versioned/RIOT/examples/lorawan'
Building application "lorawan" for "esp32-ttgo-t-beam" with MCU "esp32".

if [ 3bfef71b1bea135ebd7f4cb8a2c7ccd741018441 != 1cdd9ccec4c9f05b616e7112059be4a9e358c571 ] ; then \
        git -C /home/user/Documents/versioned/RIOT/examples/lorawan/bin/pkg/esp32-ttgo-t-beam/semtech-loramac clean -xdff ; \
        git -C /home/user/Documents/versioned/RIOT/examples/lorawan/bin/pkg/esp32-ttgo-t-beam/semtech-loramac fetch "https://github.com/Lora-net/LoRaMac-node.git" "1cdd9ccec4c9f05b616e7112059be4a9e358c571" ; \
        git -C /home/user/Documents/versioned/RIOT/examples/lorawan/bin/pkg/esp32-ttgo-t-beam/semtech-loramac checkout -f 1cdd9ccec4c9f05b616e7112059be4a9e358c571 ; \
        touch /home/user/Documents/versioned/RIOT/examples/lorawan/bin/pkg/esp32-ttgo-t-beam/semtech-loramac/.git-downloaded ; \
fi
Removing .git-downloaded
Removing .git-patched
Removing Makefile
Removing src/boards/mcu/Makefile
Removing src/mac/Makefile
Removing src/mac/region/Makefile
Removing src/system/crypto/Makefile
From https://github.com/Lora-net/LoRaMac-node
 * branch              1cdd9ccec4c9f05b616e7112059be4a9e358c571 -> FETCH_HEAD
Warning: you are leaving 3 commits behind, not connected to
any of your branches:

  3bfef71b Add RU864 LoRaWAN region
  309250ed Applying: patch utilities functions
  4794848f adapt to RIOT
gschorcht commented 5 years ago

The only way to change something in the source code of a package is to generate a patch file in pkg/semtech_lorwan/patches, see section Packages

yegorich commented 5 years ago

OK. I see. Btw adding a shell to lorawan example automagically solves my problem:

/* trigger the first send */
msg_t msg;
msg_send(&msg, sender_pid);
char line_buff[SHELL_DEFAULT_BUFSIZE];
shell_run(NULL, line_buff, SHELL_DEFAULT_BUFSIZE);
return 0;
gschorcht commented 5 years ago

Btw adding a shell to lorawan example automagically solves my problem

Confused :confused:

BTW, the threads that are running are exactly the same for both applications when LoRaMacInitialization is called.

tests/pkg_semtech_loramac:

    pid | name                 | state    Q | pri | stack  ( used) | base addr  | current     
      - | isr_stack            | -        - |   - |   2048 (    0) | 0x3ffb0220 | 0x3ffb0a20
      1 | idle                 | pending  Q |  31 |   2048 (  300) | 0x3ffb190c | 0x3ffb1fe0 
      2 | main                 | running  Q |  15 |   3072 ( 1228) | 0x3ffb210c | 0x3ffb2a40 
      3 | recv_thread          | bl rx    _ |  14 |   2048 (  492) | 0x3ffb2fec | 0x3ffb3600 
        | SUM                  |            |     |   9216 ( 2020)
examples/lorawan:

    pid | name                 | state    Q | pri | stack  ( used) | base addr  | current     
      - | isr_stack            | -        - |   - |   2048 (    0) | 0x3ffb0220 | 0x3ffb0a20
      1 | idle                 | pending  Q |  31 |   2048 (  292) | 0x3ffb1804 | 0x3ffb1ee0 
      2 | main                 | running  Q |  15 |   3072 ( 1124) | 0x3ffb2004 | 0x3ffb29a0 
      3 | recv_thread          | bl rx    _ |  14 |   2048 (  484) | 0x3ffb34e4 | 0x3ffb3b00 
        | SUM                  |            |     |   9216 ( 1900)
gschorcht commented 5 years ago

Btw adding a shell to lorawan example automagically solves my problem:

I thought that it might have to do with some module dependencies when module shell is added, but the only differences I see is the module itself :worried: So the question is, what makes the difference?

yegorich commented 5 years ago

This is something I would also like to know. This happens either only to me or T-Beam. @fcgdam has ported TTGO Lora V1 to RIOT (https://github.com/fcgdam/RIOT_TTGOESP32_LORA_V1) and doesn't seem to encounter this issue.

yegorich commented 5 years ago

To make the things even more confusing, just disabling DEVELHELP makes loarwan example run without issues:

make DEVELHELP=0 BOARD=esp32-ttgo-t-beam -C examples/lorawan/ flash term

fcgdam commented 5 years ago

Hi: I had issues, but not board resets. My issue was that the semtech_loramac_send blocked and never returned. I've checked and I could see that the TX DONE event was triggered but was never propagated to the calling thread. I've wasted a few days with it, but since before starting work I would do a git pull on the full RIOT repository, out of the blue one day started to work.

Edit: Remember it: The board definition for the SX1276 pins did change.

Can you try this definition: `//#define SX127X_PARAM_SPI (SPI_DEV(0)) // Yes, it is commented out

define SX127X_PARAM_SPI_NSS GPIO_PIN(0,18)

define SX127X_PARAM_SPI_RESET GPIO_PIN(0,23)

define SX127X_PARAM_DIO0 GPIO_PIN(0,26)

define SX127X_PARAM_DIO1 GPIO_UNDEF

define SX127X_PARAM_DIO2 GPIO_UNDEF

define SX127X_PARAM_DIO3 GPIO_UNDEF

define SX127X_PARAM_PASELECT (SX127X_PA_RFO)

define SX127X_PARAMS { /.spi = SX127X_PARAM_SPI,/ \

                      .nss_pin   = SX127X_PARAM_SPI_NSS, \
                      .reset_pin = SX127X_PARAM_RESET,   \
                      .dio0_pin  = SX127X_PARAM_DIO0,    \
                      .dio1_pin  = SX127X_PARAM_DIO1,    \
                      .dio2_pin  = SX127X_PARAM_DIO2,    \
                      .dio3_pin  = SX127X_PARAM_DIO3,    \
                      .paselect  = SX127X_PARAM_PASELECT \

}`

fcgdam commented 5 years ago

Hi: Can confirm. With my board definition and the latest pull from RIOT, the board also boot loops.

yegorich commented 5 years ago

Thanks for the test. Good to know, I am not the only one. Have you also tried to compile with DEVELHELP=0?

I have the same issue with an application reading GPS and trying to parse it. The basic structure is the same as tests/periph_uart but without a shell. As soon as I get \n, the buffer will be parsed in a printer thread using minmea. With DEVELHELP=1 the program provokes resets, with DEVELHELP=0 everything is working as expected.

So it is not LoRa related.

gschorcht commented 5 years ago

@fcgdam

With my board definition and the latest pull from RIOT, the board also boot loops.

Does this mean that it worked before but not anymore? Could you try it again with an older commit to be sure that it is not related to PR #10883?

git checkout 33a4b307a10b9cb456f85bf36ca440b5cf0ff44b
fcgdam commented 5 years ago

Hi! I was testing that right now! What a coincidence :) I was backtracking the commits to see what is the one that breaks things: So: 9aa8c619c1b984804b7fb8e27395b0965d1581d4 -> Works OK. (I was testing this random commit) 33a4b307a10b9cb456f85bf36ca440b5cf0ff44b -> Works OK.

fcgdam commented 5 years ago

With this commit: 43dedfcd2c52c15fb083e0d984c23c6f6e550cb6 it fails.

So the issue is somewhere behind this commit April the 18th.

Edit: Commit from 15 of April: 9aa8c619c1b984804b7fb8e27395b0965d1581d4 -> Works

fcgdam commented 5 years ago

Ok, so: The commit from 15th of April it works: 9aa8c619c1b984804b7fb8e27395b0965d1581d4 -> OK

The next commits that are even not related to the ESP fail... bbbb62b5718ce66bffdf384b3a0a3854341daab1 -> fails! Also 75d9e7892cd825ba1c50e20a68f96eda67b01b6b -> Fails! The next commit: 8c708110e53bbd5d292f9760df07e588f49866da -> Fails

And from this point everything fails until the HEAD.

Strange indeed!

gschorcht commented 5 years ago

Commit 9aa8c619c1b984804b7fb8e27395b0965d1581d4 is derived from a master before PR #10883 was merged. You would have to compare commit 33a4b307a10b9cb456f85bf36ca440b5cf0ff44b (the commit on which PR #10883 bases) with commit 35c617e08a25238d3c7affe3fe535c7ca2a7658d (the first commit after the merge of PR #10883).

If 33a4b307a10b9cb456f85bf36ca440b5cf0ff44b works but 35c617e08a25238d3c7affe3fe535c7ca2a7658d not, the problem might be (or better is :sunglasses:) caused by the refactoring of Xtensa vendor code with PR #10883 what would seem to be reasonable. Before PR #10883, esp8266 and esp32 were using their own implementations. Now, the implementations are merged to esp_common.

fcgdam commented 5 years ago

Hi:

Yes: Commit 33a4b307a10b9cb456f85bf36ca440b5cf0ff44b works, but 35c617e08a25238d3c7affe3fe535c7ca2a7658d does not.

gschorcht commented 5 years ago

Thank you a lot for testing :smile: Even though these are really bad news, they make sense.

I will have to compare line by line what the differences are. Unfortunatly, I made these changes almmost a half year ago and I don't remember exactly how big the differences in case of esp32 are. As I remember, I had to change the esp8266 code much more than the esp32 code.

@yegorich Unfortunately, I'm quite busy with my business and don't know when I will have enough time to check the code. Would it be a big problem if it takes 2 or 3 weeks. I will do it as soon as I can.

gschorcht commented 5 years ago

I could figure out the cause of the problem. With commit 28d9599d52cf07d7282da1931cb240c027fea1f6, I introduced the following "fix", at least I thought it would be a fix.

--- a/cpu/esp32/thread_arch.c
+++ b/cpu/esp32/thread_arch.c
@@ -186,1 +186,1 @@ char* thread_stack_init(thread_task_func_t task_func, void *arg, void *stack_sta
-    p = (uint32_t *)(((uint32_t) top_of_stack+1 - XT_CP_SIZE));
+    p = (uint32_t *)(((uint32_t)(top_of_stack + 1) - XT_CP_SIZE) & ~0xf);

But in fact, it was a bug :dark_sunglasses: Since top_of_stack isn't aligned to 16 byte address and top_of_stack as well ass XT_CP_SIZE are used unaligned in cpu/esp_common/vendor/xtensa/portasm.S, this alignment down to the next 16 byte address stores a wrong pointer to the coprocessor save area in the initial thread context :grimacing:

Reverting the change seems to work. Even though I have no working LoRaWAN gateway, I could see the following outputs with my Heltec WiFi LoRa 32 V2.

Starting RIOT kernel on PRO cpu
I (247) [main_trampoline]: main(): This is RIOT! (Version: 2019.07-devel-141-g203c-boards/esp32-heltec-lora32)
LoRaWAN Class A low-power application
=====================================
[semtech-loramac] initializing loramac
Starting join procedure
[semtech-loramac] Starting join procedure: 0
[semtech-loramac] loramac cmd msg
[semtech-loramac] starting OTAA join
[semtech-loramac] Transmission completed
[semtech-loramac] MAC timer timeout
[semtech-loramac] MAC timer timeout
[semtech-loramac] MAC timer timeout
[semtech-loramac] MAC timer timeout
[semtech-loramac] MAC timer timeout
[semtech-loramac] MAC timer timeout
[semtech-loramac] MAC timer timeout
[semtech-loramac] MAC timer timeout
[semtech-loramac] MAC timer timeout
[semtech-loramac] MAC timer timeout
[semtech-loramac] RX timer timeout
[semtech-loramac] MAC timer timeout
[semtech-loramac] MLME confirm event
[semtech-loramac] join not successful
[semtech-loramac] loramac join notification msg
Join procedure failed
gschorcht commented 5 years ago

I will provide a PR which fixes the problem. BTW, esp8266 has same problem but since it doesn't have a coprocessor it doesn't matter there.

fcgdam commented 5 years ago

Hi thanks:.

No issue whatsoever waiting for a fix for me. I'm also quite busy with other things :)

Not sure about @yegorich.

yegorich commented 5 years ago

@gschorcht great news! That was fast. I can test your fix next week as soon as I'm in my office. Btw. I'm using N-Fuse card for LoRaWAN tests.

@fcgdam have you thought about upstreaming your board definition?

fcgdam commented 5 years ago

Hi!

Did the change manually (removed the & ~0xf) and it works now! So the issue seems to be this. At home I only have a single channel gateway, so I use that to do some basic tests, it seems to work now. For full tests only at my office (I also have a N-Fuse based gateway).

@yegorich Regarding up-streaming my board definition not quite sure if its at a good RIOT-OS standard (Missing Arduino pins, and so on), so that was the main reason why I didn't create a PR, and so let users find it out the board definition by themselves :)

yegorich commented 5 years ago

@fcgdam I think Adruino pins are optional. I'm going to drop them. If someone really needs this functionality, it can be added later.

Btw. you don't have to specify #define SX127X_PARAMS as they are defined in drivers/sx127x/include/sx127x_params.h.