RIOT-OS / RIOT

RIOT - The friendly OS for IoT
https://riot-os.org
GNU Lesser General Public License v2.1

STM32F4-discovery + mrf24j40 not working #19711

Open gustavowd opened 1 year ago

gustavowd commented 1 year ago

Description

The gnrc_networking example application does not work on the stm32f4-discovery board with the mrf24j40 driver.

Steps to reproduce the issue

Compile the gnrc_networking example and flash it to the board.

Expected results

The gnrc_networking application runs and creates a new RPL DODAG.

Actual results

The gnrc_networking application runs, but there is an error in the RPL network.

Versions

Operating system: Ubuntu 23.04
Build environment:

Operating System Environment

     Operating System: "Ubuntu" "23.04 (Lunar Lobster)"
               Kernel: Linux 6.2.0-20-generic x86_64 x86_64
         System shell: /usr/bin/dash (probably dash)
         make's shell: /usr/bin/dash (probably dash)

Installed compiler toolchains

           native gcc: gcc (Ubuntu 12.2.0-17ubuntu1) 12.2.0
    arm-none-eabi-gcc: arm-none-eabi-gcc (15:12.2.rel1-1) 12.2.1 20221205
              avr-gcc: missing
       msp430-elf-gcc: missing
   riscv-none-elf-gcc: missing
riscv64-unknown-elf-gcc: missing
 riscv-none-embed-gcc: missing
  riscv32-esp-elf-gcc: missing
 xtensa-esp32-elf-gcc: missing
xtensa-esp32s2-elf-gcc: missing
xtensa-esp32s3-elf-gcc: missing
xtensa-esp8266-elf-gcc: missing
                clang: missing

Installed compiler libs

 arm-none-eabi-newlib: "3.3.0"
    msp430-elf-newlib: missing
riscv-none-elf-newlib: missing
riscv64-unknown-elf-newlib: missing
riscv-none-embed-newlib: missing
riscv32-esp-elf-newlib: missing
xtensa-esp32-elf-newlib: missing
xtensa-esp32s2-elf-newlib: missing
xtensa-esp32s3-elf-newlib: missing
xtensa-esp8266-elf-newlib: missing
             avr-libc: missing (missing)

Installed development tools

               ccache: ccache version 4.7.4
                cmake: cmake version 3.25.1
             cppcheck: missing
              doxygen: missing
                  git: git version 2.39.2
                 make: GNU Make 4.3
              openocd: Open On-Chip Debugger 0.12.0
               python: Python 3.11.2
              python2: missing
              python3: Python 3.11.2
               flake8: error: /usr/bin/python3: No module named flake8
           coccinelle: missing

gustavowd commented 1 year ago

[Images: mrf24j401, mrf24j402, mrf24j403]

gustavowd commented 1 year ago

The mrf24j40 driver seems to work normally, but I can't send any packets with it.

maribu commented 1 year ago

Thanks for the bug report.

Are you able to ping (or even txtsnd) other nodes? I wonder whether this is indeed an issue with the network device driver or rather an issue independent of the driver.

gustavowd commented 1 year ago

Thanks for the bug report.

Are you able to ping (or even txtsnd) other nodes? I wonder whether this is indeed an issue with the network device driver or rather an issue independent of the driver.

I finally got it. I tied the WAKE pin to VCC and everything started to work. Next week I will test more nodes to make sure it's working.

[Image: mrf24j40]

maribu commented 1 year ago

Thanks again for the bug report. Documentation was added so that other users don't run into the same issue. Hopefully someone will find time to also actually extend the driver to handle the wake pin (if specified != GPIO_UNDEF), but at least this footgun is now easy to spot in the documentation.

If you run into other issues, please report them :) We are happy to help.
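As an illustration of the suggestion above, handling an optional wake pin could look roughly like the host-side sketch below. Everything in it is an assumption made for illustration: the `wake_pin` field, the `params_model_t` and `pin_state_t` types, and `GPIO_MODEL_UNDEF` are stand-ins, not the driver's real parameters or RIOT's GPIO API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Host-side stand-ins; in RIOT the pin type and the "undefined" marker
 * would come from periph/gpio.h (gpio_t, GPIO_UNDEF). */
typedef unsigned gpio_model_t;
#define GPIO_MODEL_UNDEF ((gpio_model_t)-1)

/* Hypothetical parameter struct; the field name wake_pin is an assumption. */
typedef struct {
    gpio_model_t wake_pin;
} params_model_t;

/* Records what the sketch did to the simulated pin. */
typedef struct {
    bool driven;
    bool level_high;
} pin_state_t;

/* Drive WAKE high only if the board wires it to a GPIO.
 * Returns true if the pin was touched. */
static bool wake_pin_init(const params_model_t *params, pin_state_t *pin)
{
    assert(params != NULL && pin != NULL);
    if (params->wake_pin == GPIO_MODEL_UNDEF) {
        return false;          /* WAKE is handled in hardware; nothing to do */
    }
    pin->driven = true;
    pin->level_high = true;    /* same level as tying WAKE to VCC */
    return true;
}
```

With `wake_pin` left at `GPIO_MODEL_UNDEF` the function does nothing, which matches the condition mentioned in the comment above.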

Carton32 commented 4 months ago

Actually, I can still reproduce this bug. I moved from 2022.07 (no issue) to 2023.04 (issue appeared). I've also tested with gnrc_networking on master and the bug is still there. I've always configured the GPIO that drives the WAKE pin on my board, although the pin is disabled by default in the MRF24J40 (see p. 28 of the datasheet) and I see nothing in the driver that enables it.

In my case, the flood of "message 4659" lines that @gustavowd is reporting occurs on every second boot. I can boot once and my board can send and receive pings; after a reboot command, the board floods these messages and can't communicate anymore. The pattern is fixed (one failure every two reboots), even after flashing the MCU again.

maribu commented 4 months ago

Sadly, I do not have an mrf24j40 available to reproduce this. Would you mind running git bisect to identify which commit introduced the issue?

Carton32 commented 4 months ago

$ git bisect good
91a299cb7de2d0646fee6fc52bd4f1a0c06308c0 is the first bad commit
commit 91a299cb7de2d0646fee6fc52bd4f1a0c06308c0
Author: Jose Alamos <jose@alamos.cc>
Date:   Tue Aug 16 17:28:05 2022 +0200

    drivers/mrf24j40: add support for IEEE 802.15.4 Radio HAL

This is the first commit with the kind of errors shown below and reported by @gustavowd:

2024-02-29 14:44:41,521 # gnrc_netif: message 4659
2024-02-29 14:44:41,524 # gnrc_netif: send from packet send queue
2024-02-29 14:44:41,529 # gnrc_netif: error sending packet 0x20001908 (code: -16)
2024-02-29 14:44:41,532 # gnrc_netif: (re-)queued pkt 0x20001908
2024-02-29 14:44:41,535 # gnrc_netif: handling events
2024-02-29 14:44:41,538 # gnrc_netif: waiting for events
2024-02-29 14:44:41,540 # gnrc_netif: handling events
2024-02-29 14:44:41,542 # gnrc_netif: message 4659
2024-02-29 14:44:41,546 # gnrc_netif: send from packet send queue
2024-02-29 14:44:41,551 # gnrc_netif: error sending packet 0x20001908 (code: -16)
2024-02-29 14:44:41,554 # gnrc_netif: (re-)queued pkt 0x20001908
2024-02-29 14:44:41,557 # gnrc_netif: handling events
2024-02-29 14:44:41,559 # gnrc_netif: waiting for events
2024-02-29 14:44:41,562 # gnrc_netif: handling events
2024-02-29 14:44:41,564 # gnrc_netif: message 4659
2024-02-29 14:44:41,568 # gnrc_netif: send from packet send queue

During my git bisect procedure, I encountered other issues (like the shell not working), but they are not related to this issue.

EDIT: After more extensive testing, I can confirm that dacc4ff1248ee36d6a373ffcb140cd9787f7c56e is OK and 91a299cb7de2d0646fee6fc52bd4f1a0c06308c0 is the first commit to flood the messages above. The other issues I mentioned at the end of my message (like the shell not working) were caused by my SPI clock frequency being too low (https://github.com/RIOT-OS/RIOT/issues/7828).
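For reference, the `code: -16` in the log above is a negated errno value. On newlib and on Linux, 16 is `EBUSY`, which matches the `-EBUSY` return paths discussed later in this thread. A minimal host-side sketch for decoding such codes follows; the helper name `errno_name` is made up for illustration and only covers a few values.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical helper: map a negative return code, as printed by
 * gnrc_netif ("code: -16"), to a symbolic errno name. Only a few values
 * are covered; anything else is reported as "unknown". */
static const char *errno_name(int code)
{
    assert(code <= 0);   /* the codes in the log are negated errno values */
    switch (-code) {
    case 0:      return "OK";
    case EBUSY:  return "EBUSY";
    case ENODEV: return "ENODEV";
    case EINVAL: return "EINVAL";
    default:     return "unknown";
    }
}
```

Calling `errno_name(-16)` on such a platform yields "EBUSY", so the repeated send errors mean the layer below refused the frame because it considered itself busy.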

Carton32 commented 4 months ago

2024-02-29 16:18:52,498 # reboot
2024-02-29 16:18:52,544 # NETOPT_TX_END_IRQ not implemented by driver
2024-02-29 16:18:52,552 # main(): This is RIOT! (Version: 2023.04-devel-93-g91a299-HEAD)
2024-02-29 16:18:52,555 # RIOT network stack example application
2024-02-29 16:18:52,558 # All up, running the shell now

2024-02-29 16:18:53,512 # ping ff02::01 -c 1
2024-02-29 16:18:54,512 # 
2024-02-29 16:18:54,515 # --- ff02::01 PING statistics ---
2024-02-29 16:18:54,521 # 1 packets transmitted, 0 packets received, 100% packet loss

2024-02-29 16:18:57,509 # reboot
2024-02-29 16:18:57,555 # NETOPT_TX_END_IRQ not implemented by driver
2024-02-29 16:18:57,564 # main(): This is RIOT! (Version: 2023.04-devel-93-g91a299-HEAD)
2024-02-29 16:18:57,568 # RIOT network stack example application
2024-02-29 16:18:57,570 # All up, running the shell now

2024-02-29 16:18:58,725 # ping ff02::01 -c 1
2024-02-29 16:18:58,753 # 12 bytes from fe80::66c3:56a5:3ab1:a790%6: icmp_seq=0 ttl=64 rssi=45 dBm time=20.310 ms
2024-02-29 16:18:58,753 # 
2024-02-29 16:18:58,756 # --- ff02::01 PING statistics ---
2024-02-29 16:18:58,761 # 1 packets transmitted, 1 packets received, 0% packet loss
2024-02-29 16:18:58,765 # round-trip min/avg/max = 20.310/20.310/20.310 ms

This is an example of the results I get (and this cycle repeats indefinitely). If I had set ENABLE_DEBUG to 1 in gnrc_netif.c, I would have had a flood of "message 4659" errors in the console output whenever the module is no longer able to send or receive messages (after the first reboot shown above).

jia200x commented 4 months ago

@Carton32 could you enable debug on sys/net/gnrc/netif/ieee802154/gnrc_netif_ieee802154.c?
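For readers unfamiliar with the convention: RIOT source files typically gate their `DEBUG()` output on a per-file `ENABLE_DEBUG` macro. The following is a stand-alone imitation of that pattern, not RIOT's actual `debug.h`; the macro body and the `handle_message` function are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Per-file switch: set to 1 to enable DEBUG() output in this file. */
#define ENABLE_DEBUG 1

/* Simplified stand-in for a DEBUG() macro: prints only when the
 * per-file switch is set, and compiles to nothing useful otherwise. */
#define DEBUG(...) do { if (ENABLE_DEBUG) { printf(__VA_ARGS__); } } while (0)

/* Illustrative function: emits a debug line and returns 1 if debug
 * output is compiled in, 0 otherwise. */
static int handle_message(unsigned type)
{
    DEBUG("gnrc_netif: message %u\n", type);
    return ENABLE_DEBUG;
}
```

Because the switch is per file, enabling it in one source file only adds the messages from that file, which keeps the console output manageable.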

jia200x commented 4 months ago

And could you also apply this patch to 91a299cb7de2d0646fee6fc52bd4f1a0c06308c0?

diff --git a/sys/net/link_layer/ieee802154/submac.c b/sys/net/link_layer/ieee802154/submac.c
index 471bc0c32d..548f53e61f 100644
--- a/sys/net/link_layer/ieee802154/submac.c
+++ b/sys/net/link_layer/ieee802154/submac.c
@@ -408,6 +408,7 @@ int ieee802154_send(ieee802154_submac_t *submac, const iolist_t *iolist)
     ieee802154_fsm_state_t current_state = submac->fsm_state;

     if (current_state != IEEE802154_FSM_STATE_RX && current_state != IEEE802154_FSM_STATE_IDLE) {
+        puts("1");
         return -EBUSY;
     }

@@ -426,6 +427,7 @@ int ieee802154_send(ieee802154_submac_t *submac, const iolist_t *iolist)

     if (ieee802154_submac_process_ev(submac, IEEE802154_FSM_EV_REQUEST_TX)
         != IEEE802154_FSM_STATE_PREPARE) {
+        puts("2");
         return -EBUSY;
     }
     return 0;

and post back the results?
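To help interpret the output of this patch: the two `puts()` calls mark the two places where `ieee802154_send()` returns `-EBUSY`. The host-side model below reduces the function to those two guards. The enum, the `radio_accepts_tx` flag and the function name are simplified stand-ins, not the SubMAC's real types or logic.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the SubMAC FSM states. */
typedef enum {
    FSM_STATE_RX,
    FSM_STATE_IDLE,
    FSM_STATE_PREPARE,
    FSM_STATE_TX
} fsm_state_t;

/* Returns 0 if a send request would be accepted, 1 if the first guard
 * rejects it (FSM neither in RX nor IDLE), 2 if the second guard rejects
 * it (the TX request did not move the FSM to PREPARE). */
static int which_guard(fsm_state_t current, bool radio_accepts_tx)
{
    assert(current <= FSM_STATE_TX);
    if (current != FSM_STATE_RX && current != FSM_STATE_IDLE) {
        return 1;   /* the patch prints "1" here */
    }
    /* Model of processing the TX request event. */
    fsm_state_t next = radio_accepts_tx ? FSM_STATE_PREPARE : current;
    if (next != FSM_STATE_PREPARE) {
        return 2;   /* the patch prints "2" here */
    }
    return 0;
}
```

In this model, a continuous stream of "1" means the FSM never returned to RX or IDLE after an earlier request, while "2" would point at the radio refusing the new request.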

Carton32 commented 4 months ago

This is what I get on the revision 2 of my custom board:

2024-03-04 07:16:19,098 # Connect to serial port /dev/ttyUSB2
Welcome to pyterm!
Type '/exit' to exit.
2024-03-04 07:16:21,298 # _recv_ieee802154: received packet from 64:C3:56:A5:3A:B1:A7:90 of length 43
2024-03-04 07:16:21,305 # 00000000  41  C8  36  09  00  FF  FF  90  A7  B1  3A  A5  56  C3  64  7B
2024-03-04 07:16:21,311 # 00000010  3B  3A  02  85  00  3E  D0  00  00  00  00  01  02  64  C3  56
2024-03-04 07:16:21,316 # 00000020  A5  3A  B1  A7  90  00  00  00  00  00  00
2024-03-04 07:16:21,321 # _recv_ieee802154: reallocating MAC payload for upper layer.
reboot
2024-03-04 07:16:25,248 # reboot
2024-03-04 07:16:25,294 # NETOPT_TX_END_IRQ not implemented by driver
2024-03-04 07:16:25,296 # 1
2024-03-04 07:16:25,302 # main(): This is RIOT! (Version: 2023.04-devel-93-g91a299-H1
2024-03-04 07:16:25,302 # EAD)
2024-03-04 07:16:25,306 # RIOT network stack example application
2024-03-04 07:16:25,307 # All up, runnin1
2024-03-04 07:16:25,308 # g the shell now
> 2024-03-04 07:16:25,312 # 1
2024-03-04 07:16:25,317 # 1
2024-03-04 07:16:25,323 # 1
2024-03-04 07:16:25,328 # 1
2024-03-04 07:16:25,333 # 1
2024-03-04 07:16:25,338 # 1
2024-03-04 07:16:25,344 # 1
2024-03-04 07:16:25,349 # 1
2024-03-04 07:16:25,354 # 1
2024-03-04 07:16:25,359 # 1
2024-03-04 07:16:25,364 # 1
2024-03-04 07:16:25,370 # 1
2024-03-04 07:16:25,375 # 1
2024-03-04 07:16:25,380 # 1
2024-03-04 07:16:25,385 # 1
2024-03-04 07:16:25,391 # 1

The repetition of "1" goes on indefinitely, but the shell still responds.

I've done the same test on the revision 1 of this custom board and the bug does not occur.

The differences in passive components around the MRF24J40MD are:

Rev1:

Rev2:

I've tried on Rev2:

It's definitely hardware related, but not only that, as the commits before 91a299cb7de2d0646fee6fc52bd4f1a0c06308c0 never triggered the infinite busy loop on either hardware revision.

jia200x commented 4 months ago

hi @Carton32,

From what I can see, this is likely a hardware issue that could be triggered by a missed IRQ. It is a bit hard for me to trace back the exact cause because there is a new implementation of the SubMAC nowadays.

Please mind that commit 91a299cb7de2d0646fee6fc52bd4f1a0c06308c0 introduces a new SubMAC layer for that driver, which expects completely different behavior from the device. So it is not possible to compare 91a299cb7de2d0646fee6fc52bd4f1a0c06308c0 with the previous commits, because that is essentially comparing two different things. The Radio HAL + SubMAC implementations are stricter with validation, which exposes bugs that wouldn't have been detected with the old interface.

Therefore, if Rev2 works as expected with the right configuration, I would just assume it is a hardware thing that got caught by the SubMAC.

Carton32 commented 4 months ago

Hi @jia200x,

Yes, I'm aware of these big changes, and it was no surprise to me that my git bisect procedure led me to this precise commit. What I meant is that I have 4-5000 Rev2 boards running RIOT OS 2022.07 (so prior to the HAL + SubMAC introduction), and they have been working fine for almost 2 years for the most recent ones.

I think we can close the issue. I will try to find a workaround on my side.

Thanks for your help!

jia200x commented 4 months ago

Yes, I'm aware of these big changes, and it was no surprise to me that my git bisect procedure led me to this precise commit. What I meant is that I have 4-5000 Rev2 boards running RIOT OS 2022.07 (so prior to the HAL + SubMAC introduction), and they have been working fine for almost 2 years for the most recent ones.

In the past we have had cases of working implementations that had quirks when doing very specific things. These usually went undetected for a long time, unless you had the bad luck of triggering one. So keep in mind that some of the issues could still be there, although they are not triggered that easily.

Could you try reproducing the same bug with the newest SubMAC implementation in master? That should give more information on whether there's an invalid state transition or a missing interrupt. I prepared this patch on top of 8b832804e8fa414f541bfcf438bd3f3cd811d127 that could give some insights:

diff --git a/sys/net/link_layer/ieee802154/submac.c b/sys/net/link_layer/ieee802154/submac.c
index 471bc0c32d..6749ae9feb 100644
--- a/sys/net/link_layer/ieee802154/submac.c
+++ b/sys/net/link_layer/ieee802154/submac.c
@@ -81,6 +81,7 @@ static ieee802154_fsm_state_t _tx_end(ieee802154_submac_t *submac, int status,

     assert(res >= 0);
     submac->cb->tx_done(submac, status, info);
+    puts("D");
     return IEEE802154_FSM_STATE_IDLE;
 }

@@ -120,6 +121,7 @@ static int _handle_fsm_ev_request_tx(ieee802154_submac_t *submac)
     int res = ieee802154_radio_set_idle(dev, false);

     if (res < 0) {
+        puts("1");
         return res;
     }
     else {
@@ -149,6 +151,7 @@ static ieee802154_fsm_state_t _fsm_state_rx(ieee802154_submac_t *submac, ieee802
         while (ieee802154_radio_set_idle(&submac->dev, false) < 0) {}
         if (ieee802154_radio_len(&submac->dev) > (int)IEEE802154_MIN_FRAME_LEN) {
             submac->cb->rx_done(submac);
+            puts("3");
             return IEEE802154_FSM_STATE_IDLE;
         }
         else {

Carton32 commented 4 months ago

I can reproduce on 8b832804e8fa414f541bfcf438bd3f3cd811d127.

Here's the output with the patch applied:

2024-03-04 12:43:10,423 # reboot
2024-03-04 12:43:10,470 # NETOPT_TX_END_IRQ not implemented by driver
2024-03-04 12:43:10,474 # main(): This is RIOT! (VerD
2024-03-04 12:43:10,478 # sion: 2024.04-devel-287-g8b832-HEAD)
2024-03-04 12:43:10,479 # RIOT netwoD
2024-03-04 12:43:10,482 # rk stack example application
2024-03-04 12:43:10,485 # All up, running the shell now
> 2024-03-04 12:43:10,827 # D
2024-03-04 12:43:20,826 # D
2024-03-04 12:43:30,827 # D
2024-03-04 12:43:44,826 # D
2024-03-04 12:43:54,520 # 3
2024-03-04 12:43:54,530 # D
2024-03-04 12:43:54,721 # 3
2024-03-04 12:43:54,729 # D
2024-03-04 12:43:54,922 # 3
2024-03-04 12:43:54,932 # D
reboot
2024-03-04 12:44:01,230 # reboot
2024-03-04 12:44:01,277 # NETOPT_TX_END_IRQ not implemented by driver
2024-03-04 12:44:01,284 # main(): This is RIOT! (Version: 2024.04-devel-287-g8b832-HEAD)
2024-03-04 12:44:01,287 # RIOT network stack example application
2024-03-04 12:44:01,290 # All up, running the shell now
> 

After the first reboot, the module can receive and transmit as expected. I get some '3' and 'D' in the output, as shown. After the second reboot, the module cannot send a ping. The console output stays blank even after minutes of observation. As a reminder, the bug occurs on every second reboot command. I've now also tried hard reboots (unplugging the device from its power source for 5 s) and it shows the exact same behavior: the bug occurs on every second hard reboot.

EDIT: When the board can't send ping, I get these messages after ~10 minutes:

2024-03-04 14:09:34,390 # reboot
2024-03-04 14:09:34,436 # NETOPT_TX_END_IRQ not implemented by driver
2024-03-04 14:09:34,443 # main(): This is RIOT! (Version: 2024.04-devel-287-g8b832-HEAD)
2024-03-04 14:09:34,447 # RIOT network stack example application
2024-03-04 14:09:34,450 # All up, running the shell now
> 2024-03-04 14:20:34,799 # gnrc_netif: can't queue packet for sending
2024-03-04 14:20:34,803 # gnrc_netif: can't queue packet for sending
2024-03-04 14:21:34,800 # gnrc_netif: can't queue packet for sending
2024-03-04 14:21:34,804 # gnrc_netif: can't queue packet for sending

Carton32 commented 4 months ago

If I set ENABLE_DEBUG to 1 in drivers/mrf24j40/mrf24j40_radio_hal.c, I get no output from mrf24j40_radio_irq_handler() (the only function containing DEBUG() macros in this file) when the board can't communicate. After a reboot, when the board can communicate, the DEBUG() messages are printed in my console output as expected.

reboot
2024-03-04 14:55:01,415 # reboot
2024-03-04 14:55:01,462 # NETOPT_TX_END_IRQ not implemented by driver
2024-03-04 14:55:01,469 # main(): This is RIOT! (Version: 2024.04-devel-287-g8b832-HEAD)
2024-03-04 14:55:01,472 # RIOT network stack example application
2024-03-04 14:55:01,475 # All up, running the shell now
> reboot
2024-03-04 14:55:05,729 # reboot
2024-03-04 14:55:05,775 # NETOPT_TX_END_IRQ not implemented by driver
2024-03-04 14:55:05,783 # main(): This is RIOT! (Version: 2024.[mrf24j40] INTERRUPT (pending: 3),
2024-03-04 14:55:05,785 # [mrf24j40] END IRQ
2024-03-04 14:55:05,785 # D
2024-03-04 14:55:05,788 # 04-devel-287-g8b832-HEAD)
2024-03-04 14:55:05,792 # RIOT n[mrf24j40] INTERRUPT (pending: 3),
2024-03-04 14:55:05,794 # [mrf24j40] END IRQ
2024-03-04 14:55:05,794 # D
2024-03-04 14:55:05,795 # e40] END IRQ
2024-03-04 14:55:05,797 # T (pending: 3),
2024-03-04 14:55:05,797 # ion
2024-03-04 14:55:05,800 # All up, running the shell now
> 2024-03-04 14:55:06,133 # [mrf24j40] INTERRUPT (pending: 3),
2024-03-04 14:55:06,135 # [mrf24j40] END IRQ
2024-03-04 14:55:06,135 # D
ping ff02::01 -c 1
2024-03-04 14:55:08,047 # ping ff02::01 -c 1
2024-03-04 14:55:08,056 # [mrf24j40] INTERRUPT (pending: 3),
2024-03-04 14:55:08,057 # [mrf24j40] END IRQ
2024-03-04 14:55:08,058 # D
2024-03-04 14:55:08,063 # [mrf24j40] INTERRUPT (pending: 5),
2024-03-04 14:55:08,065 # [mrf24j40] EVT - RX_END
2024-03-04 14:55:08,067 # [mrf24j40] END IRQ
2024-03-04 14:55:08,067 # 3
2024-03-04 14:55:08,076 # 12 bytes from fe80::66c3:56a5:3ab1:a790%6: icmp_seq=0 ttl=64 rssi=45 dBm time=20.595 ms
2024-03-04 14:55:08,076 # 
2024-03-04 14:55:08,079 # --- ff02::01 PING statistics ---
2024-03-04 14:55:08,084 # 1 packets transmitted, 1 packets received, 0% packet loss
2024-03-04 14:55:08,088 # round-trip min/avg/max = 20.595/20.595/20.595 ms

Carton32 commented 4 months ago

Something interesting: if I keep the MRF24J40 in reset, with its reset pin held low during initialization, the bug disappears.

diff --git a/drivers/mrf24j40/mrf24j40_radio_hal.c b/drivers/mrf24j40/mrf24j40_radio_hal.c
index ee01f9c68a..a9273f2796 100644
--- a/drivers/mrf24j40/mrf24j40_radio_hal.c
+++ b/drivers/mrf24j40/mrf24j40_radio_hal.c
@@ -72,7 +72,7 @@ int mrf24j40_init(mrf24j40_t *dev, const mrf24j40_params_t *params, ieee802154_d
     /* initialize GPIOs */
     spi_init_cs(dev->params->spi, dev->params->cs_pin);
     gpio_init(dev->params->reset_pin, GPIO_OUT);
-    gpio_set(dev->params->reset_pin);
+    gpio_clear(dev->params->reset_pin);
     gpio_init_int(dev->params->int_pin, GPIO_IN, GPIO_RISING, cb, ctx);

     /* reset device to default values */

jia200x commented 4 months ago

From the logs it looks as if the device was no longer generating IRQs. In the netdev variant (before https://github.com/RIOT-OS/RIOT/commit/91a299cb7de2d0646fee6fc52bd4f1a0c06308c0), the TX_DONE event was not used as a feedback mechanism. Therefore, it was possible to transmit again even if you didn't receive the interrupt. Could you verify whether you get IRQs with the old version?

But in any case, it could be that the initialization / reset was not done properly (as you describe in https://github.com/RIOT-OS/RIOT/issues/19711#issuecomment-1978264550). Does this completely remove the bug?
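The difference described above can be sketched with a small host-side model. It is only an illustration under stated assumptions: `mac_model_t`, `model_send` and `model_tx_done_irq` are invented names, and the real netdev and SubMAC code paths are considerably more involved.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model of a MAC layer sitting on top of the radio. */
typedef struct {
    bool tx_pending;      /* a frame was handed to the radio */
    bool needs_tx_done;   /* true: SubMAC-like, false: old netdev-like */
} mac_model_t;

/* Try to send a frame. A layer that uses TX done as feedback refuses new
 * frames until the interrupt for the previous one has been seen. */
static int model_send(mac_model_t *mac)
{
    assert(mac != NULL);
    if (mac->needs_tx_done && mac->tx_pending) {
        return -EBUSY;    /* still waiting for the TX done interrupt */
    }
    mac->tx_pending = true;
    return 0;
}

/* Called when the radio signals that the transmission finished. */
static void model_tx_done_irq(mac_model_t *mac)
{
    assert(mac != NULL);
    mac->tx_pending = false;
}
```

In this model, a layer with `needs_tx_done` set to `false` keeps accepting frames even if `model_tx_done_irq()` is never called, whereas a layer with it set to `true` returns `-EBUSY` on every retry until the interrupt arrives. That is consistent with a lost interrupt going unnoticed before the SubMAC was introduced.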

Carton32 commented 4 months ago

Could you verify whether you get IRQs with the old version?

This is what I get on dacc4ff1248ee36d6a373ffcb140cd9787f7c56e with ENABLE_DEBUG set to 1 in drivers/mrf24j40/mrf24j40_netdev.c:

reboot
2024-03-05 16:02:22,562 # reboot
2024-03-05 16:02:22,593 # [mrf24j40] INTERRUPT (pending: 1),
2024-03-05 16:02:22,595 # [mrf24j40] END IRQ
2024-03-05 16:02:22,602 # [mrf24j40] INTERRUPT (pending: 2),
2024-03-05 16:02:22,605 # [mrf24j40] EVT - TX_END
2024-03-05 16:02:22,606 # [mrf24j40] END IRQ
2024-03-05 16:02:22,609 # [mrf24j40] INTERRUPT (pending: 3),
2024-03-05 16:02:22,611 # [mrf24j40] EVT - TX_END
2024-03-05 16:02:22,613 # [mrf24j40] END IRQ
2024-03-05 16:02:22,619 # main(): This is RIOT! (Version: 2023.04-devel-92-gdacc4f-HEAD)
2024-03-05 16:02:22,622 # RIOT network stack example application
2024-03-05 16:02:22,625 # All up, running the shell now
> 2024-03-05 16:02:23,008 # [mrf24j40] INTERRUPT (pending: 3),
2024-03-05 16:02:23,010 # [mrf24j40] EVT - TX_END
2024-03-05 16:02:23,012 # [mrf24j40] END IRQ
ping ff02::01 -c 1
2024-03-05 16:02:25,709 # ping ff02::01 -c 1
2024-03-05 16:02:25,716 # [mrf24j40] INTERRUPT (pending: 3),
2024-03-05 16:02:25,718 # [mrf24j40] EVT - TX_END
2024-03-05 16:02:25,720 # [mrf24j40] END IRQ
2024-03-05 16:02:25,723 # [mrf24j40] INTERRUPT (pending: 5),
2024-03-05 16:02:25,725 # [mrf24j40] EVT - RX_END
2024-03-05 16:02:25,728 # [mrf24j40] END IRQ
2024-03-05 16:02:25,738 # [mrf24j40] INTERRUPT (pending: 3),
2024-03-05 16:02:25,740 # [mrf24j40] EVT - TX_END
2024-03-05 16:02:25,742 # [mrf24j40] END IRQ
2024-03-05 16:02:25,745 # [mrf24j40] INTERRUPT (pending: 5),
2024-03-05 16:02:25,747 # [mrf24j40] EVT - RX_END
2024-03-05 16:02:25,750 # [mrf24j40] END IRQ
2024-03-05 16:02:25,758 # 12 bytes from fe80::66df:bfec:c684:8d21%6: icmp_seq=0 ttl=64 rssi=-35 dBm time=40.993 ms
2024-03-05 16:02:25,758 # 
2024-03-05 16:02:25,761 # --- ff02::01 PING statistics ---
2024-03-05 16:02:25,766 # 1 packets transmitted, 1 packets received, 0% packet loss
2024-03-05 16:02:25,771 # round-trip min/avg/max = 40.993/40.993/40.993 ms
> reboot
2024-03-05 16:02:28,258 # reboot
2024-03-05 16:02:28,289 # [mrf24j40] INTERRUPT (pending: 1),
2024-03-05 16:02:28,291 # [mrf24j40] END IRQ
2024-03-05 16:02:28,299 # [mrf24j40] INTERRUPT (pending: 2),
2024-03-05 16:02:28,301 # [mrf24j40] EVT - TX_END
2024-03-05 16:02:28,303 # [mrf24j40] END IRQ
2024-03-05 16:02:28,306 # [mrf24j40] INTERRUPT (pending: 3),
2024-03-05 16:02:28,308 # [mrf24j40] EVT - TX_END
2024-03-05 16:02:28,310 # [mrf24j40] END IRQ
2024-03-05 16:02:28,315 # main(): This is RIOT! (Version: 2023.04-devel-92-gdacc4f-HEAD)
2024-03-05 16:02:28,319 # RIOT network stack example application
2024-03-05 16:02:28,321 # All up, running the shell now
> 2024-03-05 16:02:28,704 # [mrf24j40] INTERRUPT (pending: 3),
2024-03-05 16:02:28,707 # [mrf24j40] EVT - TX_END
2024-03-05 16:02:28,708 # [mrf24j40] END IRQ
ping ff02::01 -c 1
2024-03-05 16:02:30,597 # [mrf24j40] INTERRUPT (pending: 5),
2024-03-05 16:02:30,599 # [mrf24j40] EVT - RX_END
2024-03-05 16:02:30,602 # [mrf24j40] END IRQ

2024-03-05 16:02:33,344 # ping ff02::01 -c 1
2024-03-05 16:02:33,350 # [mrf24j40] INTERRUPT (pending: 3),
2024-03-05 16:02:33,352 # [mrf24j40] EVT - TX_END
2024-03-05 16:02:33,354 # [mrf24j40] END IRQ
2024-03-05 16:02:33,357 # [mrf24j40] INTERRUPT (pending: 5),
2024-03-05 16:02:33,359 # [mrf24j40] EVT - RX_END
2024-03-05 16:02:33,362 # [mrf24j40] END IRQ
2024-03-05 16:02:33,370 # 12 bytes from fe80::66df:bfec:c684:8d21%6: icmp_seq=0 ttl=64 rssi=-35 dBm time=18.367 ms
2024-03-05 16:02:33,370 # 
2024-03-05 16:02:33,373 # --- ff02::01 PING statistics ---
2024-03-05 16:02:33,378 # 1 packets transmitted, 1 packets received, 0% packet loss
2024-03-05 16:02:33,383 # round-trip min/avg/max = 18.367/18.367/18.367 ms

Interrupts work even after several "pairs of reboots".

Does this completely remove the bug?

Yes. I couldn't reproduce the bug a single time with this patch. The initialization procedure for MRF24J40 devices doesn't seem to have changed since the introduction of the new HAL/SubMAC implementation, at least for GPIO initialization and register configuration.

EDIT: The patch is not enough if I unplug my STLinkV2 and the UART-USB adapter. The radio then works only after some hard boots, not all of them. If I keep the patch applied and comment out the

/* Set device to SLEEP */
    // mrf24j40_sleep(dev);

at the end of mrf24j40_init(), it works flawlessly again after each boot. Let's see if it keeps working after a large number of repetitions.