RIOT-OS / RIOT

RIOT - The friendly OS for IoT
https://riot-os.org
GNU Lesser General Public License v2.1
4.89k stars 1.98k forks source link

app/netdev: application stops working after receiving frames with assertion or completely without error #8271

Open roberthartung opened 6 years ago

roberthartung commented 6 years ago

Description

Application stops working (Assertion failed or no output) after random(?) time with assertion with my at86rf2xx, and no error on telosb (tmote sky).

Steps to reproduce the issue

Send packets from various nodes (broadcast) to other nodes. Also, corrupt packages should be in the air.

Expected results

Application correctly receives and frees all packets.

Actual results

2017-12-18 09:05:32,868 - INFO # sys/net/gnrc/pktbuf_static/gnrc_pktbufstatic.c:230 => n/a 2017-12-18 09:05:32,868 - INFO # *** RIOT kernel panic: 2017-12-18 09:05:32,886 - INFO # FAILED ASSERTION. 2017-12-18 09:05:32,887 - INFO # 2017-12-18 09:05:32,932 - INFO # pid | name | state Q | pri | stack ( used) | base addr | current
2017-12-18 09:05:32,980 - INFO # 1 | idle | pending Q | 15 | 768 ( 88) | 0x1eb7 | 0x2174 2017-12-18 09:05:33,028 - INFO # 2 | main | bl mutex
| 7 | 768 ( 246) | 0x21b7 | 0x23c7 2017-12-18 09:05:33,076 - INFO # 3 | at86rf2xx | bl rx _ | 2 | 512 ( 242) | 0x3de3 | 0x3f44 2017-12-18 09:05:33,124 - INFO # 4 | dump_thread | running Q | 8 | 768 ( 766) | 0x1b49 | 0x1dea 2017-12-18 09:05:33,171 - INFO # 5 | sendthread | bl mutex | 9 | 768 ( 170) | 0x1809 | 0x1a6d 2017-12-18 09:05:33,204 - INFO # | SUM | | | 3584 ( 1512) 2017-12-18 09:05:33,205 - INFO # 2017-12-18 09:05:33,205 - INFO # *** halted. 2017-12-18 09:05:33,219 - INFO #

miri64 commented 6 years ago

From what branch are you building? When I try the application you provided on users, I get an error:

main.c:66:67: error: 'gnrc_netif_hdr_t {aka struct <anonymous>}' has no member named 'crc_valid'
                 printf("HDR: %d %d %d\n", hdr->rssi, hdr->lqi, hdr->crc_valid);
                                                                   ^~
roberthartung commented 6 years ago

Working on https://github.com/roberthartung/RIOT/tree/wip-inga-crc , see also: #8276

roberthartung commented 6 years ago

One of my telosb nodes stopped working this morning again:

2017-12-19 02:38:21,857 - INFO # send... OK
2017-12-19 02:38:22,399 - INFO # HDR: -68 107 1
2017-12-19 02:38:22,495 - INFO # HDR: -50 107 1
2017-12-19 02:38:22,527 - INFO # HDR: -59 95 1
2017-12-19 02:38:22,560 - INFO # HDR: -39 108 1
2017-12-19 02:38:22,590 - INFO # HDR: -52 107 1
2017-12-19 02:38:23,420 - INFO # HDR: -68 107 1
2017-12-19 02:38:23,500 - INFO # HDR: -50 108 1
2017-12-19 02:38:23,547 - INFO # HDR: -58 94 1
2017-12-19 02:38:23,564 - INFO # HDR: -39 108 1
2017-12-19 02:38:23,595 - INFO # HDR: -52 108 1
2017-12-19 02:38:23,675 - INFO # HDR: -54 108 1
2017-12-19 02:38:23,898 - INFO # send... OK
2017-12-19 02:38:24,456 - INFO # HDR: -68 108 1
2017-12-19 02:38:24,505 - INFO # HDR: -47 106 1
2017-12-19 02:38:24,569 - INFO # HDR: -58 89 1
2017-12-19 02:38:24,600 - INFO # HDR: -39 107 1
2017-12-19 02:38:24,601 - INFO # HDR: -52 108 1
2017-12-19 02:38:25,477 - INFO # HDR: -68 107 1
2017-12-19 02:38:25,509 - INFO # HDR: -47 107 1
2017-12-19 02:38:25,588 - INFO # HDR: -58 98 1
2017-12-19 02:38:25,604 - INFO # HDR: -50 107 1
2017-12-19 02:38:25,620 - INFO # HDR: -39 107 1
2017-12-19 02:38:25,699 - INFO # HDR: -54 107 1
2017-12-19 02:38:25,938 - INFO # send... OK
2017-12-19 02:38:26,498 - INFO # HDR: -68 106 1
2017-12-19 02:38:26,513 - INFO # HDR: -50 105 1
2017-12-19 02:38:26,608 - INFO # HDR: -58 96 1
2017-12-19 02:38:26,640 - INFO # HDR: -39 105 1
2017-12-19 02:38:26,832 - INFO # HDR: -96 29 0
2017-12-19 02:38:26,959 - INFO # Contents(100): EC 1C 01 23 22 FF FF 00 00 DE 09 34 36 29 FF 00 00 D6 09 80 00 00 00 00 00 00 00 CC 09 06 1C 08 00 FE FF 00 00 C2 09 D2 1B 15 00 00 00 BA 09 50 1C 18 00 80 FF BD FF 21 00 E3 6A FF FF 50 1C 0A 00 00 FF FF 6A E3 00 34 1C 0A 00 34 1C 0E 00 FF FF 50 1C 0E 00 80 32 FC FF 77 01 E3 6A FF FF 50 1C 0E 00 80 | ...#"......46).................................P.......!..j..P......j..4...4.....P....2..w..j..P....

Afterwards, shell is dead and sending/receiving stopped completely.

miri64 commented 6 years ago

Have you had a look if gnrc_netreg returns the correct results (in particular gnrc_netreg_num()) for your custom dumper? The dump you gave in OP suggests that a gnrc_pktbuf_release() to many is happening.

roberthartung commented 6 years ago

@miri64 Can you give me a quick example of how to use gnrc_netreg_num()? It's 0 for me: printf("_dump: %d %d\n", gnrc_netreg_num(GNRC_NETTYPE_UNDEF, 0), gnrc_netreg_num(GNRC_NETTYPE_NETIF, 0));.

roberthartung commented 6 years ago

Another node stopped working now:

2017-12-19 09:50:29,846 - INFO # HDR: -91 255 1
2017-12-19 09:50:29,847 - INFO # send... OK
2017-12-19 09:50:30,853 - INFO # send... OK
2017-12-19 09:50:31,861 - INFO # send... OK
2017-12-19 09:50:32,852 - INFO # send... OK
2017-12-19 09:50:33,859 - INFO # send... OK
2017-12-19 09:50:34,850 - INFO # send... OK
2017-12-19 09:50:35,858 - INFO # send... OK
2017-12-19 09:50:36,849 - INFO # send... OK
2017-12-19 09:50:37,856 - INFO # send... OK
2017-12-19 09:50:38,848 - INFO # send... OK
2017-12-19 09:50:39,854 - INFO # send... OK
2017-12-19 09:50:40,846 - INFO # send... OK
2017-12-19 09:50:41,854 - INFO # send... OK
2017-12-19 09:50:42,845 - INFO # send... OK
2017-12-19 09:50:43,852 - INFO # send... OK
2017-12-19 09:50:44,859 - INFO # send... OK

Node is not receiving, but sending. shell still works :/

roberthartung commented 6 years ago

New error now:

2017-12-19 10:03:35,485 - INFO # Contents(13): 20 02 03 00 D0 06 69 70 6E 3A 33 33 04 |  .....ipn:33.
2017-12-19 10:03:35,965 - INFO # HDR: -91 255 1
2017-12-19 10:03:36,093 - INFO # HDR: -91 255 1
2017-12-19 10:03:36,110 - INFO # send... core/thread.c:110 => n/a
2017-12-19 10:03:36,125 - INFO # *** RIOT kernel panic:
2017-12-19 10:03:36,141 - INFO # FAILED ASSERTION.
2017-12-19 10:03:36,142 - INFO # 
2017-12-19 10:03:36,189 - INFO #    pid | name                 | state    Q | pri | stack  ( used) | base addr  | current     
2017-12-19 10:03:36,237 - INFO #      1 | idle                 | pending  Q |  15 |    128 (   88) |     0x1ab9 |     0x1af6 
2017-12-19 10:03:36,272 - INFO #      2 | main                 | bl mutex _ |   7 |    512 (  246) |     0x1b39 |     0x1c49 
2017-12-19 10:03:36,317 - INFO #      3 | at86rf2xx            | running  Q |   2 |    256 (  256) |     0x3665 |     0x36c6 
2017-12-19 10:03:36,365 - INFO #      4 | dump_thread          | bl rx    _ |   8 |    512 (  510) |     0x184b |     0x19eb 
2017-12-19 10:03:36,413 - INFO #      5 | send_thread          | pending  Q |   9 |    512 (  190) |     0x160b |     0x177e 
2017-12-19 10:03:36,445 - INFO #        | SUM                  |            |     |   1920 ( 1290)
2017-12-19 10:03:36,446 - INFO # 
2017-12-19 10:03:36,461 - INFO # *** halted.
2017-12-19 10:03:36,462 - INFO # 
miri64 commented 6 years ago

This looks more and more like a core related issue (maybe just core on inga?).

roberthartung commented 6 years ago

Maybe. No idea what, though. Thread scheduling is taken from other atmega CPUs, as well as periph drivers. Maybe increasing stack size would help?

2017-12-19 11:37:01,888 - INFO # ps
2017-12-19 11:37:01,937 - INFO #    pid | name                 | state    Q | pri | stack  ( used) | base addr  | current     
2017-12-19 11:37:01,984 - INFO #      1 | idle                 | pending  Q |  15 |    128 (   88) |     0x1ab9 |     0x1af4 
2017-12-19 11:37:02,017 - INFO #      2 | main                 | running  Q |   7 |    512 (  356) |     0x1b39 |     0x1c49 
2017-12-19 11:37:02,065 - INFO #      3 | at86rf2xx            | bl rx    _ |   2 |    256 (  256) |     0x3665 |     0x36c6 
2017-12-19 11:37:02,113 - INFO #      4 | dump_thread          | bl rx    _ |   8 |    512 (  510) |     0x184b |     0x19eb 
2017-12-19 11:37:02,160 - INFO #      5 | send_thread          | bl mutex _ |   9 |    512 (  190) |     0x160b |     0x176f 
roberthartung commented 6 years ago

@miri64 INGA problem would only show up on INGA. But telosb nodes sometimes die as well.

miri64 commented 6 years ago

Maybe increasing stack size would help?

Have you tried that? Thread 3 and 4 definitely look like they need more (4 is close to the max).

roberthartung commented 6 years ago

Yes tried that for now. The dump_thread is even at its limit, if I double the stack size. Any idea why?

The receive error also occurs on a TelosB mote.

2017-12-19 15:05:28,529 - INFO # HDR: -51 106 1
2017-12-19 15:05:28,546 - INFO # HDR: -51 106 1
2017-12-19 15:05:28,752 - INFO # send... OK
2017-12-19 15:05:32,116 - INFO # send... OK
2017-12-19 15:05:35,496 - INFO # send... OK
2017-12-19 15:05:38,860 - INFO # send... OK
2017-12-19 15:05:42,240 - INFO # send... OK
2017-12-19 15:05:45,604 - INFO # send... OK
2017-12-19 15:05:48,983 - INFO # send... OK
2017-12-19 15:05:52,363 - INFO # send... OK
2017-12-19 15:05:55,727 - INFO # send... OK
2017-12-19 15:05:59,107 - INFO # send... OK
2017-12-19 15:06:02,487 - INFO # send... OK
2017-12-19 15:06:05,867 - INFO # send... OK
2017-12-19 15:06:09,248 - INFO # send... OK
miri64 commented 6 years ago

Yes tried that for now. The dump_thread is even at its limit, if I double the stack size. Any idea why?

printf() uses a lot of stack-space, at least with newlib. Have you considered using sys/include/fmt.h instead?

roberthartung commented 6 years ago

INGA / Samr21-xpro works fine right now. However, my telosb nodes are now dying all the time. Exactly when I try to send a packet and at the same time I receive a packet.

roberthartung commented 6 years ago

Playing around with stack sizes solved the problem for now. Additionally some cleanup in the code. Thanks for now!

roberthartung commented 6 years ago

However, telosb nodes are still stuck sometimes in RX state. "ifconfig set state idle" solves the problem...

stale[bot] commented 5 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. If you want me to ignore this issue, please mark it with the "State: don't stale" label. Thank you for your contributions.