RIOT-OS / RIOT

RIOT - The friendly OS for IoT
https://riot-os.org
GNU Lesser General Public License v2.1

pkg/lwip: assert on samr21-xpro #9573

Closed PeterKietzmann closed 6 years ago

PeterKietzmann commented 6 years ago

Description

The following was discovered in #9553. Running tests/lwip on a samr21-xpro triggers an assertion after some time (only sometimes?!), without any interaction:

main(): This is RIOT! (Version: 2018.07-devel-867-ge7deb-515b9e49740c-HEAD)
RIOT lwip test application
Assertion "increment_magnitude <= p->len" failed at /data/riotbuild/riotproject/tests/lwip/bin/pkg/samr21-xpro/lwip/src/core/pbuf.c:583
#! exit 1: powering off

@bergzand was not able to reproduce this behavior on a samr21-xpro node in the FIT IoT-LAB testbed (Saclay site). @MichelRottleuthner was able to reproduce it (locally).

Steps to reproduce the issue

Expected results

No output on the terminal

Actual results

Assertion from pbuf in the lwip stack
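
For readers unfamiliar with the message: the assertion text "increment_magnitude <= p->len" reads like a bounds check made when header bytes are stripped from the front of a packet buffer. The following is only a minimal sketch of that kind of check, assuming this reading is right. `struct mini_pbuf` and `mini_pbuf_hide_header()` are hypothetical stand-ins, not lwIP's real types or functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for a packet buffer (not lwIP's real layout). */
struct mini_pbuf {
    uint8_t *payload;   /* start of the valid data */
    uint16_t len;       /* bytes of valid data in this buffer */
    uint16_t tot_len;   /* bytes in this buffer plus any chained buffers */
};

/*
 * Strip `increment_magnitude` bytes of header from the front of the buffer.
 * Returns 0 on success and 1 if more bytes are requested than the buffer
 * holds, which is the condition named in the failing assertion.
 */
static int mini_pbuf_hide_header(struct mini_pbuf *p, uint16_t increment_magnitude)
{
    assert(p != NULL);

    if (increment_magnitude > p->len) {
        /* A stack built with assertions enabled would stop here. */
        return 1;
    }

    p->payload += increment_magnitude;
    p->len = (uint16_t)(p->len - increment_magnitude);
    p->tot_len = (uint16_t)(p->tot_len - increment_magnitude);
    return 0;
}
```

Under this reading, the crash would mean that some layer tried to strip a header longer than the data actually received, for example because a frame was shorter than expected.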

Versions

Ubuntu 18.04, Docker version 17.12.1

miri64 commented 6 years ago

Also unable to reproduce on Ubuntu 17.10 (test ran for at least 5 min) :-/

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 17.10
Release:    17.10
Codename:   artful
$ dist/tools/ci/print_toolchain_versions.sh 
Installed compiler toolchains 
-----------------------------
             native gcc: gcc (Ubuntu 7.2.0-8ubuntu3.2) 7.2.0
      arm-none-eabi-gcc: arm-none-eabi-gcc (15:5.4.1+svn241155-1) 5.4.1 20160919
                avr-gcc: avr-gcc (GCC) 5.4.0
       mips-mti-elf-gcc: missing
             msp430-gcc: msp430-gcc (GCC) 4.6.3 20120301 (mspgcc LTS 20120406 unpatched)
   riscv-none-embed-gcc: missing
                  clang: clang version 4.0.1-6 (tags/RELEASE_401/final)

Installed compiler libs
-----------------------
   arm-none-eabi-newlib: "2.4.0"
    mips-mti-elf-newlib: missing
riscv-none-embed-newlib: missing
               avr-libc: "2.0.0" ("20150208")

Installed development tools
---------------------------
                  cmake: cmake version 3.9.1
               cppcheck: missing
                doxygen: 1.8.13
                 flake8: 3.2.1 (pyflakes: 1.5.0, pycodestyle: 2.3.1, mccabe: 0.6.1) CPython 3.6.3 on Linux
                    git: git version 2.14.1
             coccinelle: missing

Will try again with docker.

miri64 commented 6 years ago

Couldn't reproduce with Docker either.... I ran the following to try to reproduce (with my fix in #9578):

BOARD=samr21-xpro IOTLAB_EXP_ID=125850 IOTLAB_NODE=samr21-9.saclay.iot-lab.info BUILD_IN_DOCKER=1 make -C tests/lwip flash term

MrKevinWeiss commented 6 years ago

I received the following error on my computer (not built with Docker):

2018-07-17 08:45:12,909 - INFO #  Assertion "increment_magnitude <= p->len" failed at /home/kevinweiss/WorkingDirectory/RIOT/tests/lwip/bin/pkg/samr21-xpro/lwip/src/core/pbuf.c:583
2018-07-17 08:45:12,911 - INFO # #! exit 1: powering off

Here is some fun info about my setup:

No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.4 LTS
Release:    16.04
Codename:   xenial
-----------------------------
             native gcc: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
      arm-none-eabi-gcc: arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 7-2017-q4-major) 7.2.1 20170904 (release) [ARM/embedded-7-branch revision 255204]
                avr-gcc: avr-gcc (GCC) 4.9.2
       mips-mti-elf-gcc: missing
             msp430-gcc: missing
   riscv-none-embed-gcc: missing
                  clang: clang version 5.0.0-3~16.04.1 (tags/RELEASE_500/final)

Installed compiler libs
-----------------------
   arm-none-eabi-newlib: "2.5.0"
    mips-mti-elf-newlib: missing
riscv-none-embed-newlib: missing
               avr-libc: "1.8.0svn" ("20111229")

Installed development tools
---------------------------
                  cmake: cmake version 3.5.1
               cppcheck: missing
                doxygen: missing
                 flake8: missing
                    git: git version 2.7.4
             coccinelle: missing

miri64 commented 6 years ago

Must be the Hamburg air :D

miri64 commented 6 years ago

I'll try it with my 16.04 machine as well today.

miri64 commented 6 years ago

Works like a charm there as well. :-/

$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 16.04.4 LTS
Release:    16.04
Codename:   xenial
$ ./dist/tools/ci/print_toolchain_versions.sh 
Installed compiler toolchains 
-----------------------------
             native gcc: gcc (Ubuntu 5.4.0-6ubuntu1~16.04.10) 5.4.0 20160609
      arm-none-eabi-gcc: arm-none-eabi-gcc (GNU Tools for Arm Embedded Processors 7-2018-q3-update) 7.3.1 20180622 (release) [ARM/embedded-7-branch revision 261907]
                avr-gcc: avr-gcc (GCC) 4.9.2
       mips-mti-elf-gcc: mips-mti-elf-gcc (Codescape GNU Tools 2016.05-03 for MIPS MTI Bare Metal) 4.9.2
             msp430-gcc: msp430-gcc (GCC) 4.6.3 20120301 (mspgcc LTS 20120406 unpatched)
   riscv-none-embed-gcc: missing
                  clang: clang version 3.8.0-2ubuntu4 (tags/RELEASE_380/final)

Installed compiler libs
-----------------------
   arm-none-eabi-newlib: "3.0.0"
    mips-mti-elf-newlib: "2.1.0"
riscv-none-embed-newlib: missing
               avr-libc: "1.8.0svn" ("20111229")

Installed development tools
---------------------------
                  cmake: cmake version 3.5.1
               cppcheck: Cppcheck 1.72
                doxygen: 1.8.11
                 flake8: 2.5.4 (pep8: 1.7.0, pyflakes: 1.1.0, mccabe: 0.2.1) CPython 3.5.2 on Linux
                    git: git version 2.7.4
             coccinelle: spatch version 1.0.4 with Python support and with PCRE support

PeterKietzmann commented 6 years ago

Must be the Hamburg air :D

Actually I think it is. I switched to channel 11 directly after reset and since then (>30 min.) it has been running without an assertion. I'll let @MrKevinWeiss double-check this behavior. However, that could also indicate bad radio configurations.

jcarrano commented 6 years ago

Left it running for almost an hour. No crashes.

My system (Archlinux):

Installed compiler toolchains 
-----------------------------
             native gcc: gcc (GCC) 8.1.1 20180531
      arm-none-eabi-gcc: arm-none-eabi-gcc (Arch Repository) 8.1.0
                avr-gcc: missing
       mips-mti-elf-gcc: missing
             msp430-gcc: missing
   riscv-none-embed-gcc: missing
                  clang: clang version 6.0.1 (tags/RELEASE_601/final)

Installed compiler libs
-----------------------
   arm-none-eabi-newlib: "3.0.0"
    mips-mti-elf-newlib: missing
riscv-none-embed-newlib: missing
               avr-libc: missing (missing)

Installed development tools
---------------------------
                  cmake: cmake version 3.11.4
               cppcheck: Cppcheck 1.84
                doxygen: 1.8.14
                 flake8: 3.5.0 (mccabe: 0.6.1, pycodestyle: 2.4.0, pyflakes: 2.0.0) CPython 3.6.6 on Linux
                    git: git version 2.18.0
             coccinelle: missing

MrKevinWeiss commented 6 years ago

I tested again with channel 11 instead of the default 26 and it appears that it doesn't fail... Maybe it is the Hamburg air but only the air on channel 26.
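
For context on why the channel could matter: in the 2.4 GHz band, IEEE 802.15.4 channels 11 to 26 are spaced 5 MHz apart starting at 2405 MHz, so channels 11 and 26 sit at opposite ends of the band and may see quite different interference. A small sketch of that mapping follows. The function names are made up for illustration and are not part of RIOT or lwIP.

```c
#include <assert.h>

/*
 * Center frequency in MHz of an IEEE 802.15.4 channel in the 2.4 GHz band.
 * Valid channels are 11..26; returns 0 for anything else.
 * (Hypothetical helper, not a RIOT API.)
 */
static unsigned ieee802154_center_freq_mhz(unsigned channel)
{
    if (channel < 11u || channel > 26u) {
        return 0u;
    }
    return 2405u + 5u * (channel - 11u);
}

/* Small self-check covering the two channels discussed in this thread. */
void channel_freq_selfcheck(void)
{
    assert(ieee802154_center_freq_mhz(11u) == 2405u);  /* lower band edge */
    assert(ieee802154_center_freq_mhz(26u) == 2480u);  /* upper band edge */
}
```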

miri64 commented 6 years ago

Can one of you maybe sniff on channel 26 while the crash is happening, to see what lwIP is killing itself on?

PeterKietzmann commented 6 years ago

@miri64 I sniffed the traffic back then -> nothing to see in Wireshark. I retried now -> nothing to see. This doesn't necessarily mean that (i) there are no other signals on 2.4 GHz which interfere somehow, but, more dramatically, it would mean that (ii) there are no other 802.15.4 packets (see this comment).

When we were working on #8570 we accidentally enabled promiscuous mode but treated the rest of the radio/netdev as if it were in the usual mode, which made the stack crash rather quickly, even though basic IP communication initially worked. That's why I stated: "could also indicate bad radio configurations".
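
To illustrate the kind of mismatch meant here: in the usual mode the radio drops frames that are not addressed to the node, while in promiscuous mode every frame reaches the driver, so code above it would have to filter by itself. The sketch below shows such a software filter in simplified form, assuming short addressing only. `frame_is_for_us()` is a hypothetical helper, not the code from #8570 or from any RIOT driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define IEEE802154_BROADCAST 0xFFFFu  /* broadcast PAN ID and short address */

/*
 * Simplified destination filter for frames with short addresses:
 * accept a frame only if both its PAN ID and its short address match
 * our own values or are the broadcast value.
 * (Hypothetical helper for illustration only.)
 */
static bool frame_is_for_us(uint16_t dst_pan, uint16_t dst_short,
                            uint16_t own_pan, uint16_t own_short)
{
    bool pan_ok = (dst_pan == own_pan) || (dst_pan == IEEE802154_BROADCAST);
    bool addr_ok = (dst_short == own_short) || (dst_short == IEEE802154_BROADCAST);

    return pan_ok && addr_ok;
}

/* Small self-check: a unicast frame for a different node must be rejected. */
void frame_filter_selfcheck(void)
{
    assert(frame_is_for_us(0x0023u, 0x1234u, 0x0023u, 0x1234u));
    assert(!frame_is_for_us(0x0023u, 0x5678u, 0x0023u, 0x1234u));
}
```

If frames that fail such a check are handed to the IP stack anyway, the stack ends up parsing traffic it was never meant to see, which is one plausible way to end up with length checks failing.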

Hard to say, maybe the spectrum-scanner gives further insights. Or maybe we're still on the wrong track. Right now I don't find time for this kind of analysis.

miri64 commented 6 years ago

I'll try to remember to bring my econotag when I'm in Hamburg, so we can check with it as well.

jia200x commented 6 years ago

@PeterKietzmann can you confirm if this issue still exists after bumping the pkg version of lwip?

PeterKietzmann commented 6 years ago

No failed assertions after ~45 minutes with current master. I checked out Release-2018.07 and the assertion failed "instantly". I repeated both experiments a couple of times, always with the same effect. Thus, I consider the issue fixed, most likely by the lwip pkg version bump.