As a bonus I rebased https://github.com/RIOT-OS/RIOT/pull/10908 to 2019.01-RC2
and ran the test. It passed ;-).
8.3 still works on 2019.01-RC2 as well.
8.4 still works. BTW also without the compile flag (which I accidentally forgot to set), since the Raspberry Pi I'm using has the ABRO configured in its radvd.conf ;-).
10 still works; however, I still found 2 bugs in the testing procedures (see #100 and #101).
I will run the automated tests on boards, at least iotlab-m3 and samr21-xpro, and see what other boards I have here that I can run on for the next step.
For both iotlab-m3 and samr21-xpro, all automated tests ran, with these failed tests:
Failures during test:
- tests/gnrc_ipv6_ext
- tests/gnrc_rpl_srh
- tests/pkg_fatfs_vfs (this one needs an SD card, so it cannot really be run)
gnrc_ipv6_ext and gnrc_rpl_srh require root when running the test. I could run them manually:
Note, arm-gcc is not in my normal path so not seen when run with sudo, so we currently have printed errors.
However, it is using my locally installed python packages so no need for a special setup there.
BOARD=samr21-xpro make -C tests/gnrc_ipv6_ext flash
sudo BOARD=samr21-xpro make --no-print-directory -C tests/gnrc_ipv6_ext/ test
/bin/sh: 1: arm-none-eabi-gcc: not found
/home/harter/work/git/worktree/riot_release/makefiles/toolchain/gnu.inc.mk:18: objcopy not found. Hex file will not be created.
...................SUCCESS
And for gnrc_rpl_srh
BOARD=samr21-xpro make -C tests/gnrc_rpl_srh flash
sudo BOARD=samr21-xpro make --no-print-directory -C tests/gnrc_rpl_srh test
/bin/sh: 1: arm-none-eabi-gcc: not found
/home/harter/work/git/worktree/riot_release/makefiles/toolchain/gnu.inc.mk:18: objcopy not found. Hex file will not be created.
..............SUCCESS
I think these tests should be somehow defined as ADMIN_TESTS or something in RIOT. This would allow special handling for these ones like having an admin-test target or something.
> Note, arm-gcc is not in my normal path so not seen when run with sudo, so we currently have printed errors.

You don't need to build and you don't need to flash with sudo (unless you did not configure your udev rules of course). Just the execution of the test script requires root.
> I think these tests should be somehow defined as ADMIN_TESTS or something in RIOT. This would allow special handling for these ones like having an admin-test target or something.

I think the name admin-test might be misleading. There is a difference between the root user and an admin user (though they might be the same person in some cases) ;-)
> > Note, arm-gcc is not in my normal path so not seen when run with sudo, so we currently have printed errors.
>
> You don't need to build and you don't need to flash with sudo (unless you did not configure your udev rules of course). Just the execution of the test script requires root.

Yes, but it still tries to evaluate arm-none-eabi-gcc even when running tests. It is an unrelated issue; I just noted the error message.
> > I think these tests should be somehow defined as ADMIN_TESTS or something in RIOT. This would allow special handling for these ones like having an admin-test target or something.
>
> I think the name admin-test might be misleading. There is a difference between the root user and an admin user (though they might be the same person in some cases) ;-)

It was more about the concept than the name. I was not confident with root-test either, maybe privileged-tests or something. But that would be a dedicated discussion in an issue/PR.
When compiling and running the tests for iotlab-m3 with Docker on a machine with no toolchain, tests/riotboot also fails to compile:
make RIOT_CI_BUILD=1 CC_NOCOLOR=1 --no-print-directory -C ./tests/riotboot clean all --jobs
make: *** No rule to make target '/srv/ilab-builds/workspace/git/riot_release/tests/riotboot/bin/iotlab-m3/tests_riotboot-slot0.bin', needed by '/srv/ilab-builds/workspace/git/riot_release/tests/riotboot/bin/iotlab-m3/tests_riotboot-slot0.hdr'. Stop.
make: *** Waiting for unfinished jobs....
compiling /srv/ilab-builds/workspace/git/riot_release/dist/tools/riotboot_gen_hdr/bin/genhdr...
Return value: 2
@cgundogan, @jia200x, @leandrolanzieri, you checked some items for the RC1. Can you try again on RC2? That would help a lot, thanks!
I just re-ran the gnrc_ipv6_ext test with samr21-xpro after rebooting, and the test fails:
BOARD=samr21-xpro make -C tests/gnrc_ipv6_ext/ flash
...
sudo BOARD=samr21-xpro PATH=${PATH} make -C tests/gnrc_ipv6_ext/ test
make: Entering directory '/home/harter/work/git/worktree/riot_release/tests/gnrc_ipv6_ext'
Traceback (most recent call last):
  File "/home/harter/work/git/worktree/riot_release/tests/gnrc_ipv6_ext/tests/01-run.py", line 646, in <module>
    sys.exit(run(testfunc, timeout=1, echo=False))
  File "/home/harter/work/git/worktree/riot_release/dist/pythonlibs/testrunner/__init__.py", line 56, in run
    testfunc(child)
  File "/home/harter/work/git/worktree/riot_release/tests/gnrc_ipv6_ext/tests/01-run.py", line 596, in testfunc
    lladdr_src = get_host_lladdr(tap)
  File "/home/harter/work/git/worktree/riot_release/tests/gnrc_ipv6_ext/tests/01-run.py", line 587, in get_host_lladdr
    "Can't find host link-local address on interface {}".format(tap)
AssertionError: Can't find host link-local address on interface tap0
/home/harter/work/git/worktree/riot_release/tests/gnrc_ipv6_ext/../../Makefile.include:568: recipe for target 'test' failed
make: *** [test] Error 1
make: Leaving directory '/home/harter/work/git/worktree/riot_release/tests/gnrc_ipv6_ext'
As the test runs ethos alone, without pre-creating the interface, tap0 stays in the DOWN state.
ip link show dev tap0
27: tap0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 36:21:8b:bf:9a:1f brd ff:ff:ff:ff:ff:ff
Doing as in start_network.sh, i.e. creating the interface and bringing it up before the test, it works.
It also works when dist/tools/tapsetup/tapsetup has been run beforehand, which should have been my state on the previous run.
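A test script could check the interface state up front and report it more clearly than the AssertionError above. The following is only a minimal sketch and not part of the RIOT test suite: the helper names link_flags and is_up are hypothetical, and the code only parses the text format of the ip link output quoted above.

```python
import re


def link_flags(ip_link_line):
    """Return the flags from the first line of `ip link show dev <if>`."""
    match = re.search(r"<([^>]*)>", ip_link_line)
    return set(match.group(1).split(",")) if match else set()


def is_up(ip_link_line):
    """True if the administrative UP flag is set on the interface."""
    return "UP" in link_flags(ip_link_line)


# First line of the output quoted above, captured while the test was failing.
down_line = ("27: tap0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop "
             "state DOWN mode DEFAULT group default qlen 1000")

if not is_up(down_line):
    print("tap0 is down: no link-local address will be assigned")
```

With the flags from the failing run, the UP flag is missing, which matches the missing link-local address the test complains about.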
A note on the tests with arduino-mega2560: there are several test failures, even some that look important:
Failures during test:
- [tests/bitarithm_timings](tests/bitarithm_timings/test.failed)
- [tests/evtimer_msg](tests/evtimer_msg/test.failed)
- [tests/isr_yield_higher](tests/isr_yield_higher/test.failed)
- [tests/libfixmath](tests/libfixmath/test.failed)
- [tests/periph_eeprom](tests/periph_eeprom/test.failed)
- [tests/periph_gpio](tests/periph_gpio/test.failed)
- [tests/pipe](tests/pipe/test.failed)
- [tests/pkg_fatfs_vfs](tests/pkg_fatfs_vfs/test.failed)
- [tests/pkg_jsmn](tests/pkg_jsmn/test.failed)
- [tests/pkg_libb2](tests/pkg_libb2/test.failed)
- [tests/pkg_lora-serialization](tests/pkg_lora-serialization/test.failed)
- [tests/pkg_micro-ecc](tests/pkg_micro-ecc/test.failed)
- [tests/pkg_tiny-asn1](tests/pkg_tiny-asn1/test.failed)
- [tests/posix_semaphore](tests/posix_semaphore/test.failed)
- [tests/ps_schedstatistics](tests/ps_schedstatistics/test.failed)
- [tests/rng](tests/rng/test.failed)
- [tests/trickle](tests/trickle/test.failed)
I will look into them and give details.
@miri64, I manually ran the 3.4 task and the packet buffer of the receiving node is not empty, even several seconds after the end of the test. I put the content in this gist. I forgot to check the pktbuf of the sending nodes, but they were still reachable.
For my setup, I created 11 tap interfaces using tapsetup and attached 11 native instances to each tap. Then 10 of the nodes started to send pings (I set count to 1000) to a single one (on tap0).
> For my setup, I created 11 tap interfaces using tapsetup and attached 11 native instances to each tap. Then 10 of the nodes started to send pings (I set count to 1000) to a single one (on tap0).

Why not use the ping6 command of your host machine?
I can't really reproduce your results :-/. I tried
sudo true; for _ in $(seq 10); do sudo ping -c 1000 -s 1452 -f fe80::9494:71ff:fe6c:d5a%tapbr0 & done
sudo true; for _ in $(seq 10); do sudo ping -c 1000 -s 1452 -i .01 fe80::9494:71ff:fe6c:d5a%tapbr0 & done
sudo true; for _ in $(seq 10); do sudo ping -c 1000 -s 1452 -i .001 fe80::9494:71ff:fe6c:d5a%tapbr0 & done
None caused any leaks.
I'll see what happens, when I run a long-term experiment tonight.
A few tests:
- I ran your commands (only with -i 0) from my host and the pktbuf on the native node is never filled even during the flood. The RIOT node is always active, no error message is displayed.
- [...] gnrc_netif: possibly lost interrupt. on the native node. It remains active, even during the flood, and the pktbuf is always empty.

My guess is that in these 2 cases, the src address of the ping is always the same (the one of the Linux host), so maybe a lot of them could be dropped during the flood? When I tried from 10 RIOT native nodes, the src addresses were all different because each node was attached to a different interface.
> - I ran your commands (only with -i 0) from my host and the pktbuf on the native node is never filled even during the flood. The RIOT node is always active, no error message is displayed

-f implies -i 0 ;-)
I ran it during the night. I also had possibly lost interrupts (which just means that the gnrc_netif message queue was full), but the packet buffer is empty.
I will try again from different addresses though.
@aabadie can you share your test script? I was also not able to reproduce the issue you described using https://gist.github.com/miri64/fac4df86be36f0a65d9bdb4d2f09d5c7 (the check of the packet buffer fails to match for some reason, but it is empty).
> can you share your test script?

Not possible: I did the setup (tapsetup -c 11), started the native instances and launched the pings manually in different terminals. I can try your script.
But how is this a stress test? With count 1000 the ping is done faster than you can copy the ping command to all terminals.
> With count 1000 the ping is faster done than you can copy the ping command to all terminals.

Sure, but prepare each terminal with the ping commands and then switch between them and launch them. Using keyboard shortcuts, this can be done fast enough to trigger a lot of ping timeouts, slowing down everything. I have no idea what is going on during this test and how it is supposed to behave. Maybe the Python script is introducing side effects because of the GIL? Are we sure the pings are performed in parallel?
> > With count 1000 the ping is faster done than you can copy the ping command to all terminals.
>
> Sure, but prepare each terminal with the ping commands and then switch between them and launch them. Using keyboard shortcuts, this can be done faster enough to trigger a lot of ping timeout, slowing down everything. I have no idea what is going on during this test and how it is supposed to behave. Maybe the python script is introducing side effects because of the GIL ? Are we sure the pings are performed in parallel ?

No. But since I was already quite annoyed by doing this with two terminals when I analyzed https://github.com/RIOT-OS/RIOT/issues/10672, I'm going to write some script that does the same thing in bash and come back to you.
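On the GIL question: pings started as child processes run outside the Python interpreter, so the GIL does not serialize them as long as every process is started before any of them is waited on. The sketch below illustrates that pattern. It is not the script from the gist; ping_cmd and run_parallel are hypothetical helpers, and the ping options are only the ones already quoted in this thread.

```python
import subprocess
import sys


def ping_cmd(target, count=1000, size=1452, interval=None):
    """Build a ping command line like the ones quoted earlier in the thread."""
    cmd = ["ping", "-c", str(count), "-s", str(size)]
    # Without an explicit interval, fall back to flood mode (-f).
    cmd += ["-f"] if interval is None else ["-i", str(interval)]
    return cmd + [target]


def run_parallel(commands):
    """Start every command before waiting on any, then collect exit codes."""
    procs = [
        subprocess.Popen(cmd, stdout=subprocess.DEVNULL,
                         stderr=subprocess.DEVNULL)
        for cmd in commands
    ]
    return [proc.wait() for proc in procs]


# Only print the command here; flood pings need root and a running node.
print(" ".join(ping_cmd("fe80::9494:71ff:fe6c:d5a%tapbr0", interval=".01")))
```

Calling run_parallel([ping_cmd(target)] * 10) would then launch ten pings at once, which is closer to the manual multi-terminal setup than a loop that waits for each ping in turn.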
Ok, I was able to reproduce with this script https://gist.github.com/miri64/fac4df86be36f0a65d9bdb4d2f09d5c7#file-03-4-test-sh. Since it works for one neighbor but has problems with 10, I suspect something goes wrong in the neighbor discovery.
I even was able to produce a segmentation fault now :o
Though I'd still like to find out how exactly it happens, I opened https://github.com/RIOT-OS/RIOT/pull/10975 to fix the segfault for now. With that fix I wasn't able to reproduce the leak either, with count 1000 and 10000, though I don't understand why. I will investigate further.
https://github.com/RIOT-OS/RIOT/pull/10975 makes the occurrence of the leak harder to reproduce, but I already saw one again. My suspicion is that (from https://github.com/RIOT-OS/RIOT/pull/10975#issuecomment-461917017)

> When just the last element is removed the situation "fixes" itself, since the entry is still referred to by the first position, so re-adding it just leads to a loop of one (breaking the list, but not the system ;-)).

led to a number of leaks that now don't occur anymore.
In your gist, I'm still very confused about the start
00000000 02 01 42 B9 98 78 C8 00 02 01 42 B9 98 78 C8 00
00000010 02 01 42 B9 98 78 C8 00 02 01 42 B9 98 78 C8 00
00000020 02 01 42 B9 98 78 C8 00 00 00 00 00 50 AD 64 56
Apart from the last 8 bytes (the start of a packet snip), I'm not really sure what the repeating sequence is... :-/
@aabadie With https://github.com/RIOT-OS/RIOT/pull/10978 I can't produce any leaks at the moment.
I had one occurrence while debugging this where I had a gnrc_netif_hdr stuck in the packet buffer (all packets that are released with the fix in https://github.com/RIOT-OS/RIOT/pull/10978 should not have a netif header, since it is removed here), so I'm not 100% confident that it removes all leaks for the case you described.
I will re-do the multihop tests once I have a stable Internet connectivity. Probably around noon.
> In your gist, I'm still very confused about the start
>
> 00000000 02 01 42 B9 98 78 C8 00 02 01 42 B9 98 78 C8 00
> 00000010 02 01 42 B9 98 78 C8 00 02 01 42 B9 98 78 C8 00
> 00000020 02 01 42 B9 98 78 C8 00 00 00 00 00 50 AD 64 56
>
> apart from the last 8 byte (a start of a packet snip) I'm not really sure what the repeating sequence is... :-/

Ah, those are target link-layer address options for the address 42:b9:98:78:c8:00.
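For anyone wanting to double-check that reading of the dump, here is a small sketch that walks the quoted bytes as a plain sequence of NDP options (type, length in units of 8 octets, body, as in RFC 4861). Treating the start of the packet buffer as a contiguous option list is an assumption made for illustration, and parse_ndp_options and format_lladdr are hypothetical helper names.

```python
def parse_ndp_options(data):
    """Split bytes into (type, body) NDP options; stop at invalid data."""
    options = []
    offset = 0
    while offset + 2 <= len(data):
        opt_type, opt_len = data[offset], data[offset + 1]
        size = opt_len * 8  # the length field counts units of 8 octets
        if opt_len == 0 or offset + size > len(data):
            break  # zero length is invalid; also stop on truncated options
        options.append((opt_type, data[offset + 2:offset + size]))
        offset += size
    return options


def format_lladdr(body):
    """Render a link-layer address as colon-separated hex."""
    return ":".join("{:02x}".format(byte) for byte in body)


# The 48 bytes quoted from the gist.
dump = bytes.fromhex(
    "02 01 42 B9 98 78 C8 00 02 01 42 B9 98 78 C8 00"
    "02 01 42 B9 98 78 C8 00 02 01 42 B9 98 78 C8 00"
    "02 01 42 B9 98 78 C8 00 00 00 00 00 50 AD 64 56"
)

for opt_type, body in parse_ndp_options(dump):
    # Option type 2 is the Target Link-layer Address option.
    print(opt_type, format_lladdr(body))
```

The walk yields five options of type 2 and stops at the zero bytes that precede the packet snip.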
I added the details of the arduino-mega2560 failures in the main post. I will see what I can fix.
It looks like the overflow issue in tests/periph_gpio comes from the fact that benchmark takes its timings from within masked interrupts, and xtimer_now_usec does not appear to be implemented to work from within masked interrupts on arduino-mega2560.
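As a rough illustration of how such an overflow shows up, assuming the 4294930872us figure in the main post belongs to this benchmark: a 32-bit unsigned difference wraps around when the second timestamp read is smaller than the first. The sketch below only reproduces the arithmetic; the start and stop values are made up to match the figure, and elapsed_us and as_signed32 are hypothetical helper names.

```python
UINT32_MOD = 1 << 32


def elapsed_us(start, stop):
    """Difference of two timestamps as unsigned 32-bit arithmetic sees it."""
    return (stop - start) % UINT32_MOD


def as_signed32(value):
    """Reinterpret an unsigned 32-bit value as a signed one."""
    return value - UINT32_MOD if value >= 1 << 31 else value


reported = 4294930872
print(as_signed32(reported))

# Illustrative timestamps only: a stop value 36424 us before the start
# value produces exactly the reported duration.
print(elapsed_us(start=50000, stop=13576))
```

Read as a signed value, the reported duration is a small negative number, which would be consistent with a timer that does not advance correctly while interrupts are masked.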
Closing in favor of #105
Closing now that there's #105
This issue lists the status of all tests for the Release Candidate 2 of the 2019.01 release.
Specs tested:
tests/riotboot
bitarithm_msb: 102096 iterations per second
4294930872us
and a timeout in the test for getting one line 02-bench.py:22 for bench 0 4, it works with a 30 seconds timeout for expect at least.
blake2_tests.test_blake2s (/data/riotbuild/riotproject/tests/pkg_libb2/main.c 56) memcmp(b2s, hash, sizeof hash) == 0
micro-ecc does not have 8bit/16bit support.
ERROR: Could not allocate the memory for the ASN.1 objects.
stdin
changing threads priority to a lower one (THREAD_PRIORITY_MAIN + 1) makes the test pass. (BTW test is wrongly written as | has a meaning for pexpect).
[TRICKLE RESET] (known issue I think)
At 770 ms received msg 0: "supposed to be 659"
TEST FAILED but I cannot reproduce anymore
2.H532 but not sure if it is the issue. Not checked yet.