As tested just before RC1, @cladmi found the following issues on the boards.

For boards failing with `tests/ps_schedstatistics`, the fix can come from handling `XTIMER_HZ != 1MHz`.
For:
- [tests/gnrc_ipv6_ext](tests/gnrc_ipv6_ext/test.failed)
- [tests/gnrc_rpl_srh](tests/gnrc_rpl_srh/test.failed)
- [tests/gnrc_sock_dns](tests/gnrc_sock_dns/test.failed)
I think they fail because they need to be run with sudo (because of ethos). @kaspar030 has done some work with the CI to enable running ethos without sudo (https://github.com/RIOT-OS/RIOT/pull/11816 is one of the PRs); we needed this and have been using it to test https://github.com/RIOT-OS/RIOT/pull/11818.
Compared to the last RC1, no new bugs seem to have been introduced.
> For:
> - [tests/gnrc_ipv6_ext](tests/gnrc_ipv6_ext/test.failed)
> - [tests/gnrc_rpl_srh](tests/gnrc_rpl_srh/test.failed)
> - [tests/gnrc_sock_dns](tests/gnrc_sock_dns/test.failed)
>
> I think they fail because they need to be run with sudo (because of ethos). @kaspar030 has done some work with the CI to enable running ethos without sudo (RIOT-OS/RIOT#11816 is one of the PRs); we needed this and have been using it to test RIOT-OS/RIOT#11818.
Indeed. The tests are the result of running `make flash test` without root or the manual setup currently done in CI. As `TEST_ON_CI_WHITELIST += all` is not set for these tests, there is no failure in Murdock.

I do not consider `TEST_ON_CI_WHITELIST`, as I want to see what does not currently work through `make test` alone. So it includes the current state of the test automation, even if the tests could succeed when run differently. It is stupid automated testing :)

The tests were run using `master` cb57c6ff1 with some other required commits to run on my setup with multiple boards connected and no local toolchain. But these should not affect testing as they only modify flash/reset.
I checked the output and I also had other issues on my test machine:

- `python3-scapy` from debian-stretch (0.20) and ubuntu-bionic (0.23) do not have `scapy.all.raw`, so I needed to install version 2.4.0 via pip (see the sketch after this list). It is the same version as provided in debian-buster or ubuntu-cosmic.
- Despite being installed, `tcpdump` and `bridge` were not in the regular user's PATH, so I added symlinks. It may come up later when tests must be run without sudo.

The test result should not change but the output should.
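For reference, a minimal sketch of the incompatibility (my wording, not from the release tasks): `scapy.all.raw()` serializes a packet to its on-wire bytes and only exists since scapy 2.4.0; the older scapy3k-based `python3-scapy` packages lack it.

```python
from scapy.all import Ether

try:
    from scapy.all import raw      # available since scapy 2.4.0
except ImportError:
    raw = bytes                    # older versions: bytes(pkt) also serializes

# raw() turns a packet object into its on-wire byte representation.
pkt_bytes = raw(Ether(dst="ff:ff:ff:ff:ff:ff"))
print(pkt_bytes.hex())
```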
I will run the task 01-ci -- Task #01 - Compile test with `BUILD_IN_DOCKER=1` and also with `TOOLCHAIN=llvm`.
```
$ sudo docker images
REPOSITORY          TAG      IMAGE ID       CREATED        SIZE
riot/riotbuild      latest   bc9d9f175587   7 months ago   3.58GB
$ sudo docker pull riot/riotbuild:latest
latest: Pulling from riot/riotbuild
Digest: sha256:5218a0692039934276c98c44b70bbb1cc8bc31a7f171670e625cecd2e3f0fc24
Status: Image is up to date for riot/riotbuild:latest
```
I can also run the automated test suite on different boards as I already did.
@kb2ma Would you be interested in testing task 09-coap by chance?
Please run the failing sudo tests manually, regardless of whether there will be execution without root in the CI or not (since that requires some bootstrapping for scapy, rather not for this release). That is the point of including them in the release specs ;-)
Yes, happy to run 09-coap.
@leandrolanzieri, I don't want to interrupt if you plan to continue with 09-coap. Let me know.
@kb2ma sorry, yes, almost done with them
Everything looks good for 09-coap and 06-single-hop-udp.

07-Multihop had 0% packet loss and no packet buffer problems!
```
# task 1
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 36.106/41.166/46.321 ms
# task 3
100 packets transmitted, 100 packets received, 0% packet loss
round-trip min/avg/max = 0.364/0.364/0.365 ms
```

05:

```
# task 1
round-trip min/avg/max = 0.309/0.829/2.558 ms
# task 3
round-trip min/avg/max = 0.325/0.830/2.391 ms
```
For `TOOLCHAIN=llvm` with `BUILD_IN_DOCKER=1` I got a lot of failures because `llvm` generates bigger firmwares. For example:

```
RIOT_CI_BUILD=1 BOARD=blackpill TOOLCHAIN=llvm BUILD_IN_DOCKER=1 make -C examples/asymcute_mqttsn/
```

I will limit to the boards we run in CI for the next run.
Hmm, it seems that I am having some problems with Task #02 - ICMPv6 echo unicast addresses on iotlab-m3 (default route). Can someone confirm?
I did a run of `scan-build-analyze` for the boards tested with `llvm` using:

```
BUILD_IN_DOCKER=1 TOOLCHAIN=llvm ./dist/tools/compile_and_test_for_board/compile_and_test_for_board.py --compile-targets scan-build-analyze --no-test . iotlab-m3
```

I used some `sed` hack to split the warnings between RIOT and the packages; otherwise the ones in the packages would be repeated for each board/application. It currently reports ~130 in RIOT (including deprecation warnings for `ubjson`). I will try to open a dedicated issue for the ones that look like bugs.

https://ci-ilab.imp.fu-berlin.de/job/RIOT%20scan-build-analyze/16/riot_scan_build/
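The actual `sed` command is not shown; a rough Python equivalent of the splitting idea (the `/pkg/` path convention and the input file name are assumptions):

```python
# Partition scan-build warnings into RIOT-proper vs package code, assuming
# package sources appear under a "/pkg/" component in the file path and each
# warning line starts with "path/to/file.c:LINE:COL: warning: ...".
def split_warnings(lines):
    riot, pkgs = [], []
    for line in lines:
        path = line.split(":", 1)[0]
        (pkgs if "/pkg/" in path else riot).append(line)
    return riot, pkgs

with open("scan_build_warnings.txt") as f:   # hypothetical input file
    riot_warnings, pkg_warnings = split_warnings(f)

print(f"RIOT: {len(riot_warnings)}, packages: {len(pkg_warnings)}")
```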
> Hmm, it seems that I am having some problems with Task #02 - ICMPv6 echo unicast addresses on iotlab-m3 (default route). Can someone confirm?
It's working for me, 1% packet loss.
> Please run the failing sudo tests manually, regardless of whether there will be execution without root in the CI or not (since that requires some bootstrapping for scapy, rather not for this release). That is the point of including them in the release specs ;-)
In theory, all the non-automated tests should also be executed, not only the ones with a `test` target.
From https://github.com/RIOT-OS/RIOT/pull/11821 I noticed that currently the first issue with CI is not the `sudo`, but that `scapy` is not installed on the workers and on the test RPis.
> I noticed that currently the first issue with CI is not the `sudo`, but that `scapy` is not installed on the workers and on the test RPis.
Yup. This is unfortunately not a matter of just installing scapy, as scapy wants to open a raw socket, which only root can do.
> […] as scapy wants to open a raw socket, which only root can do.
Will fix very soon (but not in a backportable state I fear)
> Will fix very soon (but not in a backportable state I fear)
There might also be a workaround using ambient capabilities: https://stackoverflow.com/a/47982075/5910429. If only the raw socket capability is missing, we can create a wrapper binary that allows only that.
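To illustrate what scapy needs (a minimal sketch, not from the thread): on Linux its L2 send/sniff path boils down to an `AF_PACKET` raw socket, which requires the `CAP_NET_RAW` capability that normally only root has.

```python
import socket

ETH_P_ALL = 0x0003  # match all ethertypes

try:
    # This is the call that fails for an unprivileged user with EPERM.
    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_ALL))
    s.close()
    print("raw sockets available (root or CAP_NET_RAW granted)")
except PermissionError:
    print("no CAP_NET_RAW: this is why the scapy-based tests need sudo")
```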
There is a TUN/TAP wrapper hidden inside `scapy`'s socket abstraction I'd like to experiment with tomorrow. If that works without `root`, I'd rather go for that than some permission foobar. (All our scapy raw sockets use either TUN or TAP interfaces so far, so if they have user permissions granted at creation, see `ip tuntap help`, this seems to me the more obvious way.)
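A sketch of that experiment, assuming the wrapper meant here is scapy's `TunTapInterface` (shipped as `scapy.layers.tuntap` in newer scapy releases), which opens `/dev/net/tun` directly instead of a raw socket:

```python
from scapy.all import Ether, IPv6, ICMPv6EchoRequest
from scapy.layers.tuntap import TunTapInterface  # newer scapy only

# Open the application end of tap0. If the interface was created with
# "ip tuntap add dev tap0 mode tap user $USER", this open might succeed
# without root, which is the whole point of the experiment.
tap = TunTapInterface("tap0")
tap.send(Ether() / IPv6(dst="ff02::1") / ICMPv6EchoRequest())
reply = tap.recv()
```

(As it turns out below, this conflicts with `netdev_tap`/`ethos` already holding the same application end.)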
> There is a TUN/TAP wrapper hidden inside `scapy`'s socket abstraction I'd like to experiment with tomorrow. If that works without `root`, I'd rather go for that than some permission foobar.
Totally agreed! And let me know if I can help. (I just stumbled over the workaround and wanted to share it.)
So I did 05-task4, with 0% packet loss: round-trip min/avg/max = 122.022/140.804/158.292 ms.
I am getting some pretty high packet loss for 04-Task #01 - ICMPv6 link-local echo with iotlab-m3, but only when running locally: between 10% and 50%.

- I tried other channels and it still seems to fail.
- I tried on IoT-LAB and it was fine.
- I tried switching m3 boards and it still fails.
- I tried increasing the interval from 10 to 50 and it really helped.
- I tried samr21 to samr21 and it still failed.
I was using the tests/gnrc_udp to test.
@miri64, @cgundogan, any ideas? @PeterKietzmann said it could be the Hamburg air, or maybe it is our M3 boards that have some issue, maybe floating pins or something? I will note that the USB is a little bit sensitive to position.
> @PeterKietzmann said it could be the Hamburg air, or maybe it is our M3 boards that have some issue, maybe floating pins or something?
If it works on IoT-LAB (have you tried different sites?) or Varduz, it seems to be the Hamburg air. We had problems in the past where tests over radio conducted at HAW Hamburg failed while they worked at other places. Maybe there is something jamming the spectrum in your building.
Well, I am trying to run the stress test on the same nodes in IoT-LAB and it seems like an interval of around 7 to 8 ms makes it unreadable… is this expected? Could it be an issue with tests/gnrc_udp?
It seems like the ping6 command is asynchronous, meaning that `-i` is the time between sends, not the time to wait after a reply before the next send. If this is the case, either the test should be adapted or the ping6 command should be adapted to handle that (maybe add a `-s` flag).
> It seems like the ping6 command is asynchronous, meaning that `-i` is the time between sends, not the time to wait after a reply before the next send. If this is the case, either the test should be adapted or the ping6 command should be adapted to handle that (maybe add a `-s` flag).
Didn't we do that already last time? If you are referring to your stress test, which test parameters should be changed? They are not part of the release tests.
What would the `-s` flag do? `ping` is supposed to be asynchronous. If we wait for the next echo response to come in, the delay between packets is not `-i`, but `-i` + RTT.
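A quick numeric illustration of that difference (the numbers are made up):

```python
# With -i = 10 ms and an RTT of 40 ms:
interval_ms = 10   # the requested -i
rtt_ms = 40        # measured round-trip time

async_gap = interval_ms            # ping as specified: fire every -i
sync_gap = interval_ms + rtt_ms    # if each send waited for its reply

print(f"async: one packet every {async_gap} ms ({1000 // async_gap} pkt/s)")
print(f"sync:  one packet every {sync_gap} ms ({1000 // sync_gap} pkt/s)")
# async: one packet every 10 ms (100 pkt/s)
# sync:  one packet every 50 ms (20 pkt/s)
```

So waiting for each reply would silently lower the send rate, which matters for a stress test.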
> There is a TUN/TAP wrapper hidden inside `scapy`'s socket abstraction I'd like to experiment with tomorrow. If that works without `root`, I'd rather go for that than some permission foobar.
>
> Totally agreed! And let me know if I can help. (I just stumbled over the workaround and wanted to share it.)
Sadly, this does not work this way :confused:. Opening a TAP with user rights is only allowed for the application end of the TAP interface (the part we usually use with `netdev_tap` and `ethos`), so if you try to open two application ends (one for `scapy`, one for `netdev_tap`/`ethos`) you will get an `EBUSY` for one of the two. TAPs are just supposed to be used as an (app, interface) pair. The interface end is just a normal interface to the OS and thus can only be accessed with raw sockets.
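A minimal sketch of that failure mode (not from the thread): the application end of a TAP is claimed by attaching a `/dev/net/tun` fd to the interface name, and a second attach to the same name fails with `EBUSY`.

```python
import fcntl
import os
import struct

TUNSETIFF = 0x400454ca   # ioctl to attach an fd to a tun/tap device
IFF_TAP = 0x0002         # TAP (Ethernet) framing
IFF_NO_PI = 0x1000       # no extra packet-info header

def open_tap_app_end(name: str) -> int:
    """Open the application end of TAP `name` via /dev/net/tun."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    ifr = struct.pack("16sH", name.encode(), IFF_TAP | IFF_NO_PI)
    fcntl.ioctl(fd, TUNSETIFF, ifr)
    return fd

fd1 = open_tap_app_end("tap0")   # e.g. what netdev_tap/ethos holds
fd2 = open_tap_app_end("tap0")   # raises OSError [Errno 16] EBUSY
```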
> There is a TUN/TAP wrapper hidden inside `scapy`'s socket abstraction I'd like to experiment with tomorrow. If that works without `root`, I'd rather go for that than some permission foobar.
>
> Totally agreed! And let me know if I can help. (I just stumbled over the workaround and wanted to share it.)
>
> Sadly, this does not work this way :confused:. Opening a TAP with user rights is only allowed for the application end of the TAP interface (the part we usually use with `netdev_tap` and `ethos`), so if you try to open two application ends (one for `scapy`, one for `netdev_tap`/`ethos`) you will get an `EBUSY` for one of the two. TAPs are just supposed to be used as an (app, interface) pair. The interface end is just a normal interface to the OS and thus can only be accessed with raw sockets.
It could just use another input to `ethos`, a packet-based UNIX socket or something, and allow `tap`, `socket`, or even maybe `tap+socket`.

For raw data injection, the buffers just need to go through `ethos` encapsulation I guess. Would you have a need for having both at the same time, or is that not needed?

For running `tcpdump` without root it may also require having an additional `pipe` or something, I think. If possible one that ignores when nobody is listening, to not block… Not sure what there is.

This could be something I am interested in implementing.
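A rough sketch of the injection idea; everything here (socket path, tool behavior) is hypothetical, nothing like it exists in `ethos` today. The test harness would hand frames to a modified ethos host tool over an `AF_UNIX` datagram socket, so no raw socket and no root would be needed:

```python
import socket

# Hypothetical path where a modified ethos host tool would listen and feed
# every received frame through its normal ethos framing toward the serial port.
ETHOS_INJECT_SOCK = "/tmp/ethos-inject.sock"

def inject_frame(frame: bytes) -> None:
    """Send one raw Ethernet frame to the (hypothetical) ethos input socket."""
    s = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        # SOCK_DGRAM preserves message boundaries, so one send == one frame.
        s.sendto(frame, ETHOS_INJECT_SOCK)
    finally:
        s.close()
```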
For 01-ci - Task #01, I could correctly compile with `BUILD_IN_DOCKER=1` and the `./task01.py` script with no `TOOLCHAIN` specified (so `gnu`), with one re-run due to a network failure or something.

```
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin BUILD_IN_DOCKER=1 ./task01.py /home/harter/work/git/worktree/riot_release
```

I however noticed that, as it is done through `buildtest` which is completely executed in docker, it hides issues with some examples that use the host toolchain, like `tests/mcuboot` (https://github.com/RIOT-OS/RIOT/pull/11083):

```
DOCKER="sudo docker" BOARDS=nrf52dk PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin BUILD_IN_DOCKER=1 make -C tests/mcuboot/ all
```
Good, so RIOT is good but the tests need some love!
Maybe we can use the following as a template to paste results:
It seems that packet loss and duplications are a bit high here. @miri64, does this seem to be correct?
It seems the test failed: when the size is 100 it cannot transmit any packets. @miri64 @cgundogan, any reason why? This happens both ways. It also only seems to happen between samr21 and remote; it works fine between remote and m3… Also, samr21 and m3 work OK.
Have you confirmed that it ever worked? The test is marked as experimental. Maybe fragmentation + cc2538 is still problematic? We had similar issues in the beginning with the kw2x radios.
(I never used the remote nor was involved in the tests, so I'm not sure how "normal" those results are)
I think @smlng was doing these tests in the past. Maybe he can give some insights.
No I haven't, I will try on the last release or so. I just wanted to make sure this wasn't something I was doing wrong.
> Have you confirmed that it ever worked? The test is marked as experimental.
If it works, a good first step would be a `git bisect` to determine what caused the regression.
He is on vacation...
> He is on vacation...
Bus-factor ;-P
It also really seems like it cannot handle a stress test; as soon as I try pinging with 2 nodes it dies.
7% packet loss
High packet loss
This issue lists the status of all tests for Release Candidate 1 of the 2019.07 release.
Specs tested: